SLIDE 1 — Title
Thanks for coming. I’ll get things moving — there’s a lot to cover, and I want to leave plenty of time for questions.
Today I’m going to walk you through Project Amber from top to bottom. What it is, the problem it solves, what we’ve built so far, how it’s structured, and where we’re taking it.
For those of you who’ve heard about Project Amber and been curious — this is your full picture. For those who’ve been involved in some way already, hopefully this gives you context on the pieces you haven’t seen.
A couple of things before I dive in. First, everything I’m going to show you is real. This isn’t a prototype we spun up for the slide deck. It’s running on production-grade Azure infrastructure with live SIG data behind it. Second, I want to be honest about where the system is in its maturity: we’re very proud of where this has landed, and I’ll also tell you candidly what the path to full member launch still requires.
SLIDE 2 — What is Project Amber?
At its core, Amber is the Bluetooth SIG’s own AI assistant — purpose-built for the Bluetooth ecosystem.
The key phrase is purpose-built. This is not ChatGPT with a Bluetooth instruction bolted onto it. It’s a system that has been trained — or more precisely, grounded — in SIG-specific data: our own specifications, our qualification database, our TCRL test documentation, our ICS statements. It knows the difference between Core Spec 6.0 and 6.1 because we’ve explicitly built that versioning into the index. It understands that when a member asks about a “proximity sensor,” they probably mean the Proximity Profile, PXP. That kind of domain knowledge doesn’t come out of a generic AI — it has to be built.
The four capability areas on this slide are what that translates to in practice.
Specification Intelligence. 200-plus Bluetooth specifications, indexed. 37,000 pages broken into over 41,000 retrievable chunks. Semantic search — meaning it understands intent, not just keywords — with citations back to the exact source document and section.
Product Database. Live access to 70,000-plus qualified products through the Bluetooth SIG Qualified Products API. Search by company, model, feature. Compare manufacturers. The data is current as of the API — no stale snapshots.
Qualification Guidance. 5,415 TCRL test cases and over 11,600 ICS features, all queryable in plain English. A developer can ask “what tests do I need for a BLE heart rate monitor” and get a structured, accurate answer.
Natural Language Interface. You talk to it the way you’d talk to a knowledgeable colleague. Amber understands Bluetooth terminology, maps your intent to the right data sources, synthesizes an answer, and shows you its work.
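To make “semantic search with citations” concrete, here is a minimal sketch of the mechanics: embed the query, rank stored spec chunks by similarity, and return the best matches with their source metadata attached. Everything here is illustrative — the toy corpus, the bag-of-words scoring standing in for a learned embedding model, and the function names are invented, not Amber’s actual implementation.

```python
import math
from collections import Counter

# Toy corpus: each chunk carries citation metadata (doc + section).
# These entries are invented examples, not real index contents.
CHUNKS = [
    {"doc": "Core Spec 6.0", "section": "Vol 1, Part A",
     "text": "channel sounding enables distance measurement between devices"},
    {"doc": "PXP 1.0", "section": "Sec 3.1",
     "text": "the proximity profile alerts when devices move apart"},
    {"doc": "HRP 1.0", "section": "Sec 2.2",
     "text": "heart rate measurement characteristic carries sensor data"},
]

def embed(text: str) -> Counter:
    """Stand-in embedding: bag of words. A real system uses a learned vector model."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query: str, k: int = 1):
    """Rank chunks by similarity to the query; return text plus its citation."""
    q = embed(query)
    ranked = sorted(CHUNKS, key=lambda c: cosine(q, embed(c["text"])), reverse=True)
    return [{"citation": f'{c["doc"]}, {c["section"]}', "text": c["text"]}
            for c in ranked[:k]]
```

The point of the sketch is the shape of the answer: every retrieved chunk comes back with a citation, so the user can always check the source.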
And the name “Amber” ties into our Nordic-heritage naming tradition. Amber is named for the Baltic resin that once connected Nordic trade routes. Much like that resin, Amber connects members to the answers they need, drawing on the full body of Bluetooth specifications, qualification requirements, and product data to give clear, cited responses in plain English.
SLIDE 3 — The Challenge Amber Solves
Let me make the problem concrete, because I think most of us in this room have felt it personally.
Volume Overload. We have 200-plus specifications. 37,000 pages. If you know exactly which spec to open and which section to search, you can find an answer — eventually. But if you don’t already know the answer, finding the right starting point is itself a significant challenge. Members who are new to a profile or trying to understand a requirement for the first time are at a real disadvantage. And frankly, even people who have been working in Bluetooth for years spend meaningful time just navigating the documentation landscape.
Complex Processes. Qualification is the best example here. To qualify a product, a member needs to navigate TCRL test plans, select the right test cases, complete ICS statements, understand lab requirements, manage fees, and track timelines — all documented separately across different tools and documents. There’s no single authoritative guide that walks you through it end-to-end. Amber is designed to be that guide.
Expertise Silos. Deep Bluetooth knowledge is unevenly distributed. There are people on our staff and in our member community who can answer almost any question instantly — but there aren’t enough of them, and they’re in high demand. Every time a member has to open a support ticket because they couldn’t find the answer themselves, that’s a bottleneck we created. Amber is designed to give members the self-service capability to find their own answers, so the experts’ time can be spent on work that actually requires expert judgment.
Amber is the answer to all three. One interface, always available, that knows every specification, every test requirement, every qualified product, and can explain it clearly to anyone regardless of their experience level.
SLIDE 4 — Amber in the SIG Ecosystem
One thing I want to emphasize before we go further: Amber is additive. It does not replace anything we already have.
Qualification Workspace, PTS, the qualified products database, the Zendesk knowledge base, the specs listing on bluetooth.com — all of that continues to exist exactly as it does today. What Amber adds is an AI intelligence layer on top of all of it. A single interface that knows how to reach into all of those sources and synthesize an answer.
Think of it this way. Right now, if a member has a question, they might check bluetooth.com, search the knowledge base, look at Qualification Workspace, and then send a support ticket when they still can’t find the answer. With Amber, they ask once and get a single synthesized response with citations. The underlying sources are still there — Amber just removes the burden of knowing which one to consult.
One other thing this means: it’s available 24/7. A member in Asia or Europe at two in the morning doesn’t have to wait for business hours in Kirkland to get an answer to a qualification question.
SLIDE 5 — The Chapter Framework
We’re building Amber in chapters. Each chapter represents a coherent, independently valuable set of capabilities mapped to real member needs — not just engineering milestones.
Chapter 1 is live now. Specification and Product Intelligence — everything I’ve just described.
Chapter 2 is where we’re heading next. Test Authoring and Validation — bringing Amber into the spec and test development process itself. The team is still defining this, and I’ll spend a few minutes on it shortly because it’s genuinely different from Chapter 1 and worth understanding on its own terms.
Chapter 3 is on the roadmap. Qualification Workflow — embedding Amber natively inside Qualification Workspace so members are guided through qualification in plain language. This is tied to the Qualification Workspace product roadmap and Jay’s team’s work, so it’s more directional than the first two chapters.
The chapter structure is intentional. It lets us make real commitments about what is in scope now versus later, deliver value incrementally, and use learnings from each chapter to inform the next. We are not promising everything at once.
SLIDE 6 — How Project Amber Is Built
Before I get into the chapter details, I want to spend a few minutes on how this project runs — because understanding the structure helps explain how we’ve been able to move at this pace.
Amber is a small-team initiative built with a lean but disciplined approach. We work in a standard software development model: changes go through GitHub as pull requests, get reviewed before they’re merged, and are deployed to Azure in version-tagged increments so we always know exactly what’s running and can roll back instantly if needed. Every deployment gets tested against the live system before it’s declared stable. It’s not a large team, but it runs with real process behind it.
What’s different from a traditional SIG project is the scale and tooling. Rather than a large cross-functional team and a formal sprint cadence, we have a focused group moving quickly — with AI tooling doing a lot of the heavy lifting that would otherwise require more people. Code review assistance, documentation, architectural analysis, stakeholder communications — AI accelerates all of that. But the underlying practices are solid: version control, peer review, secrets management in Azure Key Vault, Auth0 SSO, proper CORS and session handling. We are building this to be handed to a production team, not rebuilt from scratch.
The evaluation methodology is where the rigor really shows. Every change gets validated against a 250-question test harness built from real member-style questions, with SME-verified ground-truth answers being finalized now. When we say the system is at an 82% win rate, that’s a reproducible number: same questions, same scoring methodology, blind evaluation. Decisions about what to build next are driven by that data, not by gut feel.
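Mechanically, blind win-rate scoring can be pictured like this. This is a minimal sketch under stated assumptions — the data and judge are invented, and the real harness’s rubric is more involved — but the core idea holds: the grader sees a shuffled pair of answers without knowing which system produced which, and the win rate is simply the fraction of questions where the blinded pick turns out to be Amber’s answer.

```python
import random

def blind_win_rate(results, seed=0):
    """results: list of (amber_answer, rival_answer, judge), where judge is a
    callable that picks "A" or "B" from two answers without knowing their origin.
    Returns the fraction of questions where the blinded pick was Amber's answer."""
    rng = random.Random(seed)  # fixed seed so the evaluation is reproducible
    wins = 0
    for amber, rival, judge in results:
        pair = [("amber", amber), ("rival", rival)]
        rng.shuffle(pair)  # hide which answer came from which system
        picked_label = judge(pair[0][1], pair[1][1])  # judge sees positions only
        picked_system = pair[0][0] if picked_label == "A" else pair[1][0]
        wins += picked_system == "amber"
    return wins / len(results)
```

A fixed shuffle seed keeps the run reproducible, which is the property the talk emphasizes: same questions, same methodology, same number every time.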
On the Microsoft side: after Bangkok, Microsoft development services become the production build partner. The SIG retains full product ownership — the roadmap, the requirements, and the domain knowledge we’ve built up stay with us. Microsoft leads the infrastructure work to take the system to production scale on managed Azure services. It’s a genuine collaboration, and the groundwork we’ve laid in the POC is what makes that handoff clean.
SLIDE 7 — Chapter 1 in Detail
Let me give you more detail on what Chapter 1 actually delivers and how we know it’s working.
Specification Search — 200-plus specs indexed, semantic retrieval, cited answers with links back to the source section. The semantic search piece is important: you can ask “how does channel sounding work” and it will find the right content even if you didn’t use the exact terminology from the spec. It also handles multi-turn — you can follow up with “what does that mean for direction finding” and it maintains context.
Product Intelligence — 70,000-plus qualified products, live API, always current. You can compare manufacturers, look up a specific model, or ask “how many Bluetooth products does Apple have qualified” and get a precise answer from real data, not a language model’s best guess.
Qualification Guidance — 5,415 TCRL test cases and 11,677 ICS features. You can ask which test cases apply to a given device type, what a specific ICS row means, or how to interpret a conformance requirement — and get clear guidance.
Now, the headline number: 82% win rate against ChatGPT on 250 domain-specific questions. I want to give you the full context on that number, because the methodology matters.
We built a test harness of the kinds of questions our members actually ask, starting at 217 questions and since expanded to the current 250. Categories include spec content, product lookup, qualification process, ICS interpretation, TCRL guidance, security, audio, and more. We ran both systems and scored the results blind: 82% Amber, 18% ChatGPT or tie.
The honest version of that story: our initial evaluation showed a 25% win rate. Alex Y, our AI consultant, ran a thorough analysis and identified that the majority of those losses were either missing documents from our index — specs that weren’t yet loaded — or over-aggressive routing logic that was refusing legitimate questions before they even got to the AI. Those were both fixable problems. After targeted fixes, we went from 25% to 72% in the first pass, and continued to 82% as we expanded the dataset and resolved remaining gaps. The improvement arc is actually part of what makes me confident about this system — we know exactly why it wins and why it loses, and the losses are progressively addressable.
I also want to be clear about what we care about internally versus what we use for external messaging. The 82% win rate is a useful headline for the board and for positioning. What really matters for member launch readiness is accuracy against SME-verified ground truth answers. That work is underway now — Daniel’s team is converting SME notes from the test harness into standardized ground-truth answers, and our target is 250 fully labeled queries before the Bangkok board meeting on March 17. That gives us statistical confidence at ±6% margin of error, which is what we need to make a defensible production go/no-go decision.
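The ±6% figure is just the standard binomial margin of error at 95% confidence for n = 250, taken at the worst case p = 0.5. A two-line check:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% confidence margin of error for a proportion estimated from n samples.
    p = 0.5 is the worst case (widest interval); z = 1.96 is the 95% z-score."""
    return z * math.sqrt(p * (1 - p) / n)

# For 250 labeled queries, the worst-case margin is about +/-6.2%.
moe = margin_of_error(250)
```

So 250 labeled queries is the point where the accuracy estimate is tight enough to support a defensible go/no-go call.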
SLIDE 8 — Chapter 2: Test Authoring & Validation
Chapter 2 is where we extend Amber from answering questions to participating in the work itself.
Let me give you the full vision, because Daniel and I have spent real time on this and it’s worth explaining properly.
Right now, when someone is writing a test case or a specification section, there is no real-time feedback mechanism. You write it. It goes through a committee review cycle. Issues get surfaced at that review — late in the process, when rework is most expensive. The question Chapter 2 is asking is: what if Amber was alongside the author the whole time, not just at the end?
Concretely: an author is writing a test case. Amber is validating it in real time against the relevant specifications, the current ICS tables, any published errata, and the existing test suite. If there’s a conflict — the test case references a requirement that was modified in a later spec version, or the procedure contradicts an existing test — Amber flags it immediately. The author fixes it before it ever reaches committee review.
Daniel described this as bringing expertise to the moment of authoring, not the moment of review. That’s exactly the right framing.
And here’s the part that I find particularly elegant: the same agents that give you feedback while you’re writing can also run automatically when something is committed to the repository. Think of it as a CI pipeline for specs. In software development, when you push code, automated checks run — linting, tests, dependency validation. We want the same thing for specification and test authoring. Commit a draft spec section, and the agents check it for consistency, completeness, and conflicts before a human reviewer even looks at it.
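To make the “CI pipeline for specs” analogy concrete, here is a hedged sketch. The requirement table and the single rule are invented for illustration — Amber’s actual agents would work through retrieval over the full corpus — but the workflow is the one described above: on each commit, every spec reference in a draft test case is checked against the current requirement versions, and references to superseded requirements get flagged before a human reviewer looks at the draft.

```python
# Hypothetical version table: requirement id -> spec version that last modified it.
# Both the ids and versions here are made up for the example.
CURRENT_REQUIREMENTS = {
    "HFP/REQ-12": "1.9",
    "GAP/REQ-03": "6.1",
}

def check_draft(draft_references):
    """draft_references: list of (requirement_id, spec_version_cited) pairs
    pulled from a committed draft. Returns human-readable flags; empty = clean."""
    flags = []
    for req_id, cited in draft_references:
        current = CURRENT_REQUIREMENTS.get(req_id)
        if current is None:
            flags.append(f"{req_id}: unknown requirement")
        elif cited != current:
            flags.append(
                f"{req_id}: cites version {cited}, "
                f"but requirement was modified in {current}"
            )
    return flags
```

A clean draft returns no flags and passes straight to review; a stale reference surfaces immediately, at the moment of authoring rather than at committee review.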
The three pillars on the slide — Authoring Support, Conflict Detection, Review and Quality Agents — represent that arc: real-time feedback during writing, automated conflict checking, and intelligent review augmentation.
One thing I want to flag for anyone who’s been following the AsciiDoc migration discussion: Chapter 2 does not require AsciiDoc to begin. Our current data — Paligo exports, HTML specs, the XML corpus — is structured well enough to work with right now. The AsciiDoc transition and Amber Chapter 2 are being coordinated from a rollout and messaging perspective, but the technical dependency isn’t there. We’ve been deliberate about that because I didn’t want Chapter 2 to be held hostage to a migration timeline.
SLIDE 9 — Chapter 3: Qualification Workflow
Chapter 3 is the furthest out, but I want you to understand what it is, because it’s where this becomes transformational for member experience.
The vision: a member opens Qualification Workspace, and Amber is embedded right there. They don’t need to know anything about the qualification process before they start. Amber asks them about their device, walks them through test selection, helps them complete the ICS form with live spec context available, explains the lab requirements, and guides them through submission — all in plain English.
For a first-time qualifier, this removes the biggest barrier to qualification: not knowing where to start. For experienced members, it still saves meaningful time on the process steps that are tedious even when you know what you’re doing. And for us, it scales our ability to support members through qualification without adding headcount to handle every edge-case question.
I’ll be transparent: Chapter 3 is directional. We have a clear vision for it, but the specific capabilities and timeline will be shaped by Chapter 1 and 2 learnings, member feedback, and what Jay’s QW team is building. The QW rewrite is a significant undertaking, and Chapter 3 Amber integrates with it. We’re not going to nail down the details until we’re closer and have more signal from the earlier chapters.
SLIDE 10 — Amber & Project Blue
This slide is important for context on how these two initiatives fit together, because I know some of you are following both.
Project Blue is the initiative to modernize the SIG’s toolchain infrastructure. Test-Documents-as-Code, AsciiDoc and Git for specification authoring, bi-directional Qualification Workspace-PTS sync, IOP results modernization, structured and machine-readable test data, telemetry and observability across the toolchain.
Project Amber is the AI intelligence layer on top of that infrastructure. Blue produces structured, machine-readable artifacts. Amber makes those artifacts conversationally accessible to anyone — members, authors, staff.
They are not the same project. They have separate teams, separate scope, separate delivery timelines. But they are genuinely complementary. The more structured and machine-readable our underlying data becomes — which is what Blue delivers — the more reliably Amber can access, interpret, and cite it. Blue feeds Amber.
The way I think about it: Blue is the plumbing, and Amber is the tap. Better plumbing means cleaner water. But you don’t have to wait for perfect plumbing to turn on the tap — which is why Amber Chapter 1 is live today, working with the data we currently have.
SLIDE 11 — Roadmap & Milestones
Let me walk you through how we got here and where we’re going, because the trajectory is important.
Q3 2025 — POC. This started as a proof of concept: a RAG prototype built on Bluetooth specifications, using ChromaDB for vector storage and Claude as the AI layer. It was exploratory — can you actually build something useful here? The answer was yes.
Q4 2025 — Azure Deployment. We moved off the initial infrastructure and onto Azure Container Apps. Authentication via Auth0 SSO. Enterprise-grade security. This is where it stopped being an experiment and started being a system.
Early 2026 — Production Architecture. Azure AI Search replaced ChromaDB. Azure Key Vault for secrets management. Redis for session storage. We migrated from Claude to Azure OpenAI — that was a deliberate decision driven by organizational compliance requirements around external AI services, not a technical preference. The system architecture is now GPT-4o-mini for routing and GPT-5 for response synthesis.
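The routing-plus-synthesis split can be sketched like this. This is a toy illustration, not Amber’s code: a keyword rule stands in for the small routing model (GPT-4o-mini in the talk), and a packaging function stands in for the larger synthesis model; the source names are mine, chosen to mirror the three data areas described earlier.

```python
def route(query: str) -> str:
    """Stand-in router: in the real system a small model classifies the query
    into a data source; here a keyword rule plays that role for illustration."""
    q = query.lower()
    if any(w in q for w in ("qualified", "product", "model", "manufacturer")):
        return "product_db"       # live Qualified Products API
    if any(w in q for w in ("test", "tcrl", "ics", "qualification")):
        return "qualification"    # TCRL test cases and ICS features
    return "spec_search"          # semantic search over the spec index

def answer(query: str) -> dict:
    """Stand-in synthesis step: the real system hands retrieved context to a
    larger model for the final cited answer; here we just package the routing."""
    return {"source": route(query), "query": query}
```

The design point is cost and latency: a cheap model makes the routing decision on every query, and the expensive model only runs once the right context has been retrieved.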
Now, March 2026 — Bangkok Board Demo. This is a significant milestone. The board has been watching this project, and Bangkok is where we present Chapter 1 at near-production quality. The standard we set for ourselves was 95%-plus completion against the original PRD — not demo theater, but something we could put in front of members with confidence. We are close to that mark, with SME validation underway and the final evaluation running before March 3rd.
June 2026 — Milan F2F — Member Launch. Chapter 1 goes live to the full membership. Simultaneously, Chapter 2 scoping begins in earnest.
The total investment from POC to member launch is roughly 290 to 320 person-hours of direct development and coordination work. That’s an unusually lean investment for a system of this scope — which reflects both the nature of the work and the AI-assisted development approach I described earlier.
SLIDE 12 — What’s Next For You
Three things I want from you today.
First — try it. Pilot access is open to SIG staff now. I want you to ask Amber a real question — not a test question, not a generic one. Something you’ve genuinely had to go find an answer to in the last month. See what it returns. See whether the citations are right. Tell me where it’s wrong, because every gap you find improves the system before it reaches members.
Second — share use cases. The most valuable input you can give us right now is real examples of where you spend time hunting for answers, or where you see members struggling. That input directly shapes Chapter 2. We’re not designing Chapter 2 in a vacuum — we’re designing it around actual observed pain points. If you work in spec development or test authoring, I particularly want to hear from you. Chapter 2 is for you.
Third — spread the word. Tell colleagues who should know about this. The more SIG staff who are using Amber during the pilot, the better the feedback signal before member launch. And if you’re in working groups or have relationships with power users in the member community who would make good early adopters, flag them to me or to Daniel Cowling. We’re building that list.
I’ll leave it there. What questions do you have?