Opening
“So what you’re looking at is Amber’s architecture — version 3.4, which is what’s running in production today. The headline is that it’s a single-agent, tools-first design, and I’ll explain what that means and why we made those choices as we go through it.”
The Frontend
“Starting on the left — members access Amber through a standard web chat interface. Nothing fancy there. The two things worth calling out are that responses stream in real time, so you’re not staring at a spinner waiting for an answer, and everything goes through Auth0 authentication, so it’s member-only access from the first click.”
The FastAPI Server
“Behind that is our FastAPI server — this is the Python backend that handles all the plumbing: session management, feedback collection, analytics, and orchestrating the AI agent. It runs inside Azure Container Apps, which gives us the scaling and deployment infrastructure we need.”
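(For the technically curious: the real-time streaming mentioned above is typically done with server-sent events. The sketch below shows only the SSE framing as a plain generator, not Amber's actual code; in a FastAPI app it would be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`. The event shape is an illustrative assumption.)

```python
import json

def sse_stream(token_iter):
    """Format an iterator of model tokens as server-sent events.

    This is an illustrative sketch, not Amber's implementation. Each
    SSE event is a "data:" line followed by a blank line; the frontend
    renders tokens as they arrive instead of waiting for the full answer.
    """
    for token in token_iter:
        yield f"data: {json.dumps({'token': token})}\n\n"
    # A sentinel event tells the frontend the answer is complete.
    yield "data: [DONE]\n\n"

# The frontend receives tokens one at a time, not one big blob.
events = list(sse_stream(["Bluetooth", " LE", " Audio"]))
```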
The AI Agent
“Now this is the interesting part. At the center is the AI agent — GPT-5.2 running on Azure OpenAI. And the key thing to understand here is that there’s one model doing everything. It reads the member’s question, decides what information it needs, goes and gets it, and writes the answer. All in one pass.
We didn’t always do it this way. Earlier versions had multiple models — one for routing, one for answering. The problem was handoffs. More models meant more failure points and more complexity to maintain. GPT-5.2 is capable enough that we don’t need that anymore.”
The Eight Tools
“To actually answer questions, the agent has eight specialized tools it can call. Things like spec search, TCRL lookup, ICS lookup, terminology, qualification processes. And the agent decides — on its own — which tools to use for any given question. Sometimes it’s just one. Sometimes it calls three or four simultaneously.
That’s what we mean by tools-first. There’s no hard-coded routing logic saying ‘if the question contains this word, go here.’ The model figures it out. And that’s what makes it flexible enough to handle the full range of questions members actually ask.”
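(Speaker aside, if asked for detail: the tools-first loop can be sketched roughly as below. This is a minimal stand-in, not Amber's code — the real system uses GPT-5.2's function calling on Azure OpenAI; here a stub "model" and two placeholder tools make the control flow visible. The remaining tools are omitted, and the stub's decision rule is purely illustrative.)

```python
# Placeholder tools; the real ones query live indexes and databases.
def spec_search(query: str) -> str:
    return f"[spec chunks matching '{query}']"

def terminology(term: str) -> str:
    return f"[definition of '{term}']"

TOOLS = {"spec_search": spec_search, "terminology": terminology}
# ...TCRL lookup, ICS lookup, and the other tools are omitted here.

def stub_model(question: str, tool_results: dict) -> dict:
    """Stand-in for the LLM: first pass picks tools, second pass answers."""
    if not tool_results:
        # The model, not hard-coded routing, decides which tools to call.
        wanted = ["terminology"] if "what is" in question.lower() else ["spec_search"]
        return {"tool_calls": wanted}
    return {"answer": f"Grounded answer using {sorted(tool_results)}"}

def run_agent(question: str) -> str:
    """One model reads the question, gathers what it needs, and answers."""
    results = {}
    decision = stub_model(question, results)
    while "tool_calls" in decision:
        for name in decision["tool_calls"]:
            results[name] = TOOLS[name](question)
        decision = stub_model(question, results)
    return decision["answer"]
```

The point of the sketch is the absence of routing code: the loop just executes whatever tool calls the model requests, which is why adding a tool doesn't require touching the dispatch logic.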
Why RAG and the Data Pipeline
“One thing worth explaining is why we use retrieval-augmented generation — RAG. The short answer is that we don’t trust GPT’s training data for Bluetooth specs. Specs change. Training data gets stale. So instead of asking the model what it remembers, Amber retrieves the actual current specification documents and grounds its answers in those. Every answer cites the specific document and section it came from.
The way we keep that knowledge base current is through a data pipeline. Specs are automatically downloaded from Bluetooth SIG sources, parsed and split into searchable chunks, embedded, and indexed in Azure AI Search. When a member asks a question, the relevant chunks are retrieved in milliseconds. And when new specs are published, we can refresh the index to keep everything current.”
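(The parse-and-chunk step of the pipeline, sketched in miniature. The chunk size and overlap below are illustrative assumptions, not Amber's settings; in the real pipeline each chunk would then be embedded and indexed in Azure AI Search.)

```python
def chunk_text(text: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Split a parsed spec document into overlapping chunks.

    The overlap keeps sentences that straddle a chunk boundary
    retrievable from either side. Sizes here are hypothetical.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

# A 1,000-character document with these defaults yields three chunks,
# each sharing 100 characters with its neighbor.
```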
The Azure Backend
“On the right side is the Azure infrastructure backing all of this — OpenAI for inference, AI Search indexing 60-plus spec documents, Cosmos DB storing sessions and feedback, and Blob Storage serving the spec images and diagrams that appear inline with answers.”
Quality and How We Measure It
“One question that always comes up is — how do we know it’s actually good? We have a formal evaluation framework where we test Amber against over 200 real member questions. We run blind comparisons against vanilla GPT-5.2, and right now Amber wins 72% of those comparisons. That’s the headline number.
Beyond that, members can rate answers directly in the interface, and we analyze that feedback to improve the system over time. There’s also confidence detection built in — when the agent isn’t sure about something, it flags it rather than presenting a guess as fact.”
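(How a blind-comparison win rate like the one above is tallied, as a sketch. The labels "amber", "vanilla", and "tie" are illustrative assumptions about the grading data, not the evaluation framework's actual schema; note that counting ties against the win rate, as done here, is one of several reasonable conventions.)

```python
from collections import Counter

def win_rate(judgments: list[str], system: str = "amber") -> float:
    """Fraction of blinded grader judgments that preferred `system`.

    Each judgment records which answer the grader preferred without
    knowing which system produced it. Schema here is hypothetical.
    """
    counts = Counter(judgments)
    return counts[system] / len(judgments)

# e.g. 18 wins out of 25 graded questions is a 0.72 win rate.
sample = ["amber"] * 18 + ["vanilla"] * 5 + ["tie"] * 2
```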
Security and Privacy
“On the security side, a few things are worth calling out — especially for any compliance or IT stakeholders in the room. Access is Auth0 SSO, so only authenticated Bluetooth SIG members get in. All data stays within Bluetooth SIG’s Azure tenant — nothing leaves to third parties. And each user’s conversation history is fully isolated, so there’s no cross-contamination between sessions.”
Monitoring
“We also have full observability into how the system is performing. Every query, response time, and feedback event is logged in Cosmos DB. We track the most common questions, peak usage times, and monitor for errors like rate limit hits. So we have a clear picture of how members are using it and where we need to improve.”
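(A sketch of how logged events roll up into the dashboard numbers described above. The event fields — `question`, `latency_ms`, `error` — are illustrative assumptions, not the actual Cosmos DB document shape.)

```python
from collections import Counter
from statistics import median

def summarize_logs(events: list[dict]) -> dict:
    """Aggregate query-log events into top questions, latency, and errors.

    Illustrative only: the real system logs richer documents per query,
    but the aggregation pattern is the same.
    """
    questions = Counter(e["question"] for e in events)
    latencies = [e["latency_ms"] for e in events]
    errors = Counter(e["error"] for e in events if e.get("error"))
    return {
        "top_questions": questions.most_common(3),
        "median_latency_ms": median(latencies),
        "errors": dict(errors),
    }
```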
Closing
“So the simplest way to summarize it: a member asks a question, Amber authenticates them, the agent figures out what it needs to know, pulls from authoritative Bluetooth SIG sources, and returns a grounded, cited answer — all in a few seconds. We’re measuring it, we’re improving it, and the knowledge base stays current as specs evolve. That’s the system.”