Building a Multi-Agent Voice Roundtable: Production Architecture for Group AI Conversations
We have mastered the 1:1 voice agent. You speak, it answers. Simple. Clean. But real-world problem solving rarely happens in isolation.
Imagine entering a voice room to discuss a business idea. You aren't talking to a generic assistant; you are talking to a Product Manager (who cares about users), a CFO (who cares about profitability), and an Engineer (who cares about feasibility). They debate each other. They challenge your assumptions. They ask follow-up questions.
This isn't science fiction—it's a production engineering challenge.
Building this is not as simple as spinning up three chatbots. If you do that naively, they will all talk at once, interrupt each other mid-sentence, and create cacophony instead of conversation.
We need to engineer a Conversation Orchestrator—a conductor that manages who speaks, when they speak, and how agents interact with each other.
The Failure Case: The Chaos Room
Before we dive into solutions, let's see what happens when you naively run multiple voice agents in parallel.
graph TD
User[User Speaks] --> Split[Send to All Agents]
Split --> A1[Engineer Agent]
Split --> A2[PM Agent]
Split --> A3[CFO Agent]
A1 --> R1[Response 1]
A2 --> R2[Response 2]
A3 --> R3[Response 3]
R1 --> Collision[All Talk Simultaneously]
R2 --> Collision
R3 --> Collision
Collision --> Chaos[Unintelligible Audio]
style Collision fill:#ffebee,stroke:#b71c1c
style Chaos fill:#ffebee,stroke:#b71c1c
The Three Problems:
- Audio Collision: All three agents generate responses simultaneously. The user hears overlapping voices and understands nothing.
- No Context Awareness: Agent A doesn't know Agent B just spoke. They repeat each other or contradict without acknowledging the contradiction.
- No Turn-Taking: There's no protocol for who gets to speak next. It's like a meeting where everyone talks at once—chaos.
Observation: Human conversations work because we have implicit turn-taking protocols. We wait for pauses. We signal we want to speak. We acknowledge others. Your multi-agent system needs the same social protocols, but engineered explicitly.
The Solution: The Conductor Pattern
We cannot have independent agents listening and replying. We need a central Conductor that manages the flow of conversation.
The Conductor listens to all participants, maintains conversation context, and decides who speaks next.
The Architecture
graph TD
subgraph Room["Voice Room"]
Human[Human Participant]
Audio[Shared Audio Stream]
end
subgraph Conductor["The Conductor Layer"]
STT[Speech-to-Text] --> History[Conversation History]
History --> Router{Turn Router}
Router --> Selector[Agent Selector]
Selector --> Decision[Routing Decision]
end
subgraph Agents["Agent Personas"]
Decision -->|Select| A1[Engineer Persona]
Decision -->|Select| A2[PM Persona]
Decision -->|Select| A3[CFO Persona]
A1 --> Response[Generate Response]
A2 --> Response
A3 --> Response
end
subgraph Output["Output Layer"]
Response --> TTS[Text-to-Speech]
TTS -->|Distinct Voice| Audio
end
Audio --> Human
style Router fill:#fff9c4,stroke:#fbc02d
style History fill:#e3f2fd,stroke:#0d47a1
How it Works:
- Single Entry Point: All audio goes through one transcription pipeline. There's only one "listener," not three competing ones.
- Centralized History: One conversation history shared by all agents. Everyone knows what everyone else said.
- Routing Logic: After each user utterance, the Conductor analyzes context and routes to exactly one agent.
- Voice Identity: Each agent uses a different TTS voice. Users can distinguish who's speaking without visual cues.
- Sequential Execution: Only one agent speaks at a time. No overlaps, no collisions.
Observation: The Conductor pattern transforms a chaotic multi-agent system into an orchestrated conversation. It's the difference between a shouting match and a moderated debate.
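To make this concrete, here is a minimal sketch of a conductor loop. The transcribe, route, generate, and speak helpers are hypothetical stand-ins for your STT, routing, LLM, and TTS layers, and PERSONAS refers to the persona table defined under Pattern 2 below.

# Minimal conductor loop (sketch): transcribe, route, generate, and speak
# are hypothetical stand-ins for the STT, routing, LLM, and TTS layers.
history = []  # the single shared conversation history

def conductor_turn(audio_chunk):
    # 1. Single entry point: one transcription pipeline for the whole room
    transcript = transcribe(audio_chunk)
    history.append({"role": "user", "content": transcript})

    # 2. Routing logic: pick exactly one agent for this turn
    agent_id = route(transcript, history)  # e.g. "cfo"

    # 3. Sequential execution: only the selected agent generates a response
    reply = generate(agent_id, history)
    history.append({"role": "assistant", "name": agent_id, "content": reply})

    # 4. Voice identity: synthesize with that agent's dedicated TTS voice
    speak(reply, voice=PERSONAS[agent_id]["voice_id"])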
Pattern 1: The Turn-Taking Protocol
The Architectural Problem:
How do you decide who speaks next? You can't just round-robin (Engineer → PM → CFO → repeat) because that's robotic. Real conversations are dynamic—whoever has the most relevant perspective speaks.
The Architecture
stateDiagram-v2
[*] --> WaitingForUser
WaitingForUser --> UserSpeaking: User starts
UserSpeaking --> Transcribing: Silence detected
Transcribing --> AnalyzeContext: Transcript ready
state AnalyzeContext {
[*] --> CheckTopic
CheckTopic --> CalculateRelevance: For each agent
CalculateRelevance --> RankAgents: Relevance scores
RankAgents --> SelectWinner: Pick highest score
}
SelectWinner --> AgentGenerating
state AgentGenerating {
[*] --> BuildPrompt
BuildPrompt --> CallLLM
CallLLM --> StreamResponse
}
AgentGenerating --> AgentSpeaking
AgentSpeaking --> CheckContinuation: Agent finishes
state CheckContinuation {
[*] --> ShouldAnotherAgentRespond
ShouldAnotherAgentRespond --> YesMultiTurn: Strong disagreement
ShouldAnotherAgentRespond --> NoPassToUser: Consensus reached
}
YesMultiTurn --> AgentGenerating
NoPassToUser --> WaitingForUser
How it Works:
- Context Analysis: After the user speaks, the Conductor examines:
  - Keywords in the transcript ("cost" → CFO, "users" → PM, "build" → Engineer)
  - Conversation history (who spoke last? avoid repetition)
  - Unanswered questions (did someone ask the CFO a direct question?)
- Relevance Scoring: Each agent gets a score (0-100) based on how relevant they are to the current topic (a scoring sketch follows this list).
- Winner Selection: The highest score speaks. Ties are broken by round-robin or by "who spoke least recently."
- Continuation Check: After an agent speaks, the Conductor checks: "Should another agent immediately respond?" This allows for natural back-and-forth between agents.
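The calculate_relevance helper used in the example below isn't defined anywhere in this post; a minimal keyword-overlap version might look like the following. A production scorer would likely weight trigger terms and fold in history signals, which is why the illustrative scores in the example aren't a pure keyword fraction.

# Sketch of a keyword-overlap relevance scorer (0-100). Hypothetical helper;
# real systems often add embeddings or an LLM-based scorer on top.
def calculate_relevance(keywords, trigger_keywords, recent_speaker_penalty=0):
    if not keywords:
        return 0
    triggers = {t.lower() for t in trigger_keywords}
    hits = sum(1 for kw in keywords if kw.lower() in triggers)
    score = int(100 * hits / len(keywords))
    # Optionally discourage whoever just spoke from dominating consecutive turns
    return max(0, score - recent_speaker_penalty)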
Concrete Example: The Routing Decision
User says: "How much will it cost to build this feature?"
Conductor Analysis:
# Keyword analysis
keywords = extract_keywords("How much will it cost to build this feature?")
# keywords = ["cost", "build", "feature"]
# Calculate relevance scores
relevance = {
"engineer": calculate_relevance(keywords, ["build", "tech", "code"]), # Score: 60
"pm": calculate_relevance(keywords, ["feature", "user", "product"]), # Score: 40
"cfo": calculate_relevance(keywords, ["cost", "budget", "price"]), # Score: 95
}
# Winner: CFO (score 95)
selected_agent = "cfo"
CFO Response: "Before we talk numbers, what's the expected ROI? How many users will this actually bring in?"
Continuation Check:
# After CFO speaks, check if PM should respond
continuation_prompt = """
The CFO just asked about ROI and user acquisition.
Should the PM respond immediately, or wait for the user?
Respond: "PM" or "USER"
"""
# Result: "PM" (because the question is directly about users)
next_speaker = "pm"
PM Response: "Based on our user research, this feature is highly requested. I'd estimate it could increase retention by 15%."
User Experience:
User: "How much will it cost to build this feature?"
CFO (voice 1, cautious tone): "Before we talk numbers, what's the expected ROI?"
PM (voice 2, optimistic tone): "This feature is highly requested—15% retention boost!"
[Pause for user]
Observation: The turn-taking protocol creates a natural flow. The CFO's question prompts an immediate PM response, mimicking how real teams work. The Conductor orchestrates this without the user needing to explicitly call on each agent.
Think About It: Should you allow agents to interrupt each other? In real meetings, interruptions signal importance or urgency. But in voice AI, they risk audio collisions. Most production systems use "immediate continuation" (agent B speaks right after agent A) instead of true interruption.
Pattern 2: Voice Identity and Personality Consistency
The Architectural Problem:
If all three agents sound the same, users get confused. "Wait, who's talking? Is this the engineer or the PM?"
You need voice identity—distinct audio signatures that map to distinct personalities.
The Architecture
graph TD
subgraph Persona["Agent Persona Definition"]
ID[Agent ID] --> Voice[TTS Voice ID]
ID --> Personality[System Prompt]
ID --> Triggers[Keyword Triggers]
ID --> Style[Speaking Style]
end
subgraph Mapping["Voice-to-Personality Mapping"]
Voice --> V1[Deeper Masculine Voice]
Voice --> V2[Higher Feminine Voice]
Voice --> V3[Neutral Professional Voice]
Personality --> P1[Skeptical, Technical, Terse]
Personality --> P2[Optimistic, User-Focused, Verbose]
Personality --> P3[Cautious, Financial, Data-Driven]
end
subgraph Consistency["Consistency Layer"]
V1 -.Always.-> P1
V2 -.Always.-> P2
V3 -.Always.-> P3
end
style Consistency fill:#e8f5e9,stroke:#388e3c
How it Works:
- Distinct Voices: Each agent uses a different TTS voice. Modern TTS systems offer many voice options with different tones, genders, and accents.
- Consistent Personality: The voice and personality must match. A skeptical engineer shouldn't sound cheerful.
- Speaking Style: Beyond the words, agents differ in how they speak:
  - Engineer: Short sentences. Technical jargon. Direct.
  - PM: Longer, enthusiastic sentences. Marketing language.
  - CFO: Measured pace. Numbers and metrics. Questions about costs.
- Memory Consistency: Each agent remembers their previous statements. If the Engineer said "I hate this idea" earlier, they shouldn't suddenly love it.
Concrete Example: Persona Definition
PERSONAS = {
"engineer": {
"name": "Alex",
"system_prompt": """
You are Alex, a Senior Engineer with 10 years of experience.
PERSONALITY:
- Skeptical of new trends (especially blockchain, AI hype)
- Care deeply about performance, reliability, technical debt
- Prefer proven technologies over cutting-edge ones
- Speak in short, direct sentences
COMMUNICATION STYLE:
- No fluff. Get to the point.
- Use technical terms without over-explaining
- Ask about edge cases and failure scenarios
- If something sounds technically infeasible, say so bluntly
CONSTRAINTS:
- Keep responses under 3 sentences unless giving technical details
- Always consider implementation complexity
""",
"voice_id": "onyx", # Deeper, authoritative voice
"trigger_keywords": ["build", "implement", "code", "tech", "stack", "latency", "performance"],
"typical_length": "short"
},
"pm": {
"name": "Jordan",
"system_prompt": """
You are Jordan, a Product Manager who came from a design background.
PERSONALITY:
- Optimistic about new ideas
- Obsessed with user experience and customer delight
- Comfortable with ambiguity and iteration
- Use design thinking and frameworks
COMMUNICATION STYLE:
- Enthusiastic! Use exclamation points (but not excessively)
- Reference "user research," "customer feedback," "personas"
- Paint the vision before diving into details
- Always bring conversation back to user value
CONSTRAINTS:
- Acknowledge technical constraints but push for creative solutions
- Responses can be longer (4-5 sentences) when painting a vision
""",
"voice_id": "shimmer", # Brighter, energetic voice
"trigger_keywords": ["user", "customer", "experience", "feature", "product", "market", "growth"],
"typical_length": "medium"
},
"cfo": {
"name": "Morgan",
"system_prompt": """
You are Morgan, a CFO who scaled two startups to exit.
PERSONALITY:
- Fiscally conservative but understands strategic investment
- Data-driven decision making
- Skeptical of "nice to have" features
- Demand ROI calculations and payback periods
COMMUNICATION STYLE:
- Always ask about costs before agreeing to anything
- Use numbers and metrics when available
- Frame decisions in financial terms
- Ask probing questions about revenue impact
CONSTRAINTS:
- Never greenlight spending without understanding the return
- Keep responses concise (2-3 sentences) unless analyzing financials
""",
"voice_id": "echo", # Calm, measured voice
"trigger_keywords": ["cost", "price", "budget", "revenue", "profit", "ROI", "money"],
"typical_length": "short"
}
}
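To bind each persona to its audio identity, the voice_id above is passed straight to the TTS call. Below is a sketch using OpenAI's speech endpoint; any TTS provider with selectable voices works the same way, and the exact response handling should be treated as an assumption rather than a drop-in snippet.

from openai import OpenAI

client = OpenAI()

def synthesize(agent_id: str, text: str) -> bytes:
    # Render an agent's reply in that agent's dedicated voice
    persona = PERSONAS[agent_id]
    response = client.audio.speech.create(
        model="tts-1",
        voice=persona["voice_id"],  # "onyx", "shimmer", or "echo"
        input=text,
    )
    return response.read()  # audio bytes to play into the shared room stream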
The Consistency Table:
| Agent | Voice Character | Typical Phrase | Response Length | Tone |
|---|---|---|---|---|
| Engineer | Deep, technical | "That won't scale" | 1-3 sentences | Skeptical |
| PM | Bright, enthusiastic | "Users will love this!" | 3-5 sentences | Optimistic |
| CFO | Calm, measured | "What's the ROI?" | 2-3 sentences | Cautious |
Observation: Consistency isn't just about the voice—it's about the entire persona. If your CFO suddenly sounds excited about spending money, users notice the character break. Personality consistency is as important as voice consistency.
Think About It: How many distinct voices can users reliably track? Research suggests 3-4 is the sweet spot. Beyond that, listeners get confused about who's who. If you need more agents, consider grouping them (e.g., two engineers become "The Engineering Team").
Pattern 3: Agent-to-Agent Communication
The Architectural Problem:
The basic pattern is User → Agent → User. But real group conversations have Agent A → Agent B → User. Agents debate each other, build on each other's ideas, or disagree.
Without this, your "roundtable" feels like three separate 1:1 conversations, not a group dynamic.
The Architecture
graph TD
User[User Speaks] --> Conductor[Conductor Analyzes]
Conductor --> SelectFirst[Select First Responder]
SelectFirst --> A1[Agent A Speaks]
A1 --> Check{Continuation Check}
Check -->|No Disagreement| WaitUser[Return to User]
Check -->|Strong Disagreement| AnalyzeDisagreement[Analyze What A Said]
AnalyzeDisagreement --> SelectSecond[Select Counter-Agent]
SelectSecond --> A2[Agent B Speaks]
A2 --> Check2{Another Response?}
Check2 -->|Yes Max 2 Exchanges| A3[Agent C Speaks]
Check2 -->|No Or Limit Reached| WaitUser
A3 --> WaitUser
style Check fill:#fff9c4,stroke:#fbc02d
style AnalyzeDisagreement fill:#e3f2fd,stroke:#0d47a1
How it Works:
- Continuation Detection: After Agent A speaks, the Conductor analyzes the following (a sketch of this check follows the list):
  - Did Agent A make a controversial statement?
  - Did Agent A ask a question another agent should answer?
  - Is there likely disagreement from other perspectives?
- Counter-Agent Selection: If continuation is needed, select the agent most likely to have a different view.
- Turn Limit: Prevent infinite agent-to-agent loops. Maximum 2-3 agent exchanges before returning to the user.
- Context Awareness: Agent B's response must acknowledge Agent A's point:
  - "I hear what Alex is saying about performance, but..."
  - "Morgan raises a good point about costs. However..."
Concrete Example: The Debate Flow
User: "I think we should rebuild our entire backend in Rust."
Conductor: Routes to Engineer first (technical decision)
Engineer (Agent A): "Rust would give us better performance, but rewriting working code is risky. What's broken with the current system?"
Continuation Check:
# Analyze Engineer's response
prompt = """
The Engineer just questioned whether a rewrite is necessary.
This is a strategic and financial decision. Should the CFO weigh in?
Respond: "cfo" or "user"
"""
# Result: "cfo"
CFO (Agent B): "A full rewrite is a 6-month project. That's millions in dev costs with zero new revenue. Can we optimize the current system instead?"
Continuation Check:
# Both Engineer and CFO are skeptical. Should PM defend the idea?
prompt = """
Both the Engineer and CFO are skeptical about the rewrite.
Should the PM provide a counterpoint about strategic value?
Respond: "pm" or "user"
"""
# Result: "user" (let the user respond to the pushback)
User Experience:
User: "I think we should rebuild our backend in Rust."
Engineer (skeptical tone): "Rust would give us better performance, but
rewriting working code is risky. What's broken?"
CFO (measured tone): "A full rewrite is 6 months—that's millions in costs
with zero new revenue."
[Pause for user]
User: "Fair points. Maybe we can start with just the critical paths?"
PM (optimistic tone): "Now that's a smart approach! We could prove the value
incrementally..."
Observation: Agent-to-agent communication creates a richer conversation. The user gets multiple perspectives without explicitly asking each agent. The system feels less like "three chatbots" and more like "one smart team."
The Continuation Rules
| Scenario | Continuation Decision | Next Speaker |
|---|---|---|
| Agent makes controversial claim | High probability | Agent with opposite view |
| Agent asks specific question | High probability | Agent with relevant expertise |
| Agent provides data/analysis | Low probability | Return to user |
| 2+ agents already spoke | Force stop | User (prevent loops) |
| User explicitly named an agent | No continuation | User |
Think About It: How do you prevent "argument loops" where agents keep contradicting each other? Set a hard limit (max 3 agent turns before returning to user) and track if the conversation is converging or diverging. If they're just repeating the same disagreement, cut it short and ask the user to make a decision.
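One way to detect those loops, sketched with the standard library's difflib (a production system might compare embeddings instead): if an agent's newest reply is too similar to their previous one in the same exchange, the Conductor cuts the debate and hands the floor back to the user.

from difflib import SequenceMatcher

SIMILARITY_CUTOFF = 0.8  # "basically the same point again"
MAX_AGENT_TURNS = 3      # hard cap regardless of content

def is_argument_loop(history, agent_id):
    # Compare the agent's last two messages; near-duplicates signal a loop
    own = [m["content"] for m in history if m.get("name") == agent_id][-2:]
    if len(own) < 2:
        return False
    return SequenceMatcher(None, own[0], own[1]).ratio() >= SIMILARITY_CUTOFF

def should_cut_exchange(history, agent_id, agent_turns_this_round):
    return (agent_turns_this_round >= MAX_AGENT_TURNS
            or is_argument_loop(history, agent_id))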
Pattern 4: State Management Across Agents
The Architectural Problem:
Each agent needs to remember:
- What they personally said earlier
- What other agents said
- What the user's preferences are
- Decisions that have been made
But with three agents sharing one conversation, how do you manage state without conflicts?
The Architecture
graph TD
subgraph SharedState["Shared State"]
History[(Full Conversation History)]
Decisions[(Agreed Upon Decisions)]
UserContext[(User Preferences and Context)]
end
subgraph AgentState["Per-Agent State"]
E1[Engineer Memory]
E2[PM Memory]
E3[CFO Memory]
end
History --> E1
History --> E2
History --> E3
E1 -->|Filter| E1View[Engineer's View of History]
E2 -->|Filter| E2View[PM's View of History]
E3 -->|Filter| E3View[CFO's View of History]
Decisions --> E1View
Decisions --> E2View
Decisions --> E3View
UserContext --> E1View
UserContext --> E2View
UserContext --> E3View
style SharedState fill:#e3f2fd,stroke:#0d47a1
style AgentState fill:#e8f5e9,stroke:#388e3c
How it Works:
- Global History: One complete conversation log shared by all agents. Format:
  history = [
      {"role": "user", "content": "Let's build a crypto app"},
      {"role": "assistant", "name": "engineer", "content": "Blockchain is slow..."},
      {"role": "assistant", "name": "pm", "content": "But users love it!"},
  ]
- Persona-Tagged Messages: Each agent message includes a `name` field. This lets agents reference each other: "As Morgan mentioned earlier..."
- Decision Tracking: When consensus is reached, mark it explicitly:
  decisions = [
      {"topic": "budget", "decision": "Max $50k", "agreed_by": ["cfo", "pm"]},
  ]
- Context Windows: Each agent gets the following (a shared-state sketch follows this list):
  - Full history (last N turns)
  - Decisions made
  - User's stated goals
  - Their own personality prompt
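A minimal shape for that shared state, as a sketch (the field names and the agent_view method are illustrative, not a prescribed schema; PERSONAS is the dict from Pattern 2):

from dataclasses import dataclass, field

@dataclass
class SharedState:
    history: list = field(default_factory=list)        # full, persona-tagged log
    decisions: list = field(default_factory=list)       # agreed-upon decisions
    user_context: dict = field(default_factory=dict)    # budget, goals, preferences

    def agent_view(self, agent_id: str, last_n: int = 20) -> list:
        # One agent's context: persona prompt + decisions + recent shared history
        system = PERSONAS[agent_id]["system_prompt"]
        if self.decisions:
            system += "\nDECISIONS SO FAR:\n" + "\n".join(
                f"- {d['topic']}: {d['decision']}" for d in self.decisions)
        if self.user_context:
            system += f"\nUSER CONTEXT: {self.user_context}"
        return [{"role": "system", "content": system}] + self.history[-last_n:]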
Concrete Example: Memory Consistency
Turn 1:
User: "My budget is $20k"
CFO: "Noted. We'll keep costs under $20k."
Turn 5:
Engineer: "This feature needs a $30k cloud infrastructure"
CFO: "That's over our $20k budget. Can we reduce scope?"
How This Works:
The CFO's second response is generated with this context:
messages = [
{"role": "system", "content": PERSONAS["cfo"]["system_prompt"]},
# ... previous conversation history ...
{"role": "user", "content": "My budget is $20k"}, # Turn 1
{"role": "assistant", "name": "cfo", "content": "Noted. We'll keep costs under $20k."},
# ... more history ...
{"role": "assistant", "name": "engineer", "content": "This feature needs $30k infrastructure"},
# CFO generates response with full context
]
The CFO "remembers" the $20k constraint because it's in the conversation history.
The Memory Challenge:
graph TD
A[Long Conversation] --> B{Context Window Full?}
B -->|No| C[Add to History]
B -->|Yes| D[Summarization Strategy]
D --> E[Keep Recent Messages]
D --> F[Keep Important Decisions]
D --> G[Summarize Middle Turns]
E --> H[Compressed History]
F --> H
G --> H
H --> C
style D fill:#fff9c4,stroke:#fbc02d
Observation: State management in multi-agent systems is like managing a group chat—everyone needs to see the same messages, but you can't keep infinite history. Production systems use summarization: keep recent messages verbatim, summarize older ones, always preserve key decisions.
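A sketch of that compression step, assuming the SharedState sketch from earlier and a hypothetical summarize helper (an LLM call that condenses the dropped turns into a few sentences):

MAX_VERBATIM_TURNS = 20

def compress_history(state):
    # Keep recent turns verbatim, summarize older ones, always preserve decisions
    if len(state.history) <= MAX_VERBATIM_TURNS:
        return state.history
    older = state.history[:-MAX_VERBATIM_TURNS]
    recent = state.history[-MAX_VERBATIM_TURNS:]
    summary = summarize(older)  # hypothetical LLM call: "Earlier, the team agreed..."
    summary_msg = {"role": "system",
                   "content": f"Summary of earlier discussion: {summary}"}
    # Hard constraints ("budget is $20k") should also live in state.decisions /
    # state.user_context, which agent_view() re-injects on every turn.
    return [summary_msg] + recent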
Think About It: Should agents have "private" knowledge that other agents don't know? In some scenarios (e.g., competitive simulation), yes. But for collaborative roundtables, shared state prevents contradictions. If the PM promises a feature and the Engineer doesn't know about it, chaos ensues.
Pattern 5: Latency Management in Multi-Agent Systems
The Architectural Problem:
With one agent, you generate one response. With three agents, you potentially:
- Generate three responses (to find the best one)
- Run the Conductor router (extra LLM call)
- Handle agent-to-agent continuations (more LLM calls)
If done naively, latency explodes from 800ms to 3+ seconds. That's too slow for natural conversation.
The Architecture
graph TD
User[User Stops Speaking] --> VAD[VAD Detection 50ms]
VAD --> STT[Streaming STT 200ms]
STT --> Router[Conductor Router 300ms]
Router --> Selected[Selected Agent Only]
Selected --> LLM[LLM Generation 400ms]
LLM --> TTS[Streaming TTS 100ms]
TTS --> Audio[User Hears Response]
subgraph Optimization["Latency Optimization"]
Router -.Parallel.-> Prefetch[Prefetch Agent Context]
Prefetch -.Ready.-> LLM
end
VAD --> Total["Total: ~1050ms"]
style Total fill:#e8f5e9,stroke:#388e3c
style Optimization fill:#fff9c4,stroke:#fbc02d
The Optimization Strategies:
- Router First: Don't generate responses from all three agents and pick one. Route first, then generate only the selected agent's response.
- Fast Router: Use a small, fast model (GPT-4o-mini, Claude Haiku) for routing. This decision should take <300ms.
- Streaming Everything: STT streams partial transcripts. LLM streams tokens. TTS streams audio chunks. Never wait for "complete" outputs.
- Parallel Prefetch: While the router is deciding, prefetch shared context (conversation history, user profile) so it's ready when the agent generates (sketched after this list).
- Continuation Limit: Hard cap on agent-to-agent exchanges (max 2-3) to prevent latency stacking.
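A sketch of the route-first flow with the prefetch running alongside the router (asyncio; route_fast, prefetch_context, generate_streaming, and tts_stream are hypothetical async helpers):

import asyncio

async def handle_turn(transcript, state):
    # Kick off the fast router and the context prefetch concurrently
    route_task = asyncio.create_task(route_fast(transcript, state.history))
    prefetch_task = asyncio.create_task(prefetch_context(state))
    agent_id, context = await asyncio.gather(route_task, prefetch_task)

    # Generate only the selected agent's response and stream it out
    async for sentence in generate_streaming(agent_id, context):
        # Start TTS on the first complete sentence; never wait for the full reply
        await tts_stream(sentence, voice=PERSONAS[agent_id]["voice_id"])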
The Latency Comparison
Naive Approach (Generate All, Pick One):
User stops speaking
→ Generate Engineer response (800ms)
→ Generate PM response (800ms) [issued in parallel, but you still wait for the slowest of the three]
→ Generate CFO response (800ms) [parallel]
→ Pick best one (100ms)
→ Synthesize audio (200ms)
Total: ~1900ms in practice. Nothing can stream because the winner isn't known until every response is complete, tail latency on the slowest parallel call stretches the 800ms budget, and you pay for three generations to use one.
Optimized Approach (Route First):
User stops speaking
→ Conductor routes (300ms)
→ Generate selected agent only (800ms)
→ Synthesize audio (streaming, first chunk at +100ms)
Total: ~1200ms, user hears first words at ~1100ms
The Math:
| Approach | Routing | Generation | Total | Perceived Latency |
|---|---|---|---|---|
| Naive (generate all) | N/A | 800ms × 3 parallel | 1900ms | 1900ms (slow) |
| Optimized (route first) | 300ms | 800ms × 1 | 1100ms | 700ms (good) |
| Highly Optimized (streaming) | 250ms | 600ms (first token at 300ms) | 1050ms | 500ms (excellent) |
Observation: The multi-agent system is inherently more complex than single-agent, but smart routing keeps latency acceptable. The key is routing before generation, not after.
Think About It: Should you ever generate multiple responses speculatively? If you have strong confidence about which agent will speak (e.g., user asked "What's this going to cost?" → 95% CFO), you could start generating CFO's response in parallel with the routing decision. Risky, but can save 300ms.
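A sketch of that speculative path, reusing the hypothetical async helpers above plus a generate_async helper (a non-streaming async LLM call): start the likely agent's generation alongside the routing call, keep it if the router agrees, and cancel it (eating the wasted tokens) if it doesn't.

async def handle_turn_speculative(transcript, state, likely_agent):
    # Router and speculative generation start together
    route_task = asyncio.create_task(route_fast(transcript, state.history))
    spec_task = asyncio.create_task(
        generate_async(likely_agent, state.agent_view(likely_agent)))

    agent_id = await route_task
    if agent_id == likely_agent:
        reply = await spec_task  # router agreed: the ~300ms routing cost is hidden
    else:
        spec_task.cancel()       # wrong guess: pay for the discarded generation
        reply = await generate_async(agent_id, state.agent_view(agent_id))

    await tts_stream(reply, voice=PERSONAS[agent_id]["voice_id"])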
Putting It All Together: A Real Roundtable Session
Let's trace a complete conversation through the system.
Scenario: User pitching a new product feature to the AI roundtable.
sequenceDiagram
participant U as User
participant C as Conductor
participant E as Engineer
participant P as PM
participant F as CFO
U->>C: "I want to add real-time video calls to our app"
Note over C: Analyze keywords: "add", "real-time", "video"
Note over C: Route to Engineer (technical complexity)
C->>E: Generate response with context
E->>U: "Real-time video is complex. We'd need WebRTC, TURN servers..."
Note over C: Check continuation
Note over C: Decision: CFO should comment on infrastructure costs
C->>F: Generate response with Engineer's context
F->>U: "TURN servers aren't cheap. That's $5k/month minimum..."
Note over C: Check continuation
Note over C: Decision: Return to user (2 agents spoke)
U->>C: "What if we only enable it for premium users?"
Note over C: Analyze keywords: "premium users", "if"
Note over C: Route to PM (pricing strategy)
C->>P: Generate response
P->>U: "I love it! Premium feature scarcity. We could charge $20/month..."
Note over C: Check continuation
Note over C: Decision: CFO should validate pricing
C->>F: Generate response with PM's context
F->>U: "At $20/month, we need 250 premium users to cover costs..."
Note over C: Check continuation
Note over C: All perspectives covered, return to user
Full Conversation Flow:
User: "I want to add real-time video calls to our app"
Engineer (technical tone): "Real-time video is complex. We'd need WebRTC
infrastructure, TURN servers for NAT traversal, and bandwidth optimization.
That's at least 3 months of work."
CFO (cautious tone): "TURN servers aren't cheap. That's $5k per month minimum
for infrastructure. What's the business case?"
[Pause - waiting for user]
User: "What if we only enable it for premium users?"
PM (enthusiastic tone): "Oh, I love this! Premium feature scarcity.
We could position it as our flagship tier—video calls with your team,
exclusive to Pro users at $20/month."
CFO (measured tone): "At $20/month premium pricing, we need 250 paying users
just to cover the infrastructure costs. Can we realistically get there?"
[Pause - waiting for user]
User: "Our current user base is 5,000. If we convert 10%, that's 500 premium users."
PM: "That's above our cost threshold! And 10% conversion on a premium feature
is reasonable..."
Engineer: "Alright, I'm convinced it's worth exploring. Let me spec out a Phase 1
implementation."
[Conversation continues...]
Key Observations:
- Natural Flow: The Conductor created a natural debate sequence: Engineer raises technical concerns → CFO adds financial concerns → User pivots → PM finds opportunity → CFO validates economics.
- Agent-to-Agent: Two pairs of back-to-back agent responses (Engineer→CFO, PM→CFO) without explicit user prompting. This feels like a real meeting.
- Distinct Voices: Each agent's personality is clear from their word choice and focus area.
- Shared Context: The CFO's second response directly referenced the PM's $20/month suggestion, showing proper state management.
Challenge: Design Decisions for Your Roundtable
Challenge 1: The Interruption Problem
Your PM is giving a long, enthusiastic pitch. 20 seconds in, the user tries to interrupt. What happens?
Options:
- Immediate Stop: Cancel PM's speech instantly, switch to listening mode
- Finish Thought: Let PM complete the current sentence, then stop
- Ignore: PM keeps talking (bad UX, but technically simpler)
Your Task: How do you detect user interruption when the AI is speaking? VAD needs to distinguish between the AI's audio and the user's audio. This requires echo cancellation.
Challenge 2: The Dominant Agent Problem
You notice the Engineer speaks 60% of the time, while the CFO only speaks 15%. The conversation feels unbalanced.
Options:
- Forced Balance: Track speak time per agent, artificially boost underrepresented agents
- Dynamic Routing: Adjust routing weights based on recent speak distribution
- Accept Imbalance: If the conversation is technical, Engineer should dominate
Your Task: Where's the line between natural conversation flow and forced balance? Should you make it configurable per conversation type?
Challenge 3: The Context Window Limit
After 30 minutes of conversation, your context window is full. You can't fit the entire history anymore.
Options:
- Sliding Window: Keep only the last 20 exchanges, drop older ones
- Intelligent Summarization: Summarize the first 50% of the conversation
- Decision Tracking: Keep decisions and recent context, drop mid-conversation details
Your Task: How do you summarize without losing critical information? If the user said "My budget is $20k" in minute 5, but it's now minute 35, that constraint must not be forgotten.
System Comparison: Single Agent vs. Multi-Agent
| Dimension | Single Agent | Multi-Agent Roundtable |
|---|---|---|
| Complexity | Low | High |
| Latency | 800ms | 1200ms |
| Conversation Depth | Moderate | High |
| Perspective Variety | Single viewpoint | Multiple viewpoints |
| State Management | Simple | Complex (shared state) |
| Voice Identity | One voice | Multiple voices |
| Turn-Taking | Implicit (user speaks, AI speaks) | Explicit (orchestrated) |
| Production Cost | Lower (fewer LLM calls) | Higher (routing + generation) |
| User Experience | Good for simple queries | Excellent for complex discussions |
graph TD
subgraph Single["Single Agent System"]
S1[User Question] --> S2[One Perspective]
S2 --> S3[One Answer]
style S3 fill:#fff8e1,stroke:#f57f17
end
subgraph Multi["Multi-Agent Roundtable"]
M1[User Question] --> M2[Multiple Perspectives]
M2 --> M3[Debate and Discussion]
M3 --> M4[Nuanced Answer]
style M4 fill:#e8f5e9,stroke:#388e3c
end
Key Architectural Patterns Summary
| Pattern | Problem Solved | Key Benefit | Complexity |
|---|---|---|---|
| Conductor | Audio collisions | Orderly turn-taking | Medium |
| Turn-Taking Protocol | Rigid conversations | Natural flow | Medium |
| Voice Identity | Agent confusion | Clear speaker distinction | Low |
| Agent-to-Agent | Isolated responses | Rich group dynamics | High |
| Shared State | Inconsistencies | Memory coherence | High |
| Latency Management | Slow responses | Acceptable real-time performance | Medium |
Discussion Points for Engineers
1. The Personality Drift Problem
After many conversations, you notice agents' personalities are drifting. The skeptical Engineer is becoming agreeable. The cautious CFO is suggesting risky investments.
Questions:
- Is this drift from fine-tuning your models on conversation data?
- Do you need "personality anchoring" prompts that remind agents of their core traits?
- Should you periodically reset agent personalities to baseline?
2. The Disagreement Loop Problem
Your Engineer and PM get stuck in a loop:
Engineer: "This is too complex"
PM: "But users need it"
Engineer: "It's still too complex"
PM: "But users really need it"
...
Questions:
- How do you detect when agents are repeating themselves?
- Should the Conductor forcibly break loops by routing to a tie-breaker (CFO)?
- Should the system explicitly say "We're going in circles—let's move on"?
3. The Scale Challenge
You want to expand from 3 agents to 5 (add Designer and Data Scientist).
Questions:
- Does the Conductor pattern scale linearly? Or does routing complexity increase exponentially?
- With 5 voices, can users still track who's who?
- Should you introduce "sub-teams" (Engineer + Designer = "Tech Team")?
Takeaways
The Three Layers of Multi-Agent Voice
graph TD
A[Multi-Agent Voice System] --> B[Layer 1: Orchestration]
A --> C[Layer 2: Identity]
A --> D[Layer 3: Interaction]
B --> E[Conductor Pattern]
B --> F[Turn-Taking Protocol]
C --> G[Voice Consistency]
C --> H[Personality Definition]
D --> I[Agent-to-Agent Communication]
D --> J[Shared State Management]
style A fill:#e3f2fd,stroke:#0d47a1
style B fill:#fff9c4,stroke:#fbc02d
style C fill:#e8f5e9,stroke:#388e3c
style D fill:#ffe0b2,stroke:#ef6c00
Key Insights
- Orchestration is mandatory — Multiple agents without a conductor is chaos. The Conductor pattern transforms audio collisions into orderly conversations.
- Identity creates immersion — Distinct voices mapped to consistent personalities make the roundtable feel real. Users quickly learn "Oh, that's the CFO being cautious again."
- Agent-to-agent is the magic — The breakthrough moment is when agents respond to each other without user prompting. Suddenly it feels like a real team meeting.
- State management is critical — With shared state, agents build on each other's points. Without it, they contradict each other and users lose trust.
- Latency compounds — Every extra agent interaction adds latency. Route first, generate once, stream everywhere.
The Implementation Roadmap
| Phase | Focus | Why |
|---|---|---|
| Phase 1 | Single agent voice | Prove the voice pipeline works |
| Phase 2 | Add 2nd agent + basic routing | Validate turn-taking logic |
| Phase 3 | Add 3rd agent + voice identity | Complete the roundtable experience |
| Phase 4 | Agent-to-agent communication | Enable natural debates |
| Phase 5 | Advanced state management | Handle long conversations |
What's Next: Beyond Roundtables
The patterns in this post extend beyond brainstorming sessions:
- Educational Simulations: History teacher debates with historical figures who have competing perspectives
- Therapy/Counseling: Different therapeutic approaches (CBT, psychodynamic, humanistic) in one session
- Role-Play Training: Sales training with customer, manager, and product expert personas
- Creative Writing: Brainstorm with a plotter, a character developer, and an editor
The architecture is the same. The personas change. The orchestration patterns endure.
The Result: You've built a system that doesn't just answer questions—it facilitates nuanced, multi-perspective discussions. It's not a chatbot. It's a roundtable of AI advisors working together to help users think through complex problems.
This is what multi-agent AI looks like in production.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.