Voice Conversation Memory: Why Your Bot Forgets Who You Are
In a text chatbot, "memory" is easy. If the user scrolls up, they see the history. If the bot forgets something, the user can just re-read the previous messages.
In Voice AI, the rules change completely.
- Context is Invisible: The user cannot "scroll up." If the bot forgets that my name is Bob, the illusion of intelligence shatters immediately.
- Context is Latency: Every token you send to the LLM adds milliseconds to the response time. Sending a 10-minute transcript (approx. 1,500 words) to GPT-4o doesn't just cost money; it adds a 1-2 second processing delay.
In voice, Latency is the enemy.
This post explores how to manage conversation memory so your bot stays smart enough to remember you, but light enough to respond instantly.
The problem: The "Context Bloat" curve
Imagine a 10-minute customer support call.
- Minute 1: History is short. Latency is 300ms. Snappy.
- Minute 5: History is 2,000 tokens. Latency creeps to 800ms.
- Minute 10: History is 4,000 tokens. Latency spikes to 1.5s. The user starts interrupting the bot because it feels slow.
We cannot simply append every User and Assistant message to the list forever. We need a strategy to prune the history while keeping the meaning.
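The curve is roughly linear: total response time is a fixed pipeline overhead (STT, TTS, network) plus a per-token prompt-processing cost. Here is a minimal back-of-envelope sketch of that model; both constants are illustrative assumptions you would measure for your own stack:

```python
# Back-of-envelope model of the "Context Bloat" curve.
# Both constants are illustrative assumptions, not measurements.
FIXED_OVERHEAD_MS = 250      # STT + TTS + network; independent of history
PREFILL_MS_PER_TOKEN = 0.3   # prompt-processing cost per token

def estimated_latency_ms(history_tokens: int) -> float:
    return FIXED_OVERHEAD_MS + history_tokens * PREFILL_MS_PER_TOKEN

for minute, tokens in [(1, 300), (5, 2_000), (10, 4_000)]:
    print(f"minute {minute:>2}: ~{tokens:>5,} tokens -> "
          f"~{estimated_latency_ms(tokens):,.0f} ms")
# minute  1: ~  300 tokens -> ~340 ms
# minute  5: ~2,000 tokens -> ~850 ms
# minute 10: ~4,000 tokens -> ~1,450 ms
```

The fixed overhead never goes away, so the only lever you control is the token count.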
Strategy 1: The sliding window (Short-term memory)
For a fluid voice conversation, the bot usually only needs the last 3-4 turns to understand immediate context (e.g., "Yes, that works" or "No, the other one").
We implement a Sliding Window manager that keeps the System Prompt fixed (the "Personality") but strictly trims the middle of the conversation.
graph LR
subgraph RAW["Raw Conversation History"]
A[System] --> B[Turn 1]
B --> C[Turn 2]
C --> D[Turn 3]
D --> E[Turn 4]
E --> F[Turn 5]
end
subgraph WINDOW["Sliding Window: Context sent to LLM"]
A2[System] --> D2[Turn 3]
D2 --> E2[Turn 4]
E2 --> F2[Turn 5]
end
style B fill:#ffebee,stroke:#b71c1c
style C fill:#ffebee,stroke:#b71c1c
The Implementation:
In LiveKit Agents, the context is often managed automatically, but in production you want explicit control.
# A simple manual pruner. `chat_ctx.messages` is assumed to be a plain
# list with the system prompt at index 0; adjust to your SDK's API.
def prune_context(chat_ctx):
    # Always keep the System Prompt (index 0) -- the "Personality"
    system_prompt = chat_ctx.messages[0]
    # Get the rest of the history
    history = chat_ctx.messages[1:]
    # Keep only the last 6 messages (3 user/assistant turns)
    if len(history) > 6:
        history = history[-6:]
    return [system_prompt] + history
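A quick sanity check of the pruner, using SimpleNamespace as a stand-in for the SDK's chat context object (the message shape here is an assumption for illustration):

```python
from types import SimpleNamespace

# Simulate 1 system prompt + 10 user/assistant turn pairs
messages = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(10):
    messages.append({"role": "user", "content": f"user turn {i}"})
    messages.append({"role": "assistant", "content": f"assistant turn {i}"})

pruned = prune_context(SimpleNamespace(messages=messages))
print(len(messages), "->", len(pruned))  # 21 -> 7 (system + last 6)
print(pruned[1]["content"])              # "user turn 7"
```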
Pros: Zero latency overhead. Extremely cheap.
Cons: The "Goldfish Effect." If I said "My name is Bob" at minute 1, and the window slides past it, the bot forgets my name at minute 3.
Strategy 2: The "Sidecar" summarizer (Long-term persistence)
To solve the Goldfish Effect without bloating the main context, we use a Background Process.
While the main agent is chatting, a second, smaller LLM (the "Sidecar") runs in the background. It watches the conversation and updates a "Summary" section in the System Prompt.
graph TD
A[Voice Conversation Stream] --> B(Main Agent Loop)
A --> C(Background Sidecar Worker)
C --> D[Extract Facts: User is Bob, Wants Pizza]
D --> E[Update System Prompt]
E --> B
style C fill:#fff9c4,stroke:#fbc02d
style E fill:#e3f2fd,stroke:#0d47a1
The Implementation:
We use an async task so we don't block the audio stream.
async def background_summarizer(full_history, agent):
    """
    Runs periodically to compress history into facts.
    """
    # Use a cheap, fast model (like gpt-4o-mini) for summarization.
    # `cheap_llm.generate` is a placeholder for whatever client you use.
    summary = await cheap_llm.generate(
        f"Extract key facts from this conversation history: {full_history}"
    )
    # Inject these facts into the 'hidden' context of the main agent
    new_system_prompt = f"""
You are a helpful assistant.
CORE MEMORY (DO NOT FORGET):
{summary}
"""
    # Update the running agent's prompt live.
    # `update_system_prompt` stands in for your framework's equivalent.
    agent.update_system_prompt(new_system_prompt)
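Scheduling is what keeps the audio path unblocked. Here is a minimal sketch of the periodic loop with asyncio; the 10-second interval and the `get_history` callable are assumptions to be wired into whatever your framework exposes:

```python
import asyncio
import logging

SUMMARIZE_INTERVAL_S = 10  # assumption: tune for your call length

async def run_sidecar(agent, get_history, stop_event: asyncio.Event):
    """Periodically re-summarize without ever blocking the audio loop."""
    while not stop_event.is_set():
        await asyncio.sleep(SUMMARIZE_INTERVAL_S)
        try:
            await background_summarizer(get_history(), agent)
        except Exception:
            # A failed summary must never take down the live call;
            # keep the previous CORE MEMORY and try again next tick.
            logging.exception("sidecar summarizer failed")

# At session start, run it alongside the main agent loop:
# sidecar_task = asyncio.create_task(run_sidecar(agent, get_history, stop))
```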
Pros: Retains long-term context (names, preferences) without growing token count.
Cons: There is a delay. The summary might update 10 seconds after the user says the fact.
Strategy 3: Structured state extraction (The "Pro" move)
Summaries are fuzzy. "User wants pizza" is text.
For robust applications (like ordering food), we don't want text summaries; we want Structured Data.
Instead of summarizing, we give the agent a tool like `update_order` or `save_profile`. The agent "offloads" memory to a structured object.
- User: "I want a pepperoni pizza."
- Agent (Thought): User provided data. I will call `update_order(item="pepperoni pizza")`.
- System: Updates `order_state = {"items": ["pepperoni pizza"]}`.
- System: Injects `Current Order: 1x Pepperoni Pizza` into the System Prompt.
This keeps the prompt tiny but the memory perfect.
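Here is a minimal sketch of the pattern, assuming a Pydantic model for the order and a plain function exposed to the LLM as a tool (how you register tools depends on your agent framework):

```python
from pydantic import BaseModel, Field

class OrderState(BaseModel):
    items: list[str] = Field(default_factory=list)

order_state = OrderState()

def update_order(item: str) -> str:
    """Tool: add an item to the customer's order."""
    order_state.items.append(item)
    return f"Added {item}. Order is now: {order_state.items}"

def render_system_prompt() -> str:
    """Re-inject the structured state as a tiny, always-current block."""
    order = ", ".join(order_state.items) or "empty"
    return (
        "You are a pizza-ordering assistant.\n"
        f"CURRENT ORDER: {order}\n"
        "Call update_order(item=...) whenever the user orders something."
    )
```

Because the order lives outside the transcript, the prompt stays the same size whether the call lasts two minutes or twenty.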
Engineering trade-off matrix
| Strategy | Latency Impact | Recall Quality | Token Cost | Best For |
|---|---|---|---|---|
| Full History | High (Bad) | Perfect | High | Short demos (< 2 mins) |
| Sliding Window | Low (Good) | Low (Forgets) | Low | Casual chat / Small talk |
| Async Summary | Low (Good) | Medium (Fuzzy) | Medium | Support bots / General Q&A |
| Structured State | Low (Good) | High (Precise) | Low | Transactional Bots (Ordering, Booking) |
Challenge for you
Scenario: You are building a Medical Intake Voice Bot.
- Requirement: The call might last 20 minutes. You must capture every symptom mentioned, even if it was said at minute 1. You cannot lose data.
- Constraint: You cannot simply keep 20 minutes of text in the prompt (latency will be too high).
Your Task:
- Why would Sliding Window fail here?
- Why might Async Summarization be risky (think about "hallucinating" a symptom)?
- Design a Structured State solution. What would your Pydantic schema look like for `PatientData`? How would you prompt the agent to save symptoms as they are spoken?
Key takeaways
- Context is latency in voice: Every token adds milliseconds to response time, making memory management critical for sub-500ms latency
- Sliding windows trade recall for speed: Keeping only recent turns enables fast responses but causes the "Goldfish Effect" where early context is lost
- Async summarization preserves facts: Background processes can compress long conversations into key facts without blocking the main audio stream
- Structured state is the most reliable: Using tools to extract and store data in structured formats (Pydantic models) provides precise memory without token bloat
- Different strategies for different use cases: Casual chat needs speed (sliding window), support needs facts (summarization), transactional needs precision (structured state)
- Memory tools enable precise extraction: Giving agents tools like `update_order` or `save_profile` lets them offload memory to structured objects
- System prompts can hold compressed context: Injecting structured state summaries into system prompts keeps context small but accurate
For more on voice AI systems, see our voice AI fundamentals guide and our streaming guide.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.