Retrieval-Augmented Generation (RAG): Giving LLMs an Open Book

Param Harrison
7 min read

In our last posts, we learned how to talk to LLMs (Prompt Engineering) and what they are (Token Predictors). But they all share a fundamental problem: they are like brilliant students taking a closed-book exam.

An LLM can only answer questions based on the knowledge it memorized during its training. This leads to two major problems:

  1. Knowledge Cutoff: The model knows nothing about events that happened after its training. It can't tell you yesterday's news or stock prices.

  2. Private Data: The model has no access to your company's internal documents, your personal notes, or your new product's technical specs.

What if we could give the LLM an "open-book exam" instead? This is the key insight of RAG.

Retrieval-Augmented Generation (RAG) is a technique that retrieves relevant information from your documents first, then uses an LLM to generate an answer based only on that information.

The problem: the closed-book exam

First, let's demonstrate the problem. Imagine we have a private company memo. The LLM has never seen this text.

# Our private document
project_memo = """
Project Nova: Q3 2025 Internal Report
Prepared by: Dr. Evelyn Reed
Date: October 23, 2025

...The team successfully integrated the new chronosynclastic infundibulum,
resulting in a 40% increase in signal stability. The project's lead
engineer is named David Chen.
"""

# A question the LLM can't possibly know
query = "What was the main achievement in Project Nova, and who is the lead engineer?"

If we ask the LLM this question directly, it will fail.

# The "Closed-Book" prompt
prompt = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": query}
]

# The LLM's (failed) response would be:
# "I do not have access to internal reports like 'Project Nova'..."
# Or it might "hallucinate" (make up) an answer.
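
If you want to actually run this closed-book test, here is a minimal sketch using the OpenAI client (the model name is only an example; any chat model will do):

# Minimal sketch: send the closed-book prompt and see what comes back
from openai import OpenAI

llm_client = OpenAI(api_key="...")

response = llm_client.chat.completions.create(
    model="gpt-4o-mini",  # example model name
    messages=prompt
)
print(response.choices[0].message.content)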

The model correctly states it doesn't know. Now, let's build the RAG pipeline to give it an "open book."

The RAG solution: an open-book exam

The RAG pipeline has a few simple steps. We'll turn our document into a searchable knowledge base and then use it to answer our query.

graph TD
    A[Your Document] --> B(Step 1: Chunk)
    B --> C(Step 2: Embed)
    C --> D[Step 3: Store in Vector DB]
    E[User Query] --> F(Step 2: Embed)
    F --> G(Step 4: Retrieve)
    D -- Fetches relevant chunks --> G
    G --> H(Step 5: Augment)
    E -- Original query --> H
    H --> I[Step 6: Generate]
    I --> J[Final Answer]

Step 1: Chunking the document

We can't just feed a huge document to the model. We need to break it into smaller, manageable chunks. For this example, we'll just split it by newlines.

# For a real app, you'd use a more advanced chunking strategy
# (e.g., by paragraph or a fixed token size)

chunks = [line for line in project_memo.split('\n') if line.strip() != ""]

# Our document is now a list of text strings:
# [
#   "Project Nova: Q3 2025 Internal Report",
#   "Prepared by: Dr. Evelyn Reed",
#   ...
# ]
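
As the comment above notes, real applications usually chunk by paragraph or by a fixed token budget rather than line by line. Purely as an illustration, a paragraph-based version of the same idea might look like this:

# Illustrative alternative: split on blank lines so related sentences
# stay together in a single chunk
paragraph_chunks = [
    p.strip()
    for p in project_memo.split("\n\n")
    if p.strip()
]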

Step 2: Creating embeddings

Next, we convert each text chunk into a list of numbers called a vector embedding. These vectors represent the semantic meaning of the text. Chunks with similar meanings will have mathematically similar vectors.

We use a special model (not a huge LLM) to do this.

from sentence_transformers import SentenceTransformer

# Load a model specifically for creating embeddings
embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

# Convert all our text chunks into numerical vectors
chunk_embeddings = embedding_model.encode(chunks)

# chunk_embeddings is now a list of vectors (arrays of numbers)
# e.g., [[0.1, 0.4, -0.2, ...], [0.8, 0.1, 0.9, ...], ...]

This is the magic of RAG. We've just enabled semantic search.

  • Keyword Search (Ctrl+F): You search for "delay." It only finds the exact word "delay."
  • Semantic Search (Vectors): You search for "delay." It finds chunks containing "revised timeline," "pushed back," or "holdup," because the meaning is similar.

graph TD
    subgraph KEYWORD["Keyword Search (Exact Match)"]
        A["Query: 'project delay'"] --> B{"Finds 'project delay'"}
        A -.-> C("Fails to find 'revised timeline'")
    end
    
    subgraph SEMANTIC["Semantic Search (Meaning Match)"]
        D["Query: 'project delay'"] --> E["Vector for 'delay'"]
        E -- "Is 'close to'" --> F["Vector for 'revised timeline'"]
        E -- "Is 'far from'" --> G["Vector for 'Dr. Evelyn Reed'"]
    end
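
To make "meaning match" concrete, here is a small sketch that scores two phrases against the query with cosine similarity (exact numbers vary by model, but "revised timeline" should land much closer to "project delay" than an unrelated name does):

from sentence_transformers import util

# Embed the query and two candidate phrases with the same model as before
vectors = embedding_model.encode([
    "project delay",      # the query
    "revised timeline",   # related in meaning, shares no keywords
    "Dr. Evelyn Reed"     # unrelated
])

# Cosine similarity: values closer to 1.0 mean closer in meaning
print(util.cos_sim(vectors[0], vectors[1]))  # query vs. "revised timeline" -> higher
print(util.cos_sim(vectors[0], vectors[2]))  # query vs. "Dr. Evelyn Reed" -> lower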

Step 3: Creating a vector store (our "open book")

Now we need a place to store our embeddings and their corresponding text chunks. This is our searchable "library." We'll use a vector database like ChromaDB.

import chromadb
chroma_client = chromadb.Client()

# Create a "collection" (like a table) to hold our docs
collection = chroma_client.get_or_create_collection(name="project_nova_docs")

# We need unique IDs for each chunk
chunk_ids = [str(i) for i in range(len(chunks))]

# Add our embeddings and the original text to the database
collection.add(
    embeddings=chunk_embeddings.tolist(),  # plain Python lists work across ChromaDB versions
    documents=chunks,
    ids=chunk_ids
)

Our knowledge base is now "indexed" and ready for questions.
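
As a quick sanity check, you can confirm that the collection holds one entry per chunk:

# Should print the same number as len(chunks)
print(collection.count())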

Steps 4–6: Retrieve, Augment, and Generate

This is the core RAG loop. We'll take our user's query, find the most relevant chunks, and then "augment" a new prompt for the LLM.

1. Retrieve

First, we embed the user's query using the same model and search the collection.

# The same query from before
query = "What was the main achievement in Project Nova, and who is the lead engineer?"

# 1. Embed the query
query_embedding = embedding_model.encode([query]).tolist()  # plain list, one vector per query

# 2. Search the collection for the top 3 most similar chunks
results = collection.query(
    query_embeddings=query_embedding,
    n_results=3
)

retrieved_chunks = results['documents'][0]
# retrieved_chunks might be:
# [
#   "...40% increase in signal stability.",
#   "...lead engineer is named David Chen.",
#   "Project Nova: Q3 2025 Internal Report"
# ]

2. Augment

Next, we create a new prompt that includes these retrieved chunks as context.

# We build a new prompt, stuffing the retrieved text into it
context = "\n".join(retrieved_chunks)

augmented_prompt = f"""
Use the following context to answer the question.
If the answer is not in the context, say 'I don't know'.

Context:
{context}

Question: {query}
"""

3. Generate

Finally, we send this new, context-rich prompt to the LLM.

# The "Open-Book" prompt
from openai import OpenAI
llm_client = OpenAI(api_key="...")

prompt = [
    {"role": "system", "content": "You are a precise and factual assistant."},
    {"role": "user", "content": augmented_prompt}
]

# Send the augmented prompt to the LLM
# (the model name is just an example; use whichever chat model you have access to)
response = llm_client.chat.completions.create(
    model="gpt-4o-mini",
    messages=prompt
)
print(response.choices[0].message.content)

# The LLM's (successful) response will be:
# "The main achievement in Project Nova was a 40% increase in
# signal stability. The lead engineer is David Chen."

Success! The model isn't using its internal memory; it's reading the "open book" we just gave it.

Your mental model: RAG = Retriever + Generator

Think of RAG as a two-part system:

  1. The Retriever (The Librarian): Its only job is to be an expert at searching your knowledge base. It takes a query and finds the most relevant documents. This part is fast and uses vector search.

  2. The Generator (The Synthesizer): This is the LLM. Its job is to take the user's query and the documents from the Retriever and synthesize them into a single, human-readable answer.

graph TD
    A["User Query"] --> B["Retriever (Librarian)"]
    C["Vector Database"] -- "Chunks" --> B
    B -- "Relevant Chunks" --> D["Generator (LLM)"]
    A --> D
    D --> E["Final Answer"]
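
To see the two parts working as one unit, here is a minimal sketch that wraps the whole loop in a single function, reusing the embedding_model, collection, and llm_client objects from earlier (the model name is again just an example):

def rag_answer(query: str, n_results: int = 3) -> str:
    # 1. Retrieve: embed the query and fetch the most similar chunks
    query_embedding = embedding_model.encode([query]).tolist()
    results = collection.query(query_embeddings=query_embedding, n_results=n_results)
    context = "\n".join(results["documents"][0])

    # 2. Augment: stuff the retrieved chunks into the prompt
    augmented_prompt = (
        "Use the following context to answer the question.\n"
        "If the answer is not in the context, say 'I don't know'.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

    # 3. Generate: let the LLM synthesize the final answer
    response = llm_client.chat.completions.create(
        model="gpt-4o-mini",  # example model name
        messages=[
            {"role": "system", "content": "You are a precise and factual assistant."},
            {"role": "user", "content": augmented_prompt},
        ],
    )
    return response.choices[0].message.content

# Example usage:
# print(rag_answer("Who prepared the Project Nova report?"))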

RAG is great for:

  • Answering questions over private documents (like your company's Wiki)
  • Providing up-to-date information (by adding new documents to the vector store)
  • Reducing hallucinations by "grounding" the LLM in specific facts
  • Providing citations for its answers, since you know exactly which chunks it used (see the sketch below)
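
On that last point, ChromaDB already returns the ids of the matching chunks (and can include distances if you ask), so a simple sketch of surfacing sources alongside the answer could look like this:

# Retrieve again, asking Chroma to include distances; ids are always returned
results = collection.query(
    query_embeddings=embedding_model.encode([query]).tolist(),
    n_results=3,
    include=["documents", "distances"]
)

# Pair each retrieved chunk with its id so the final answer can cite its sources
for chunk_id, text, distance in zip(
    results["ids"][0], results["documents"][0], results["distances"][0]
):
    print(f"[chunk {chunk_id}] (distance={distance:.3f}) {text}")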

Key takeaways

  • RAG solves the knowledge problem: It gives LLMs an "open-book" exam, letting them use external, up-to-date, or private information
  • Everything is vectors: RAG works by converting text (documents and queries) into numerical embeddings and finding the ones that are mathematically closest in meaning
  • The pipeline is key: A successful RAG system depends on good chunking, accurate embeddings, and a well-crafted prompt
  • Retrieval first, generation second: The core idea is to separate the problem of finding information from the problem of explaining it
  • Grounding reduces hallucinations: By forcing the LLM to base its answer on provided text, we significantly reduce its tendency to make things up

For more on building production AI systems, check out our AI Engineering Bootcamp.
