Advanced RAG: Building Self-Correcting Systems
In our last post, we learned about the standard RAG pipeline. It's great, but it's like a diligent intern following a rigid checklist:
- Retrieve documents
- Stuff them into a prompt
- Generate an answer
This works perfectly for simple questions. But what happens when the checklist isn't enough?
You ask the intern, "How does our new product compare to our main competitor's?" The intern checks our internal documents, finds nothing about the competitor, and replies, "I don't have that information."
A senior employee wouldn't stop there. They would recognize the gap, decide to look elsewhere (like a public website), find the missing information, and then synthesize a complete answer.
This is the key insight of Advanced RAG. We must move from a fixed checklist to a dynamic, decision-making process. We'll build a system that can route tasks, grade its own work, and correct its mistakes, just like an expert.
The failure case: the rigid checklist
First, let's demonstrate the problem. We'll create a "vector store" (our company's internal library) with information about our fictional product, "Model-V", but nothing about its competitors.
```python
# 1. Create our internal-only document collection
documents = [
    "The Model-V is our latest innovation in AI, featuring a 5-trillion parameter architecture.",
    "Model-V's training data includes a proprietary dataset of scientific research papers.",
    "Built on a unique 'Quantum Entanglement' processing core, the Model-V achieves new speeds."
]

# 2. Add these docs to a ChromaDB collection named "product_docs"
import chromadb

client = chromadb.Client()
collection = client.create_collection(name="product_docs")
collection.add(documents=documents, ids=["doc_1", "doc_2", "doc_3"])
```
Now, let's ask our "simple RAG" intern a comparative question:
query = "How does the Model-V compare to the new Model-Z from our competitor?"
# 1. RETRIEVE: Search our collection
retrieved_docs = collection.query(query_texts=[query], n_results=2)
# This finds docs about Model-V, but nothing about Model-Z.
# 2. AUGMENT & GENERATE: Stuff docs into a prompt
context = "..." # (context about Model-V)
basic_rag_prompt = f"Context: {context}\n\nQuestion: {query}"
# The LLM's (failed) response:
# "The Model-V has a 5-trillion parameter architecture and a
# 'Quantum Entanglement' core. I do not have any information
# on the Model-Z from a competitor."
The simple RAG system fails: it honestly admits the gap, but it has no way to go looking for the missing information. This is where an "agentic" approach becomes necessary.
The solution: an Agentic Graph
Instead of a simple, linear checklist, we'll build a graph. A graph is a set of "nodes" (steps) and "edges" (decisions) that connect them. This allows our system to make choices, loop back, and correct itself.
Our agent's logic will look like this:
```mermaid
graph TD
    A[Start] --> B(Route Query)
    B -- "Internal Question" --> C[Retrieve from Vector Store]
    B -- "External Question" --> D[Search the Web]
    C --> E(Grade Documents)
    E -- "Good Docs" --> F[Generate Answer]
    E -- "Bad Docs" --> D
    D --> F
    F --> G[End]
```
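Every node in this graph reads from and writes to a shared state object. Here's a minimal sketch of that state as a TypedDict; the exact field names are our own choice for this example, not anything the framework mandates:

```python
from typing import List, TypedDict

class GraphState(TypedDict):
    """Shared state passed between every node in the graph."""
    question: str         # the user's original query
    documents: List[str]  # retrieved or web-searched context
    generation: str       # the final synthesized answer
```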
Let's look at the key "nodes" or "brain cells" of our new agent.
1. The Router (the triage specialist)
The first step is a "Router" node. This is a small LLM call whose only job is to decide where to look first.
```python
# The Router's logic
def route_query(question):
    prompt = f"""
    You are an expert at routing a user question.
    Use 'vectorstore' for specific questions about Model-V's features.
    Use 'web_search' for all other questions, especially comparisons.
    Question: {question}
    Where should I look?
    """
    # LLM call will return "vectorstore" or "web_search"
    return call_llm(prompt)
```
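One practical caveat: LLMs don't always answer with exactly one word. A small sketch of a hardened wrapper (using the same `call_llm` placeholder) that normalizes the output and falls back to web search when in doubt:

```python
def route_query_safe(question):
    """Route the question, defaulting to web search on ambiguous output."""
    answer = route_query(question).strip().lower()
    # Anything that isn't clearly 'vectorstore' falls back to the web,
    # since a needless web search is cheaper than a wrong refusal.
    return "vectorstore" if "vectorstore" in answer else "web_search"
```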
2. The tools (the "hands")
The agent needs tools to interact with the world. We'll give it two (both sketched below):
- Vector Store Retriever: the tool we built in previous lessons to search our internal product_docs collection.
- Web Search: a tool that can search the live internet (e.g., using DuckDuckGoSearchRun()).
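Here's a minimal sketch of both tools. The retriever simply wraps the ChromaDB collection from earlier; the web search assumes the langchain_community package is installed, and the function names are our own:

```python
from langchain_community.tools import DuckDuckGoSearchRun

def retrieve_from_vectorstore(question):
    """Tool 1: search our internal product_docs collection."""
    results = collection.query(query_texts=[question], n_results=2)
    return results["documents"][0]  # list of matching doc texts

def search_web(question):
    """Tool 2: search the live internet via DuckDuckGo."""
    search = DuckDuckGoSearchRun()
    return search.run(question)  # a string of result snippets
```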
3. The Grader (the quality control)
This is the most important node. After retrieving documents, the "Grader" node checks if they are actually good enough to answer the question. This is our self-correction loop.
```python
# The Grader's logic
def grade_documents(question, documents):
    prompt = f"""
    You are a grader. Your task is to determine if the
    retrieved documents are relevant and sufficient to
    answer the user's question.
    Respond with a single word: 'yes' or 'no'.
    Documents: {documents}
    Question: {question}
    """
    # LLM call will return "yes" or "no"
    return call_llm(prompt)
```
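Note the design choice: forcing a single-word 'yes'/'no' verdict means the edge that consumes it only has to string-match, never parse free-form prose.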
4. The Generator (the voice)
This is the final LLM call we're familiar with. It takes the high-quality, graded context and synthesizes the final answer.
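With all four pieces defined, we can assemble the graph itself. Below is a minimal end-to-end sketch using LangGraph (the library mentioned in the takeaways). It assumes the GraphState, tool functions, and hardened router sketched above; the exact wiring is illustrative, not canonical:

```python
from langgraph.graph import StateGraph, END

# Node wrappers: each takes the shared state and returns the fields it updates.
def retrieve_node(state):
    return {"documents": retrieve_from_vectorstore(state["question"])}

def web_search_node(state):
    return {"documents": [search_web(state["question"])]}

def generate_node(state):
    prompt = f"Context: {state['documents']}\n\nQuestion: {state['question']}"
    return {"generation": call_llm(prompt)}

def grade_edge(state):
    # 'yes' -> good docs, go generate; 'no' -> fall back to web search
    verdict = grade_documents(state["question"], state["documents"])
    return "generate" if "yes" in verdict.lower() else "web_search"

workflow = StateGraph(GraphState)
workflow.add_node("retrieve", retrieve_node)
workflow.add_node("web_search", web_search_node)
workflow.add_node("generate", generate_node)

# The router picks the entry point: vector store vs. web search.
workflow.set_conditional_entry_point(
    lambda state: route_query_safe(state["question"]),
    {"vectorstore": "retrieve", "web_search": "web_search"},
)
# The grader decides whether retrieval was good enough.
workflow.add_conditional_edges(
    "retrieve", grade_edge,
    {"generate": "generate", "web_search": "web_search"},
)
workflow.add_edge("web_search", "generate")
workflow.add_edge("generate", END)

app = workflow.compile()
result = app.invoke({"question": "How does the Model-V compare to the new Model-Z?"})
```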
The Self-Correcting Agent in Action
Now, let's run our two queries through this new agentic graph.
Query 1: The failing comparative question
Query: "How does the Model-V compare to the new Model-Z?"
- Router: Sees "compare" and "Model-Z." Decides: web_search.
- Web Search: Runs a DuckDuckGo search for "Model-V vs Model-Z." Finds snippets about both.
- Generate: The LLM gets the web search results (context about both models) and synthesizes a complete answer.
- Final Answer: "Model-V features a 5-trillion parameter architecture, while web sources indicate Model-Z has a 3-trillion parameter architecture but a faster 'Photonic' core..."
Success! The agent dynamically chose the right tool.
What if the router had made a mistake?
- Router: (Mistakenly) decides: vectorstore.
- Retrieve: Finds docs about "Model-V" only.
- Grade Documents: The grader looks at the docs and the question. It sees info for Model-V but nothing for Model-Z. Decides: no.
- The Loop: The "no" decision routes the agent back to the web_search node.
- Web Search: Runs the search, finds the missing info.
- Generate: Synthesizes the complete answer.
This self-correction makes the system incredibly robust, even if one of its components makes a mistake.
Query 2: The simple internal question
Query: "Tell me about the processing core of Model-V."
- Router: Sees "processing core" and "Model-V." Decides: vectorstore.
- Retrieve: Finds the internal doc: "Built on a unique 'Quantum Entanglement' processing core..."
- Grade Documents: The grader sees the doc is perfectly relevant. Decides: yes.
- Generate: The LLM synthesizes the answer from the retrieved doc.
- Final Answer: "The Model-V uses a unique 'Quantum Entanglement' processing core..."
The agent correctly and efficiently answered the question using only internal data, avoiding an unnecessary and slower web search.
Key takeaways
- Graphs > chains for complexity: Linear chains are fine for simple, fixed tasks. Agentic graphs (using tools like LangGraph) are required for systems that must make decisions, handle branching logic, and recover from errors.
- Routing adds efficiency: An intelligent router stops the agent from wasting time and money (API calls) searching in the wrong places.
- Self-correction adds reliability: By grading its own work, the agent can identify its own failures (bad retrieval) and take corrective action (like falling back to a web search).
- Agents are systems, not prompts: This approach shifts our thinking from just "prompt engineering" to "systems engineering." We are building a logical, stateful system where the LLM is just one (very smart) component.
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.