Advanced RAG with Tools: Using LlamaIndex
In our last lessons, we built a RAG system from the ground up. We had to manage every single step: loading, chunking, embedding, and storing.
This was a great way to learn, but it's like building a car by mining the iron ore yourself. For real-world projects, it's smarter to start with a pre-built engine and chassis.
This is the key insight: Frameworks like LlamaIndex act as a powerful toolkit for RAG. They provide pre-built, optimized components for everything, letting us build sophisticated systems much faster.
Part 1: The easy button — simple PDF ingestion
Remember all the steps we took to manually chunk, embed, and store our text? LlamaIndex handles complex files like PDFs in just two lines.
It bundles a PDF loader, a smart text splitter, an embedding model, and a vector store into one simple command.
# LlamaIndex abstracts away all the complexity
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
# 1. Load all documents from a folder (it handles PDFs, .txt, etc.)
documents = SimpleDirectoryReader("./data").load_data()
# 2. Build the index. This automatically handles:
# - Chunking
# - Embedding
# - Storing in a vector index
book_index = VectorStoreIndex.from_documents(documents)
# 3. Get a query engine, ask a question, and print the answer
query_engine = book_index.as_query_engine()
response = query_engine.query("What is the main character's goal?")
print(response)
What took us dozens of lines before is now done in two. This abstraction is the primary power of using a framework.
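The defaults are sensible, but every step is still configurable. As a rough sketch (assuming the default OpenAI-backed setup, so an OPENAI_API_KEY is available; the chunk sizes and model name here are illustrative), you can override chunking and embedding through the global Settings object before building the index:
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
# Illustrative overrides -- tune these to your own data
Settings.chunk_size = 512        # smaller chunks for more precise retrieval
Settings.chunk_overlap = 50      # keep some context across chunk boundaries
Settings.embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Rebuild the index so the new settings take effect
book_index = VectorStoreIndex.from_documents(documents)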
Part 2: Reasoning over multiple books
What if the answer isn't in one document? What if it's spread across two different data sources?
For example, you have:
- Book Index: The full text of a novel.
- Quotes Index: A separate document with just famous quotes.
A user asks: "What is a famous quote by Dumbledore about dreams, and what is the context of dreams in the book?"
A simple RAG system would fail. We need an engine that can:
- Break the question into sub-questions.
- Route each sub-question to the correct tool (the right "book").
- Combine the answers.
LlamaIndex calls this a Sub-Question Query Engine.
graph TD
A["Complex Query: 'Plot? and Quote?'"] --> B(Sub-Question Engine)
B --> C["Sub-Question 1: 'Plot?'"]
B --> D["Sub-Question 2: 'Quote?'"]
C --> E["Tool 1: Book Index"]
D --> F["Tool 2: Quotes Index"]
E & F --> G(Synthesize Final Answer)
We give each of our indexes a "tool" wrapper with a clear description:
# 0. Imports for tools and the sub-question engine
from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import SubQuestionQueryEngine
# Query engines for each index (quotes_index is built the same way as book_index in Part 1)
book_query_engine = book_index.as_query_engine()
quotes_query_engine = quotes_index.as_query_engine()
# 1. Create a tool for each of our "books"
book_tool = QueryEngineTool.from_defaults(
    query_engine=book_query_engine,
    name="Book_Content",
    description="Useful for answering questions about the plot, characters, and events in the book."
)
quotes_tool = QueryEngineTool.from_defaults(
    query_engine=quotes_query_engine,
    name="Famous_Quotes",
    description="Useful for finding specific quotes from Albus Dumbledore."
)
# 2. Build the Sub-Question Engine
# This engine uses the tool descriptions to route questions
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[book_tool, quotes_tool]
)
# 3. Ask the complex question
response = sub_question_engine.query(
    "What is a Dumbledore quote about dreams, and what is the context of dreams in the book?"
)
The engine uses the tool descriptions to figure out where to find each piece of the answer. It asks the Famous_Quotes tool for the quote and the Book_Content tool for the context, then combines them.
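If you want to see how the final answer was stitched together, the response object exposes the retrieved source chunks. A quick sketch, building on the code above:
print(response)
# Inspect which chunks (and from which index) each sub-answer was drawn from
for source in response.source_nodes:
    print(source.score, source.node.metadata)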
Part 3: The ReAct Agent — Thinking and Acting
This is the most advanced step. What if the answer isn't in any of our local documents?
We need to give our system a new tool: Web Search.
But now the system has to choose which tool to use. For this, we build a ReAct Agent. "ReAct" is short for Reason + Act: the agent runs a Reason → Act → Observe loop, which lets the LLM "think" through a plan, pick a tool, check the result, and repeat until it has an answer.
graph TD
A[User Query] --> B("Reason: 'I need to find X'")
B --> C("Act: 'I will use Tool Y'")
C -- "Tool (e.g., Web Search)" --> D("Observe: 'Here's the result'")
D --> E{Is question answered?}
E -- No --> B
E -- Yes --> F[Final Answer]
The agent's "thinking" process looks like this:
- Reason: The user is asking for a real-world fact, not in my book.
- Act: I will choose the Web_Search tool.
- Observe: The web search gives me the fact.
- Reason: I now have the answer.
- Act: I will generate the final response.
We build this by giving the agent a list of all its available tools:
# 0. Imports: FunctionTool and the ReAct agent from LlamaIndex,
#    plus LangChain's community DuckDuckGo helper for the web tool
from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from langchain_community.tools import DuckDuckGoSearchRun
# 1. Create a new tool for web search
web_search_tool = FunctionTool.from_defaults(
    fn=DuckDuckGoSearchRun().run,
    name="Web_Search",
    description="Useful for searching the web for information not in the local documents."
)
# 2. Create a *new* list of ALL tools
all_tools = [book_tool, quotes_tool, web_search_tool]
# 3. Build the ReAct Agent
agent = ReActAgent.from_tools(all_tools, verbose=True)
# 4. Ask a question that REQUIRES web search
response = agent.chat(
    "What was the real-world historical inspiration for the Philosopher's Stone?"
)
The agent (with verbose=True) will show its "thoughts." It will first check the Book_Content tool, fail to find the answer, and then decide to use the Web_Search tool to find the real-world history of alchemy, all in one "run."
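Web search isn't special: any plain Python function can become a tool the agent may choose. As a minimal sketch (the function and tool name here are purely illustrative, reusing the FunctionTool and ReActAgent imports above), you could hand the same agent a tiny word counter:
# Hypothetical example: wrap an ordinary function as a tool
def word_count(text: str) -> int:
    """Count the words in a piece of text."""
    return len(text.split())
word_count_tool = FunctionTool.from_defaults(
    fn=word_count,
    name="Word_Count",
    description="Useful for counting the number of words in a given piece of text."
)
agent = ReActAgent.from_tools(
    [book_tool, quotes_tool, web_search_tool, word_count_tool],
    verbose=True
)
The agent treats it exactly like the others: the description is what it reads when deciding whether the tool fits the current sub-task.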
Key takeaways
- Frameworks accelerate development: Tools like LlamaIndex handle the "plumbing" of RAG (loading, chunking, embedding), letting you focus on building high-level logic
- Sub-questioning solves multi-doc RAG: The SubQuestionQueryEngine is a powerful way to break down complex questions and get answers from multiple different data sources
- Agents need tools to be powerful: A RAG system limited to its own documents is a "closed book." By giving an agent tools (like web search), you create an "open-book" system that can answer a much wider range of questions
- ReAct is a core thinking loop: The Reason → Act → Observe cycle is how you build agents that can reason, make decisions, and use different tools to solve a problem
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.