Advanced RAG with Tools: Using LlamaIndex

Param Harrison

In our last lessons, we built a RAG system from the ground up. We had to manage every single step: loading, chunking, embedding, and storing.

This was a great way to learn, but it's like building a car by mining the iron ore yourself. For real-world projects, it's smarter to start with a pre-built engine and chassis.

This is the key insight: Frameworks like LlamaIndex act as a powerful toolkit for RAG. They provide pre-built, optimized components for everything, letting us build sophisticated systems much faster.

Part 1: The easy button — simple PDF ingestion

Remember all the steps we took to manually chunk, embed, and store our text? LlamaIndex handles complex files like PDFs in just two lines.

It bundles a PDF loader, a smart text splitter, an embedding model, and a vector store into one simple command.

# LlamaIndex abstracts away all the complexity
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# 1. Load all documents from a folder (it handles PDFs, .txt, etc.)
documents = SimpleDirectoryReader("./data").load_data()

# 2. Build the index. This automatically handles:
#    - Chunking
#    - Embedding
#    - Storing in a vector index
book_index = VectorStoreIndex.from_documents(documents)

# 3. Get a query engine, ask a question, and print the answer
query_engine = book_index.as_query_engine()
response = query_engine.query("What is the main character's goal?")
print(response)

What took us dozens of lines before is now done in two. This abstraction is the primary power of using a framework.
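
The defaults are also swappable when you want control back. Here is a minimal sketch, assuming the default OpenAI models and using illustrative chunk values (not tuned recommendations):

# Every default is configurable through the global Settings object
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# Illustrative values: ~512-token chunks with 64 tokens of overlap
Settings.node_parser = SentenceSplitter(chunk_size=512, chunk_overlap=64)

# Rebuilding the index now uses our splitter instead of the default
book_index = VectorStoreIndex.from_documents(documents)

Swapping in a different embedding model or LLM works the same way: set it once on Settings, and every index built afterward picks it up.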

Part 2: Reasoning over multiple books

What if the answer isn't in one document? What if it's spread across two different data sources?

For example, you have:

  1. Book Index: The full text of a novel.
  2. Quotes Index: A separate document with just famous quotes.

A user asks: "What is a famous quote by Dumbledore about dreams, and what is the context of dreams in the book?"

A simple RAG system would fail. We need an engine that can:

  1. Break the question into sub-questions.
  2. Route each sub-question to the correct tool (the right "book").
  3. Combine the answers.

LlamaIndex calls this a Sub-Question Query Engine.

graph TD
    A["Complex Query: 'Plot? and Quote?'"] --> B(Sub-Question Engine)
    B --> C["Sub-Question 1: 'Plot?'"]
    B --> D["Sub-Question 2: 'Quote?'"]
    C --> E["Tool 1: Book Index"]
    D --> F["Tool 2: Quotes Index"]
    E & F --> G(Synthesize Final Answer)
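
Before wiring up the engine, each index needs its own query engine. A minimal sketch, assuming the quotes document lives in its own folder (./quotes is an illustrative path) and book_index is the index we built in Part 1:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

# Build a separate index for the quotes document
quote_docs = SimpleDirectoryReader("./quotes").load_data()
quotes_index = VectorStoreIndex.from_documents(quote_docs)

# One query engine per index; these feed the tools below
book_query_engine = book_index.as_query_engine()
quotes_query_engine = quotes_index.as_query_engine()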

We give each of our indexes a "tool" wrapper with a clear description:

from llama_index.core.tools import QueryEngineTool
from llama_index.core.query_engine import SubQuestionQueryEngine

# 1. Create a tool for each of our "books"
book_tool = QueryEngineTool.from_defaults(
    query_engine=book_query_engine,
    name="Book_Content",
    description="Useful for answering questions about the plot, characters, and events in the book."
)

quotes_tool = QueryEngineTool.from_defaults(
    query_engine=quotes_query_engine,
    name="Famous_Quotes",
    description="Useful for finding specific quotes from Albus Dumbledore."
)

# 2. Build the Sub-Question Engine
# This engine uses the tool descriptions to route questions
sub_question_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[book_tool, quotes_tool]
)

# 3. Ask the complex question
response = sub_question_engine.query(
    "What is a Dumbledore quote about dreams, and what is the context of dreams in the book?"
)

The engine uses the tool descriptions to figure out where to find each piece of the answer. It asks the Famous_Quotes tool for the quote and the Book_Content tool for the context, then combines them.
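
You can check the routing yourself. As a rough sketch, the response object carries the source nodes behind the final answer, so you can inspect which snippets each sub-question retrieved:

# Print the synthesized answer, then peek at the evidence behind it
print(response)

for node_with_score in response.source_nodes:
    print(node_with_score.node.metadata)
    print(node_with_score.node.get_text()[:100])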

Part 3: The ReAct Agent — Thinking and Acting

This is the most advanced step. What if the answer isn't in any of our local documents?

We need to give our system a new tool: Web Search.

But now the system has to choose which tool to use. For this, we build a ReAct Agent. "ReAct" is short for Reason + Act: the agent runs a Reason → Act → Observe loop that lets the LLM "think" through a plan, act on it, and check the result.

graph TD
    A[User Query] --> B("Reason: 'I need to find X'")
    B --> C("Act: 'I will use Tool Y'")
    C -- "Tool (e.g., Web Search)" --> D("Observe: 'Here's the result'")
    D --> E{Is question answered?}
    E -- No --> B
    E -- Yes --> F[Final Answer]

The agent's "thinking" process looks like this:

  1. Reason: The user is asking for a real-world fact, not in my book.
  2. Act: I will choose the Web_Search tool.
  3. Observe: The web search gives me the fact.
  4. Reason: I now have the answer.
  5. Act: I will generate the final response.

We build this by giving the agent a list of all its available tools:

from llama_index.core.tools import FunctionTool
from llama_index.core.agent import ReActAgent
from langchain_community.tools import DuckDuckGoSearchRun  # requires the duckduckgo-search package

# 1. Create a new tool for web search (wrapping LangChain's DuckDuckGo tool)
web_search_tool = FunctionTool.from_defaults(
    fn=DuckDuckGoSearchRun().run,
    name="Web_Search",
    description="Useful for searching the web for information not in the local documents."
)

# 2. Create a *new* list of ALL tools
all_tools = [book_tool, quotes_tool, web_search_tool]

# 3. Build the ReAct Agent
agent = ReActAgent.from_tools(all_tools, verbose=True)

# 4. Ask a question that REQUIRES web search
response = agent.chat(
    "What was the real-world historical inspiration for the Philosopher's Stone?"
)

The agent (with verbose=True) will show its "thoughts." It will first check the Book_Content tool, fail to find the answer, and then decide to use the Web_Search tool to find the real-world history of alchemy, all in one "run."
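
Because agent.chat() keeps conversation history, follow-up questions can build on earlier turns. A small sketch of a second turn, which should route back to the local indexes rather than the web:

# Follow-up turn: the agent remembers the previous exchange
followup = agent.chat(
    "And what does the book itself say about the Stone's powers?"
)
print(followup)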

Key takeaways

  • Frameworks accelerate development: Tools like LlamaIndex handle the "plumbing" of RAG (loading, chunking, embedding), letting you focus on building high-level logic
  • Sub-questioning solves multi-doc RAG: The SubQuestionQueryEngine is a powerful way to break down complex questions and get answers from multiple different data sources
  • Agents need tools to be powerful: A RAG system limited to its own documents is a "closed book." By giving an agent tools (like web search), you create an "open-book" system that can answer a much wider range of questions
  • ReAct is a core thinking loop: The Reason → Act → Observe cycle is how you build agents that can reason, make decisions, and use different tools to solve a problem

For more on building production AI systems, check out our AI Bootcamp for Software Engineers.
