The 'Brain' of RAG: A Guide to Embeddings & Vector Databases
In our last posts, we've built RAG pipelines (see our introduction to RAG), chosen frameworks (see our RAG framework comparison), and even designed agents (see our agent framework comparison). We've used terms like "embeddings" and "vector databases" as if they're magic boxes.
This post is for you if you've ever stopped and asked, "But how does it actually work? How does a computer 'find' the right chunk of text?"
Understanding this is the single biggest step you can take in going from AI user to AI engineer. We're going to open the black box, piece by piece, and see how the "brain" of RAG really thinks.
The Core Problem: Computers can't read "meaning"
Let's start with a simple problem. A user searches your knowledge base for "king".
A traditional "keyword" search (like Ctrl+F) will find this:
- "The king sat on the throne."
- "I am king of the world!"
But it will miss this:
- "The queen ruled the land."
- "A monarch's duty is to their people."
- "His majesty entered the court."
To a computer, the strings "king" and "queen" are as different as "apple" and "banana". It has no concept of "royalty" or "meaning". This is the failure of keyword search.
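To see this concretely, here's a minimal sketch of a keyword search (the sentences and the substring check are purely illustrative):

```python
# Naive keyword search: exact substring matching only.
documents = [
    "The king sat on the throne.",
    "I am king of the world!",
    "The queen ruled the land.",
    "A monarch's duty is to their people.",
    "His majesty entered the court.",
]

query = "king"
matches = [doc for doc in documents if query in doc.lower()]
print(matches)
# ['The king sat on the throne.', 'I am king of the world!']
# The queen, monarch, and majesty sentences are never found,
# even though they are about exactly the same topic.
```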
To build a "smart" RAG, we need to solve two problems:
- The "Translation" Problem: How do we translate text "meaning" into a format (numbers) that a computer can understand?
- The "Search" Problem: Once we have millions of documents in this number format, how do we search them instantly?
1. The "Translation" (Understanding Embeddings)
This is the first piece of the puzzle. We solve the translation problem with a special type of AI model called an Embedding Model.
An embedding model is a "translator". It has been trained on billions of sentences, and its only job is to read a piece of text and convert its "meaning" into a list of numbers called a vector.
Think of it as a "coordinate" on a giant map of meaning.
- The text "king" might be translated to the vector
[0.1, 0.8, -0.2, ...] - The text "queen" might be
[0.2, 0.7, -0.1, ...] - The text "apple" might be
[-0.9, 0.1, 0.5, ...]
When these vectors are plotted, "king" and "queen" will be extremely close to each other, while "apple" will be on the other side of the map.
graph TD
A["Text: 'king'"] --> B[Embedding Model]
B --> C["Vector: [0.1, 0.8, -0.2, ...]"]
D["Text: 'queen'"] --> B
B --> E["Vector: [0.2, 0.7, -0.1, ...]"]
F["Text: 'apple'"] --> B
B --> G["Vector: [-0.9, 0.1, 0.5, ...]"]
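What does "close" actually mean here? In practice it's usually measured with cosine similarity: the dot product of two vectors divided by the product of their lengths, which is high when the vectors point in the same direction. Here's a tiny sketch using made-up 3-dimensional vectors (real embeddings have hundreds of dimensions):

```python
import numpy as np

def cosine_sim(a, b):
    # Dot product divided by the product of the vector lengths
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Toy 3-dimensional "embeddings" (illustrative numbers only)
king  = np.array([0.1, 0.8, -0.2])
queen = np.array([0.2, 0.7, -0.1])
apple = np.array([-0.9, 0.1, 0.5])

print(cosine_sim(king, queen))  # high: the vectors point in nearly the same direction
print(cosine_sim(king, apple))  # low: the vectors point in very different directions
```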
Making it Real: How to create an Embedding
You don't need to train this model yourself. You just load a pre-trained, open-source one through a library like sentence-transformers. For more on embedding models, see our vector databases guide.
from sentence_transformers import SentenceTransformer
# 1. Load a pre-trained "translator" model
# This model converts text into a 384-dimension vector
model = SentenceTransformer('all-MiniLM-L6-v2')
# 2. Define our sentences
sentences = [
"The king sat on the throne.",
"The queen ruled the land.",
"I ate a red apple."
]
# 3. "Encode" them into vectors
embeddings = model.encode(sentences)
print(f"Shape of our embeddings: {embeddings.shape}")
# Output: Shape of our embeddings: (3, 384)
# This means we have 3 vectors, each 384 numbers long.
# Let's see the "distance" between them
from sklearn.metrics.pairwise import cosine_similarity
# Compare "king" and "queen"
sim_king_queen = cosine_similarity([embeddings[0]], [embeddings[1]])
# Compare "king" and "apple"
sim_king_apple = cosine_similarity([embeddings[0]], [embeddings[2]])
print(f"King vs. Queen Similarity: {sim_king_queen[0][0]:.4f}")
print(f"King vs. Apple Similarity: {sim_king_apple[0][0]:.4f}")
Observation:
When you run this, you'll see:
- King vs. Queen Similarity: 0.7588 (a very high score!)
- King vs. Apple Similarity: 0.1044 (a very low score!)
We have just shown, with actual numbers, that the model "understands" that "king" and "queen" are related. This is the magic of semantic search.
Think About It: An embedding model is the "translator" that turns all your documents into a massive list of "coordinates". Now we have a new problem: if you have 10 million documents (10 million vectors), how do you find the closest one to your query?
2. The "Phone Book" Problem (Why for loops fail)
So, we have our 10 million document vectors. A user asks a question.
- We translate the user's question into a query_vector.
- We now have to find the closest document vector to our query vector.
The "naive" or "brute-force" way to do this is a simple for loop.
# The "Brute Force" way. DO NOT DO THIS.
def find_closest_vector(query_vector, all_document_vectors):
best_similarity = -1
best_document = None
# This loop is the problem
for doc_vector in all_document_vectors:
# This one calculation is fast...
similarity = cosine_similarity(query_vector, doc_vector)
if similarity > best_similarity:
best_similarity = similarity
best_document = doc_vector
return best_document
# If all_document_vectors has 10,000,000 items,
# this loop will take... hours.
This is a "Full Scan" or Exact Nearest Neighbor (ENN) search. It is 100% accurate, but it is impossibly slow. It's like finding a phone number by reading the entire phone book, line by line.
We need a "GPS".
3. The "GPS" (Understanding Vector Databases)
A Vector Database (like Chroma, Qdrant, or Pinecone) is a specialized tool built to do one thing: solve the "phone book" problem instantly. For choosing the right vector database, see our vector database comparison guide.
It doesn't use a for loop. It uses a magic trick called Approximate Nearest Neighbor (ANN) search.
The ANN algorithm is a "shortcut". Instead of checking all 10 million vectors, it builds a smart "map" of your data ahead of time. A popular algorithm is HNSW (Hierarchical Navigable Small Worlds).
Here's a simple analogy for how HNSW works:
- The "Map": When you add your 10 million vectors, the database builds a multi-layered graph. It's like creating a "Country" layer, a "State" layer, a "City" layer, and a "Street" layer.
- The "Search": When your query_vector ("find 'king'") comes in:
  - It starts at the "Country" layer (e.g., "Food" vs. "History" vs. "People"). It finds the closest "country" is "People".
  - It drops down to the "State" layer within "People" (e.g., "Politics" vs. "Art" vs. "Science"). It finds the closest "state" is "Politics".
  - It drops down to the "City" layer within "Politics" (e.g., "Elections" vs. "Royalty"). It finds "Royalty".
  - It drops to the "Street" layer and quickly scans the 50 vectors on that "street" to find the exact closest one: "queen".
Instead of 10,000,000 comparisons, it only did about 30.
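Why roughly 30 and not thousands? Each layer of the graph shrinks the search space dramatically, so the number of hops grows roughly with the logarithm of the collection size rather than the size itself. A quick back-of-the-envelope check (the exact count depends on how the index is built):

```python
import math

# A layered graph search needs on the order of log2(N) hops,
# each checking only a handful of neighbours,
# instead of N full comparisons.
print(math.log2(10_000_000))  # ≈ 23.3
```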
Full Scan (Slow):
graph TD
A[Query] --> B[1?]
B --> C[2?]
C --> D[3?]
D --> E[...]
E --> F[10,000,000?]
F --> G[Answer]
style A fill:#ffebee,stroke:#b71c1c
style G fill:#ffebee,stroke:#b71c1c
ANN Search (Fast):
graph TD
H[Query] --> I[Country Layer]
I --> J[State Layer]
J --> K[City Layer]
K --> L[Street Layer]
L --> M[Scan 50 vectors]
M --> N[Answer]
style H fill:#e8f5e9,stroke:#388e3c
style N fill:#e8f5e9,stroke:#388e3c
This is why it's called "Approximate". It's possible the perfect answer was on a different "street" in a different "city". But it's 99.9% likely to find a good enough answer in milliseconds instead of hours.
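You almost never implement HNSW yourself; the vector database (or a library) does it for you. As a rough illustration, this is what building and querying an HNSW index looks like with the hnswlib library; the parameter values are illustrative, not tuned recommendations:

```python
import hnswlib
import numpy as np

dim = 384                 # matches the all-MiniLM-L6-v2 embedding size
num_vectors = 10_000      # pretend these are our document embeddings
doc_vectors = np.random.rand(num_vectors, dim).astype("float32")

# 1. Build the "map" once, at ingestion time
index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_vectors, ef_construction=200, M=16)
index.add_items(doc_vectors, ids=np.arange(num_vectors))

# 2. Search it at query time
index.set_ef(50)          # how widely to look around; higher = more accurate but slower
query_vector = np.random.rand(1, dim).astype("float32")
labels, distances = index.knn_query(query_vector, k=5)
print(labels, distances)  # the 5 approximate nearest neighbours
```

The M, ef_construction, and ef parameters are exactly the accuracy-versus-speed dial described above: turn them up and the "approximate" answer gets closer to the exact one, at the cost of more comparisons.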
Putting it all together: The full RAG flow
Now we can see the full picture. Our two key components—the Embedding Model and the Vector Database—work together to power our RAG system.
graph TD
A[Your Doc .pdf] --> B[1. Chunking]
B --> C[Chunk 1]
B --> D[Chunk 2]
B --> E[Chunk 3...]
C --> F[2. Embedding Model<br/>The Translator]
D --> F
E --> F
F --> G[Vector 1]
F --> H[Vector 2]
F --> I[Vector 3...]
subgraph VDB["3. Vector Database e.g., Chroma"]
direction LR
J[ANN Index<br/>The Map]
G --> J
H --> J
I --> J
end
K[User Query: Find...] --> L[2. Embedding Model<br/>The Translator]
L --> M[Query Vector]
M -- "4. Search" --> J
J -- "5. Retrieve" --> N[Relevant Chunks]
N --> O[6. LLM]
K --> O
O --> P[Final Answer]
- Ingestion (One-time): We "chunk" our documents (see our chunking guide), "translate" each chunk into a vector using the Embedding Model, and store these vectors in the Vector Database, which builds its fast "map".
- Querying (Real-time): The user's query is "translated" by the same Embedding Model. The Vector Database uses its fast ANN search ("GPS") to find the closest document vectors. These vectors' corresponding text chunks are retrieved and given to the LLM. (A minimal code sketch of both phases follows below.)
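To make both phases concrete, here is a minimal end-to-end sketch using Chroma. The collection name and documents are illustrative; by default Chroma embeds the text for you with a built-in MiniLM embedding model, so ingestion and querying are guaranteed to use the same "translator":

```python
import chromadb

client = chromadb.Client()  # in-memory; use a persistent client for real projects
collection = client.create_collection(name="docs")

# Ingestion (one-time): chunk -> embed -> store in the ANN index
collection.add(
    ids=["doc1", "doc2", "doc3"],
    documents=[
        "The king sat on the throne.",
        "The queen ruled the land.",
        "I ate a red apple.",
    ],
)

# Querying (real-time): embed the query -> ANN search -> retrieve chunks
results = collection.query(query_texts=["Who is the monarch?"], n_results=2)
print(results["documents"])
# These retrieved chunks are what you would pass to the LLM as context.
```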
Observation:
- The Embedding Model defines the quality of your search. A better model creates a better "map".
- The Vector Database defines the speed of your search. A better database searches the "map" faster.
You cannot have a good RAG system without both.
Challenge for You
- Use Case: You are building a RAG system for a movie database.
- The Goal: You want a search for "fast cars and explosions" to return the Fast & Furious movies, even if the movie descriptions don't use those exact words.
- Your Task:
  - Which component is responsible for "understanding" that the meaning of "fast cars and explosions" is semantically close to the meaning of "a film about street racing and heists"?
  - Which component is responsible for searching through 1 million movie descriptions in under 50 milliseconds?
Key takeaways
- Embeddings translate meaning into math: Embedding models convert text into numerical vectors that capture semantic relationships, enabling computers to understand meaning
- Vector databases solve the search problem: ANN algorithms like HNSW enable fast approximate nearest neighbor search across millions of vectors
- Both components are essential: The embedding model determines search quality, while the vector database determines search speed
- ANN trade-offs: Approximate search sacrifices perfect accuracy for massive speed improvements (milliseconds vs. hours)
- Understanding the fundamentals: Knowing how embeddings and vector databases work is crucial for building and optimizing production RAG systems
For more on building production AI systems, check out our AI Bootcamp for Software Engineers.