Vector Databases and Embeddings: The Brain of RAG

Param Harrison
7 min read

In our last posts, we learned how to build a RAG pipeline and how to "chunk" documents for it. Now, we'll explore the magic at its core: embeddings and vector databases.

The fundamental idea: the magic library

Imagine a normal library. Books are organized alphabetically by title or author. If you want to find a book about "cats," you look under "C." This is keyword search.

Now, imagine a magic library. Books are organized by their meaning. A book titled Feline Friends is right next to a book titled The Lion's Roar. A sci-fi novel about Mars is next to a textbook on rocket science.

If you ask this magic librarian for "ways to leave the planet," they can instantly point you to that entire section. This is semantic search.

  • Embeddings are the "magic coordinates" that give every piece of text a physical location in this library based on its meaning.
  • A Vector Database is the magic library itself, a building designed to find the closest coordinates to your question instantly.

Part 1: What are embeddings?

An embedding is a vector (a long list of numbers) that represents the semantic meaning of a piece of data. We use a special "embedding model" to turn our text chunks into these vectors.

Open-source embeddings (local & free)

Models from libraries like sentence-transformers run locally on your machine. They are free, private, and give you full control.

from sentence_transformers import SentenceTransformer

# Load a model that runs locally on your machine
oss_embedding_model = SentenceTransformer('all-MiniLM-L6-v2')

sentence = "A feline rested comfortably on the rug."

oss_embedding = oss_embedding_model.encode(sentence)

print(f"Model: all-MiniLM-L6-v2")
print(f"Embedding Dimensions: {oss_embedding.shape}")
print(f"First 5 values: {oss_embedding[:5]}")

# Output:
# Model: all-MiniLM-L6-v2
# Embedding Dimensions: (384,)
# First 5 values: [ 0.0346 -0.0163  0.0349  0.0526 -0.0215]

Proprietary embeddings (API & paid)

Models from providers like OpenAI are accessed via an API. They are often larger and more powerful, but you pay per use and send your data to a third party.

from openai import OpenAI
llm_client = OpenAI(api_key="...")

def get_openai_embedding(text, model="text-embedding-3-small"):
    # Helper function to call the OpenAI API
    text = text.replace("\n", " ")
    return llm_client.embeddings.create(input=[text], model=model).data[0].embedding

sentence = "A feline rested comfortably on the rug."

openai_embedding = get_openai_embedding(sentence)

print(f"Model: text-embedding-3-small")
print(f"Embedding Dimensions: {len(openai_embedding)}")
print(f"First 5 values: {openai_embedding[:5]}")

# Output:
# Model: text-embedding-3-small
# Embedding Dimensions: 1536
# First 5 values: [-0.0152, -0.0205, 0.0078, -0.0461, 0.0026]

Notice the dimensions (the length of the number list) are different: 384 vs. 1536. This is a key trade-off between model size, cost, and quality.
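
How much does that difference matter in practice? Here's a rough back-of-the-envelope sketch (assuming 1 million chunks stored as raw 32-bit floats; real storage also depends on your database's index and compression):

num_chunks = 1_000_000
bytes_per_float = 4  # float32

for dims in (384, 1536):
    size_gb = num_chunks * dims * bytes_per_float / 1024**3
    print(f"{dims} dims -> ~{size_gb:.2f} GB of raw vectors")

# Output (approximate):
# 384 dims -> ~1.43 GB of raw vectors
# 1536 dims -> ~5.72 GB of raw vectors

Bigger vectors can capture more nuance, but they cost more to store, move, and compare.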

How embeddings capture "meaning"

So, they're lists of numbers. How does that help? We can show that they capture meaning by measuring the "distance" between them: semantically similar sentences produce vectors that are mathematically "close" to each other.

The most common way to measure this is Cosine Similarity.

from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

# 1. Define three sentences
target = "The cat sat on the mat."
similar = "A feline relaxed on the rug."
dissimilar = "The rocket launched into space."

# 2. Embed all three sentences
embeddings = oss_embedding_model.encode([target, similar, dissimilar])

# 3. Reshape for the similarity function
target_emb = embeddings[0].reshape(1, -1)
similar_emb = embeddings[1].reshape(1, -1)
dissimilar_emb = embeddings[2].reshape(1, -1)

# 4. Calculate similarity scores
sim_score = cosine_similarity(target_emb, similar_emb)[0][0]
dissim_score = cosine_similarity(target_emb, dissimilar_emb)[0][0]

print(f"Similarity (cat, feline): {sim_score:.4f}")
print(f"Similarity (cat, rocket): {dissim_score:.4f}")

# Output:
# Similarity (cat, feline): 0.8263
# Similarity (cat, rocket): 0.0816

The high score (0.82) shows the model "understands" that "cat" and "feline" are related. The low score (0.08) shows it knows "cat" and "rocket" are not. This is the simple, powerful math behind RAG's retriever.
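
If you want to see the math behind that score, cosine similarity is just the dot product of two vectors divided by the product of their lengths. A minimal sketch with NumPy, reusing the embeddings array from the snippet above:

import numpy as np

def cosine(a, b):
    # dot(a, b) / (||a|| * ||b||)
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

print(f"Manual (cat, feline): {cosine(embeddings[0], embeddings[1]):.4f}")
print(f"Manual (cat, rocket): {cosine(embeddings[0], embeddings[2]):.4f}")

# These should match the sklearn scores above (about 0.8263 and 0.0816).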

Part 2: What are vector databases?

Now that we have millions of these vector embeddings, we need a special database to store them and search them instantly. This is a Vector Database.

It takes your query (e.g., "What did astronauts do?"), embeds it, and then searches its "magic library" for the document vectors with the closest coordinates.

Let's use ChromaDB, a popular, easy-to-use vector database.

import chromadb
chroma_client = chromadb.Client()

# 1. Create a "collection" (our library shelf)
collection = chroma_client.get_or_create_collection(name="history_facts")

# 2. Add documents (Chroma handles embedding them for us!)
documents = [
    "The Apollo 11 mission successfully landed the first humans on the Moon.",
    "The Hubble Space Telescope has provided some of the most detailed images of distant galaxies.",
    "The Great Wall of China is a series of fortifications stretching over 13,000 miles.",
    "The Roman Colosseum was used for gladiatorial contests and public spectacles."
]

collection.add(
    documents=documents,
    ids=["id_1", "id_2", "id_3", "id_4"] # Every document needs a unique ID
)

# 3. Query the collection
query = "What did astronauts do in space?"

results = collection.query(
    query_texts=[query],
    n_results=1 # Ask for the single best result
)

print(results['documents'])
# Output:
# [['The Apollo 11 mission successfully landed the first humans on the Moon.']]
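
Under the hood, Chroma ranks documents by the distance between your query's vector and each stored vector. If you're curious, you can ask it to return those distances too (a small sketch on the same collection; lower means closer, and the exact numbers depend on the embedding model and distance metric):

results = collection.query(
    query_texts=[query],
    n_results=2,
    include=["documents", "distances"]  # also return the raw distance scores
)

for doc, dist in zip(results['documents'][0], results['distances'][0]):
    print(f"distance={dist:.4f} | {doc}")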

The real power: Vector Search + Metadata Filtering

In a real application, you rarely use semantic search on its own. You combine it with traditional metadata filtering.

This is the difference between asking the magic librarian:

  • "Find me books about adventure." (Semantic search)
  • "Find me books about adventure, but only in the 'Sci-Fi' section, and only those with a 5-star rating." (Hybrid Search)

Vector databases are built for this.

movie_collection = chroma_client.get_or_create_collection(name="movie_reviews")

# 1. Add documents WITH metadata
reviews = [
    "A thrilling journey through space to save humanity.",
    "Two old friends embark on a road trip and rediscover their bond.",
    "In a galaxy far away, a hero rises to fight an evil empire.",
    "A heartwarming tale of a boy and his dog on a cross-country trek."
]

metadata = [
    {'genre': 'Sci-Fi', 'rating': 5},
    {'genre': 'Drama', 'rating': 4},
    {'genre': 'Sci-Fi', 'rating': 5},
    {'genre': 'Family', 'rating': 4}
]

movie_collection.add(
    documents=reviews,
    metadatas=metadata,
    ids=["review_1", "review_2", "review_3", "review_4"]
)

# 2. Define our query
query = "A story about friendship and adventure"

# 3. Run a PURE semantic search
# This will likely return the 'road trip' and 'boy and his dog' reviews.
semantic_results = movie_collection.query(
    query_texts=[query], 
    n_results=2
)
print(f"Semantic Search Results: {semantic_results['documents']}")

# 4. Run a FILTERED search
# This finds things that *sound like* our query, but *only* in the 'Sci-Fi' genre.
filtered_results = movie_collection.query(
    query_texts=[query],
    n_results=2,
    where={"genre": "Sci-Fi"}  # <-- This is the metadata filter!
)
print(f"Filtered Search Results: {filtered_results['documents']}")

# Output:
# Semantic Search Results: [['Two old friends embark on a road trip...', 'A heartwarming tale of a boy and his dog...']]
# Filtered Search Results: [['A thrilling journey through space to save humanity.', 'In a galaxy far away, a hero rises...']]

This is the "superpower" of a production-grade RAG system: combining fuzzy, meaning-based search with precise, rule-based filtering.

Key takeaways

  • Embeddings turn meaning into math: They are the bridge that allows computers to understand and compare the "meaning" of data
  • Model choice is a trade-off: Your choice of embedding model (open-source vs. proprietary) impacts cost, performance, and privacy
  • Vector DBs are the "magic library": They are specialized databases designed to store and search billions of embeddings at high speed
  • Metadata is crucial for production: Real-world RAG systems almost always combine semantic search (what it means) with metadata filtering (what it is)

For more on building production AI systems, check out our AI Engineering Bootcamp.
