Vector Databases Explained
- Published on
- Authors
- Author
- Ram Simran G
- twitter @rgarimella0124
What’s the Problem with Regular Search?
- You have 10,000 product descriptions
- User types: “comfortable outdoor furniture”
- Traditional database (SQL/NoSQL):
- Looks for exact words: “comfortable” OR “outdoor” OR “furniture”
- Misses “cozy patio seating” — even though it’s the same thing
- Keyword matching = dumb
Vector databases fix this with meaning, not just words.
So, What Is a Vector Database?
A vector database stores numbers that represent meaning, not just text.
| Regular Database | Vector Database |
|---|---|
Stores: "cozy patio seating" | Stores: [0.3, 0.8, 0.1, 0.9, ...] |
| Searches: exact words | Searches: similar meanings |
- These numbers = embeddings (created by AI models like OpenAI, Google, etc.)
- Similar ideas → similar numbers
- “Cozy” and “comfortable” → close numbers
- “Chair” and “table” → far apart
How Does It Work? (Step-by-Step)
Step 1: Turn Text into Numbers (Vectors)
"comfortable chair" → [0.2, 0.7, 0.1, 0.4, ...]
"cozy seat" → [0.3, 0.8, 0.2, 0.5, ...] - Done using AI embedding models (like OpenAI’s
text-embedding-3-small) - Same meaning = close numbers
Step 2: Store & Index These Vectors
- Not stored as plain text
- Stored as arrays of numbers
- Special indexes (like HNSW, IVF) make “find similar” super fast
Step 3: Search by Meaning
User searches: "outdoor furniture"
→ Converted to vector: [0.3, 0.6, 0.2, 0.8, ...]
→ Database finds closest matches using math (cosine similarity)
→ Returns: "cozy patio seating", "garden lounge set", etc. Vector DB vs SQL vs NoSQL: Key Differences
| Feature | SQL (e.g. PostgreSQL) | NoSQL (e.g. MongoDB) | Vector DB |
|---|---|---|---|
| Stores | Rows with columns | JSON documents | Vectors (numbers) |
| Search | Exact match, filters | Text search, regex | Similarity (meaning) |
| Best for | Transactions, reports | Flexible data | AI search, recommendations |
| Speed at scale | Slow for similarity | Not built for it | Blazing fast similarity |
Think of it like this:
SQL = phone book (exact name lookup)
Vector DB = friend who “knows someone like that”
Real-World Use Cases (You’re Already Using These!)
- Smart product search
→ Finds “cozy patio” when user types “comfy outdoor” - Chatbots & support
→ Matches “How do I reset?” to “password recovery guide” - Recommendation engines
→ “Users who liked X also liked Y” (based on behavior vectors) - Document search
→ Finds relevant policies even if keywords differ - Image/audio search
→ Find similar images or songs by content
How Data Gets Added (Behind the Scenes)
(From the diagram in the tweet)
- User sends object → e.g., movie:
{title: "Top Gun", genre: "action"} - System generates vector → using AI model (e.g., OpenAI)
- Vector + metadata stored → in collection + indexes
- Inverted index updated → for fast filtering (e.g.,
genre = action) - Vector index updated → for similarity search
- Object ID returned → UUID like
a1b2c3...
All this happens in parallel — super fast!
Popular Vector Databases (Pick Your Flavor)
| Name | Type | Best For |
|---|---|---|
| Weaviate | Open-source | Feature-rich, self-hosted |
| Pinecone | Managed (cloud) | Easy, no ops, pricey |
| Milvus | Open-source | Massive scale, complex |
| Qdrant | Open-source (Rust) | Fast, lightweight |
| pgvector | Postgres extension | Simple, use your existing DB |
Hot take: For more than 1 million items, just use PostgreSQL + pgvector.
No need for a fancy vector DB yet.
Do You Really Need a Vector Database?
| Project Size | Recommendation |
|---|---|
| Less than 100K items | Use Postgres + pgvector |
| 100K – 1M | Still fine with pgvector |
| More than 1M or heavy search | Consider Weaviate/Pinecone |
Start simple. Scale when you feel the pain.
TL;DR: Vector Databases in 5 Bullets
- Turn text → meaningful numbers (embeddings)
- Store & search by similarity, not keywords
- Perfect for AI search, recommendations, chatbots
- Different from SQL/NoSQL: math-based, not rule-based
- Start with pgvector — you probably don’t need more
Want to try it today?
-- In PostgreSQL with pgvector
CREATE EXTENSION vector;
CREATE TABLE products (id serial, description text, embedding vector(1536));
-- Insert
INSERT INTO products (description, embedding)
VALUES ('cozy patio seating', '[0.3,0.8,...]');
-- Search
SELECT * FROM products
ORDER BY embedding <=> '[0.2,0.7,...]' -- your query vector
LIMIT 5; That’s it. You’re now doing AI-powered search.
MongoDB Also Does Vector Search! (Atlas Vector Search)
Yes! MongoDB added native vector search in MongoDB Atlas (cloud version).
Why Use MongoDB for Vectors?
- You already use MongoDB? → No new database
- Store documents + vectors together
- Full-text + vector search in one query
- Great for apps with rich metadata
MongoDB Vector Search – Basic Implementation (Python using pymongo)
# MongoDB Vector Search –
from pymongo import MongoClient
# 1. Connect to MongoDB Atlas (replace <connection_string>)
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.mongodb.net/")
db = client["store"]
collection = db["products"]
# 2. Insert a document with embedding
collection.insert_one({
"description": "cozy patio seating",
"price": 299,
"category": "outdoor",
"embedding": [0.3, 0.8, 0.1, ..., 0.9] # 1536-dim vector (example)
})
# 3. Search using vector similarity
pipeline = [
{
"$vectorSearch": {
"index": "vector_index", # Name of your vector search index
"path": "embedding",
"queryVector": [0.2, 0.7, 0.1, ..., 0.4], # Embedding of user query
"numCandidates": 100,
"limit": 5
}
},
{
"$project": {
"description": 1,
"price": 1,
"category": 1,
"score": { "$meta": "vectorSearchScore" }
}
}
]
results = list(collection.aggregate(pipeline))
# Print results
for doc in results:
print(f"{doc['description']} | Score: {doc['score']:.4f}") Note:
- You must create a Vector Search Index in MongoDB Atlas first (via UI or CLI)
- Use
pip install pymongoto install the driver- Replace the placeholder vector with real 1536-dim embeddings (e.g., from OpenAI)
pgvector vs MongoDB Atlas Vector Search: Quick Comparison
| Feature | PostgreSQL + pgvector | MongoDB Atlas Vector Search |
|---|---|---|
| Open source | Yes | No (cloud only) |
| Self-hostable | Yes | No |
| Works with existing data | Yes | Yes |
| Full-text + vector in one query | Yes (with tsvector) | Yes |
| Free tier | Yes | Yes (limited) |
| Best for | Small–medium apps, full control | Apps already on MongoDB, rapid prototyping |
Rule of thumb:
Use pgvector if you want free, open, self-hosted
Use MongoDB Atlas if you’re already in the MongoDB ecosystem
Cheers,
Sim