Vector Databases Explained

What’s the Problem with Regular Search?

Say you have 10,000 product descriptions and a user types: “comfortable outdoor furniture”
Traditional database (SQL/NoSQL):
- Looks for the literal words: “comfortable” OR “outdoor” OR “furniture”
- Misses “cozy patio seating”, even though it means the same thing
- Keyword matching = dumb (it sees strings, not meaning)

Vector databases fix this with meaning, not just words.

So, What Is a Vector Database?

A vector database stores numbers that represent meaning, not just text.

Regular Database	Vector Database
Stores: `"cozy patio seating"`	Stores: `[0.3, 0.8, 0.1, 0.9, ...]`
Searches: exact words	Searches: similar meanings

These numbers = embeddings (created by embedding models from providers like OpenAI, Google, etc.)
Similar ideas → similar numbers
“Cozy” and “comfortable” → close together
“Chair” and “invoice” → far apart

How Does It Work? (Step-by-Step)

Step 1: Turn Text into Numbers (Vectors)

"comfortable chair" → [0.2, 0.7, 0.1, 0.4, ...]
"cozy seat"         → [0.3, 0.8, 0.2, 0.5, ...]

Done using AI embedding models (like OpenAI’s text-embedding-3-small)
Same meaning = close numbers
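
To make “close numbers” concrete, here is a minimal sketch of cosine similarity in plain Python. It treats the first four values of the two example vectors above as toy 4-dimensional vectors (real embeddings have hundreds or thousands of dimensions), and the `unrelated` vector is made up purely for contrast.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors: the first four values from the examples above.
comfortable_chair = [0.2, 0.7, 0.1, 0.4]
cozy_seat = [0.3, 0.8, 0.2, 0.5]
# Hypothetical vector for something unrelated (invented for this sketch).
unrelated = [0.9, 0.1, 0.8, 0.0]

close = cosine_similarity(comfortable_chair, cozy_seat)
far = cosine_similarity(comfortable_chair, unrelated)
print(f"similar pair: {close:.3f}, unrelated pair: {far:.3f}")
```

The similar pair scores much closer to 1.0 than the unrelated pair, which is all “close numbers” means here.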

Step 2: Store & Index These Vectors

Not stored as plain text
Stored as arrays of numbers
Approximate nearest-neighbor indexes (like HNSW, IVF) make “find similar” fast by not comparing the query against every stored vector
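
To see why an index helps, here is a toy sketch of the IVF (inverted file) idea: group vectors into buckets around centroids, then search only the bucket(s) nearest the query. Everything here is illustrative: the 2-dimensional vectors and the centroids are hand-picked, and the names `build_ivf` and `ivf_search` are made up for this sketch. Real implementations learn centroids (typically with k-means) and are far more optimized.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def build_ivf(vectors, centroids):
    """Assign every vector id to the bucket of its most similar centroid."""
    buckets = {i: [] for i in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        buckets[nearest].append(vec_id)
    return buckets

def ivf_search(query, vectors, centroids, buckets, k=2, nprobe=1):
    """Scan only the nprobe buckets closest to the query, then rank those candidates."""
    ranked = sorted(range(len(centroids)),
                    key=lambda i: cosine(query, centroids[i]), reverse=True)
    candidates = [vid for c in ranked[:nprobe] for vid in buckets[c]]
    candidates.sort(key=lambda vid: cosine(query, vectors[vid]), reverse=True)
    return candidates[:k]

# Hypothetical 2-dim vectors, invented for this sketch.
vectors = {
    "patio chair": [0.9, 0.1],
    "garden bench": [0.8, 0.2],
    "usb cable": [0.1, 0.9],
    "phone charger": [0.2, 0.8],
}
centroids = [[1.0, 0.0], [0.0, 1.0]]  # hand-picked, not learned

buckets = build_ivf(vectors, centroids)
results = ivf_search([0.9, 0.12], vectors, centroids, buckets)
print(results)
```

With `nprobe=1` the search touches only half of this tiny dataset. The trade-off is that a true nearest neighbor sitting in an unprobed bucket gets missed, which is why these indexes are called approximate.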

Step 3: Search by Meaning

User searches: "outdoor furniture"
→ Converted to vector: [0.3, 0.6, 0.2, 0.8, ...]
→ Database finds closest matches using a distance metric (e.g. cosine similarity)
→ Returns: "cozy patio seating", "garden lounge set", etc.
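
The search step can be sketched as a brute-force scan, which is what an index approximates at scale. This is a minimal sketch, not how any particular database implements it. The query vector and the “cozy patio seating” vector reuse the first four values shown earlier in this post; the other two vectors and the `search` helper are invented for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def search(query_vec, store, k=2):
    """Score every stored vector against the query and return the k best texts."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in store.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [text for _, text in scored[:k]]

store = {
    "cozy patio seating": [0.3, 0.8, 0.1, 0.9],
    "garden lounge set": [0.4, 0.7, 0.2, 0.8],     # hypothetical vector
    "usb-c charging cable": [0.9, 0.1, 0.8, 0.1],  # hypothetical vector
}

query_vec = [0.3, 0.6, 0.2, 0.8]  # "outdoor furniture", truncated to 4 dims
top = search(query_vec, store)
print(top)
```

Both furniture items come back and the cable does not, even though none of them share a word with the query.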

Vector DB vs SQL vs NoSQL: Key Differences

Feature	SQL (e.g. PostgreSQL)	NoSQL (e.g. MongoDB)	Vector DB
Stores	Rows with columns	JSON documents	Vectors (numbers)
Search	Exact match, filters	Text search, regex	Similarity (meaning)
Best for	Transactions, reports	Flexible data	AI search, recommendations
Speed at scale	Slow for similarity (without extensions like pgvector)	Not built for it (without add-ons like Atlas Vector Search)	Fast approximate similarity search

Think of it like this:
SQL = phone book (exact name lookup)
Vector DB = friend who “knows someone like that”

Real-World Use Cases (You’re Already Using These!)

Smart product search
→ Finds “cozy patio” when user types “comfy outdoor”
Chatbots & support
→ Matches “How do I reset?” to “password recovery guide”
Recommendation engines
→ “Users who liked X also liked Y” (based on behavior vectors)
Document search
→ Finds relevant policies even if keywords differ
Image/audio search
→ Find similar images or songs by content

How Data Gets Added (Behind the Scenes)

(From the diagram in the tweet)

User sends object → e.g., movie: {title: "Top Gun", genre: "action"}
System generates vector → using an embedding model (e.g., one from OpenAI)
Vector + metadata stored → in collection + indexes
Inverted index updated → for fast filtering (e.g., genre = action)
Vector index updated → for similarity search
Object ID returned → UUID like a1b2c3...

The two index updates can run in parallel, so inserts stay fast!
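
The flow above can be sketched in a few lines of Python. This is a toy in-memory model under loud assumptions: `TinyVectorStore` and `fake_embed` are hypothetical names, `fake_embed` is a deterministic stand-in with no semantic meaning (a real system would call an embedding model), and the “vector index” is a plain list rather than HNSW or IVF.

```python
import uuid

def fake_embed(text):
    """Stand-in for an embedding model: vowel frequencies, NOT semantic."""
    lowered = text.lower()
    return [lowered.count(c) / max(len(lowered), 1) for c in "aeiou"]

class TinyVectorStore:
    def __init__(self):
        self.objects = {}       # object id -> {"object": ..., "vector": ...}
        self.inverted = {}      # (field, value) -> set of object ids, for filtering
        self.vector_index = []  # (object id, vector) pairs, for similarity search

    def add(self, obj, text_field="title"):
        vector = fake_embed(obj[text_field])           # generate the vector
        object_id = str(uuid.uuid4())
        self.objects[object_id] = {"object": obj, "vector": vector}  # store vector + metadata
        for field, value in obj.items():               # update the inverted index
            self.inverted.setdefault((field, value), set()).add(object_id)
        self.vector_index.append((object_id, vector))  # update the vector index
        return object_id                               # return the object ID

    def filter(self, field, value):
        """Return the ids of objects whose field equals value."""
        return self.inverted.get((field, value), set())

store = TinyVectorStore()
movie_id = store.add({"title": "Top Gun", "genre": "action"})
print(movie_id, store.filter("genre", "action"))
```

Keeping the inverted index separate from the vector index is what lets a query like “similar to X, but only where genre = action” narrow candidates cheaply.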

Popular Vector Databases (Pick Your Flavor)

Name	Type	Best For
Weaviate	Open-source	Feature-rich, self-hosted
Pinecone	Managed (cloud)	Easy, no ops, pricey
Milvus	Open-source	Massive scale, complex
Qdrant	Open-source (Rust)	Fast, lightweight
pgvector	Postgres extension	Simple, use your existing DB

Hot take: For fewer than 1 million items, just use PostgreSQL + pgvector.
No need for a fancy vector DB yet.

Do You Really Need a Vector Database?

Project Size	Recommendation
Less than 100K items	Use Postgres + pgvector
100K – 1M	Still fine with pgvector
More than 1M or heavy search	Consider Weaviate/Pinecone

Start simple. Scale when you feel the pain.
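
One rough way to anticipate the pain is to estimate how much raw vector data you would be holding. The sketch below assumes 1536-dimensional embeddings stored as 4-byte floats, and it deliberately ignores index and per-row overhead, so treat the numbers as a lower bound. The helper name `raw_vector_bytes` is made up for this example.

```python
def raw_vector_bytes(num_vectors, dims, bytes_per_float=4):
    """Bytes needed for the vector values alone (no index, no row overhead)."""
    return num_vectors * dims * bytes_per_float

for n in (100_000, 1_000_000, 10_000_000):
    gib = raw_vector_bytes(n, 1536) / 1024**3
    print(f"{n:>10,} vectors x 1536 dims ~ {gib:.2f} GiB")
```

At 1536 dimensions each vector is about 6 KB, so a million of them is a few gigabytes before any index is built, which a single Postgres instance can usually handle.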

TL;DR: Vector Databases in 5 Bullets

- Turn text → meaningful numbers (embeddings)
- Store & search by similarity, not keywords
- Perfect for AI search, recommendations, chatbots
- Different from SQL/NoSQL: math-based, not rule-based
- Start with pgvector: you probably don’t need more

Want to try it today?

-- In PostgreSQL with pgvector
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE products (id serial PRIMARY KEY, description text, embedding vector(1536));

-- Insert (the vector literal must contain all 1536 values)
INSERT INTO products (description, embedding)
VALUES ('cozy patio seating', '[0.3,0.8,...]');

-- Search: <=> is pgvector's cosine distance operator (smaller = more similar)
SELECT * FROM products
ORDER BY embedding <=> '[0.2,0.7,...]'  -- your query vector
LIMIT 5;

That’s it. You’re now doing AI-powered search.
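
If you are calling this from application code, the main chore is turning a Python list into the `'[...]'` text form pgvector accepts. Here is a hedged sketch: `to_pgvector_literal` and `search_products` are hypothetical helpers, and `conn` is assumed to be a psycopg-style connection (a cursor usable as a context manager, `%s` placeholders). Nothing below connects to a database on its own.

```python
def to_pgvector_literal(vec):
    """Format a list of numbers as pgvector's text form, e.g. '[0.3,0.8,1.0]'."""
    return "[" + ",".join(str(float(x)) for x in vec) + "]"

# Parameterized statements: values are passed separately, never string-formatted in.
INSERT_SQL = "INSERT INTO products (description, embedding) VALUES (%s, %s::vector)"
SEARCH_SQL = (
    "SELECT id, description FROM products "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)

def search_products(conn, query_vec, limit=5):
    """Run the similarity query; conn is assumed to be a psycopg-style connection."""
    with conn.cursor() as cur:
        cur.execute(SEARCH_SQL, (to_pgvector_literal(query_vec), limit))
        return cur.fetchall()

literal = to_pgvector_literal([0.3, 0.8, 1])
print(literal)
```

Once the table grows, an index such as `CREATE INDEX ON products USING hnsw (embedding vector_cosine_ops);` keeps the `<=>` ordering from scanning every row.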

MongoDB Also Does Vector Search! (Atlas Vector Search)

Yes! MongoDB added native vector search in MongoDB Atlas (cloud version).

Why Use MongoDB for Vectors?

You already use MongoDB? → No new database
Store documents + vectors together
Full-text + vector search in one query
Great for apps with rich metadata

MongoDB Vector Search – Basic Implementation (Python using pymongo)

# MongoDB Vector Search: basic example with pymongo

from pymongo import MongoClient

# 1. Connect to MongoDB Atlas (replace <user>, <password>, and the cluster host with your own)
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.mongodb.net/")
db = client["store"]
collection = db["products"]

# 2. Insert a document with embedding
collection.insert_one({
    "description": "cozy patio seating",
    "price": 299,
    "category": "outdoor",
    "embedding": [0.3, 0.8, 0.1, ..., 0.9]  # 1536-dim vector (example)
})

# 3. Search using vector similarity
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",  # Name of your vector search index
            "path": "embedding",
            "queryVector": [0.2, 0.7, 0.1, ..., 0.4],  # Embedding of user query
            "numCandidates": 100,
            "limit": 5
        }
    },
    {
        "$project": {
            "description": 1,
            "price": 1,
            "category": 1,
            "score": { "$meta": "vectorSearchScore" }
        }
    }
]

results = list(collection.aggregate(pipeline))

# Print results
for doc in results:
    print(f"{doc['description']} | Score: {doc['score']:.4f}")

Note:

You must create a Vector Search Index in MongoDB Atlas first (via UI or CLI)

Use pip install pymongo to install the driver

Replace the placeholder vector with real 1536-dim embeddings (e.g., from OpenAI)
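
For the first note, the index can also be created from code instead of the UI. The sketch below is an assumption-laden starting point: it assumes a recent pymongo release in which `SearchIndexModel` accepts a `type` argument, and the helper names `vector_index_definition` and `create_vector_index` are made up here. The field names inside the definition follow the Atlas Vector Search index format; check them against your Atlas version before relying on this.

```python
def vector_index_definition(path="embedding", dims=1536, similarity="cosine"):
    """Build an Atlas Vector Search index definition for one vector field."""
    return {
        "fields": [
            {
                "type": "vector",
                "path": path,            # document field holding the embedding
                "numDimensions": dims,   # must match your embedding model's output size
                "similarity": similarity,
            }
        ]
    }

def create_vector_index(collection, name="vector_index"):
    """Create the index on an existing pymongo collection (requires Atlas)."""
    # Imported lazily so the definition helper works without pymongo installed.
    from pymongo.operations import SearchIndexModel

    model = SearchIndexModel(
        definition=vector_index_definition(),
        name=name,
        type="vectorSearch",
    )
    return collection.create_search_index(model=model)

definition = vector_index_definition()
print(definition)
```

The `name` passed here has to match the `"index"` value used in the `$vectorSearch` stage, and the index can take a little while to build before queries return results.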

pgvector vs MongoDB Atlas Vector Search: Quick Comparison

Feature	PostgreSQL + pgvector	MongoDB Atlas Vector Search
Open source	Yes	No (cloud only)
Self-hostable	Yes	No
Works with existing data	Yes	Yes
Full-text + vector in one query	Yes (with `tsvector`)	Yes
Free tier	Yes	Yes (limited)
Best for	Small–medium apps, full control	Apps already on MongoDB, rapid prototyping

Rule of thumb:
Use pgvector if you want free, open, self-hosted
Use MongoDB Atlas if you’re already in the MongoDB ecosystem

Cheers,

Sim