Vector Databases Explained

Vector Databases Explained

Published on
Authors

What’s the Problem with Regular Search?

  • You have 10,000 product descriptions
  • User types: “comfortable outdoor furniture”
  • Traditional database (SQL/NoSQL):
    • Looks for exact words: “comfortable” OR “outdoor” OR “furniture”
    • Misses “cozy patio seating” — even though it’s the same thing
    • Keyword matching = dumb

Vector databases fix this with meaning, not just words.


So, What Is a Vector Database?

A vector database stores numbers that represent meaning, not just text.

Regular Database Vector Database
Stores: "cozy patio seating" Stores: [0.3, 0.8, 0.1, 0.9, ...]
Searches: exact words Searches: similar meanings
  • These numbers = embeddings (created by AI models like OpenAI, Google, etc.)
  • Similar ideas → similar numbers
  • “Cozy” and “comfortable” → close numbers
  • “Chair” and “table” → far apart

How Does It Work? (Step-by-Step)

Step 1: Turn Text into Numbers (Vectors)

"comfortable chair" → [0.2, 0.7, 0.1, 0.4, ...]
"cozy seat"         → [0.3, 0.8, 0.2, 0.5, ...]
  • Done using AI embedding models (like OpenAI’s text-embedding-3-small)
  • Same meaning = close numbers

Step 2: Store & Index These Vectors

  • Not stored as plain text
  • Stored as arrays of numbers
  • Special indexes (like HNSW, IVF) make “find similar” super fast

Step 3: Search by Meaning

User searches: "outdoor furniture"
→ Converted to vector: [0.3, 0.6, 0.2, 0.8, ...]
→ Database finds closest matches using math (cosine similarity)
→ Returns: "cozy patio seating", "garden lounge set", etc.

Vector DB vs SQL vs NoSQL: Key Differences

Feature SQL (e.g. PostgreSQL) NoSQL (e.g. MongoDB) Vector DB
Stores Rows with columns JSON documents Vectors (numbers)
Search Exact match, filters Text search, regex Similarity (meaning)
Best for Transactions, reports Flexible data AI search, recommendations
Speed at scale Slow for similarity Not built for it Blazing fast similarity

Think of it like this:
SQL = phone book (exact name lookup)
Vector DB = friend who “knows someone like that”


Real-World Use Cases (You’re Already Using These!)

  • Smart product search
    → Finds “cozy patio” when user types “comfy outdoor”
  • Chatbots & support
    → Matches “How do I reset?” to “password recovery guide”
  • Recommendation engines
    → “Users who liked X also liked Y” (based on behavior vectors)
  • Document search
    → Finds relevant policies even if keywords differ
  • Image/audio search
    → Find similar images or songs by content

How Data Gets Added (Behind the Scenes)

(From the diagram in the tweet)

  1. User sends object → e.g., movie: {title: "Top Gun", genre: "action"}
  2. System generates vector → using AI model (e.g., OpenAI)
  3. Vector + metadata stored → in collection + indexes
  4. Inverted index updated → for fast filtering (e.g., genre = action)
  5. Vector index updated → for similarity search
  6. Object ID returned → UUID like a1b2c3...

All this happens in parallel — super fast!


Popular Vector Databases (Pick Your Flavor)

Name Type Best For
Weaviate Open-source Feature-rich, self-hosted
Pinecone Managed (cloud) Easy, no ops, pricey
Milvus Open-source Massive scale, complex
Qdrant Open-source (Rust) Fast, lightweight
pgvector Postgres extension Simple, use your existing DB

Hot take: For more than 1 million items, just use PostgreSQL + pgvector.
No need for a fancy vector DB yet.


Do You Really Need a Vector Database?

Project Size Recommendation
Less than 100K items Use Postgres + pgvector
100K – 1M Still fine with pgvector
More than 1M or heavy search Consider Weaviate/Pinecone

Start simple. Scale when you feel the pain.


TL;DR: Vector Databases in 5 Bullets

  • Turn text → meaningful numbers (embeddings)
  • Store & search by similarity, not keywords
  • Perfect for AI search, recommendations, chatbots
  • Different from SQL/NoSQL: math-based, not rule-based
  • Start with pgvector — you probably don’t need more

Want to try it today?

-- In PostgreSQL with pgvector
CREATE EXTENSION vector;
CREATE TABLE products (id serial, description text, embedding vector(1536));

-- Insert
INSERT INTO products (description, embedding)
VALUES ('cozy patio seating', '[0.3,0.8,...]');

-- Search
SELECT * FROM products
ORDER BY embedding <=> '[0.2,0.7,...]'  -- your query vector
LIMIT 5;

That’s it. You’re now doing AI-powered search.


MongoDB Also Does Vector Search! (Atlas Vector Search)

Yes! MongoDB added native vector search in MongoDB Atlas (cloud version).

Why Use MongoDB for Vectors?

  • You already use MongoDB? → No new database
  • Store documents + vectors together
  • Full-text + vector search in one query
  • Great for apps with rich metadata

MongoDB Vector Search – Basic Implementation (Python using pymongo)

# MongoDB Vector Search – 

from pymongo import MongoClient

# 1. Connect to MongoDB Atlas (replace <connection_string>)
client = MongoClient("mongodb+srv://<user>:<password>@cluster0.mongodb.net/")
db = client["store"]
collection = db["products"]

# 2. Insert a document with embedding
collection.insert_one({
    "description": "cozy patio seating",
    "price": 299,
    "category": "outdoor",
    "embedding": [0.3, 0.8, 0.1, ..., 0.9]  # 1536-dim vector (example)
})

# 3. Search using vector similarity
pipeline = [
    {
        "$vectorSearch": {
            "index": "vector_index",  # Name of your vector search index
            "path": "embedding",
            "queryVector": [0.2, 0.7, 0.1, ..., 0.4],  # Embedding of user query
            "numCandidates": 100,
            "limit": 5
        }
    },
    {
        "$project": {
            "description": 1,
            "price": 1,
            "category": 1,
            "score": { "$meta": "vectorSearchScore" }
        }
    }
]

results = list(collection.aggregate(pipeline))

# Print results
for doc in results:
    print(f"{doc['description']} | Score: {doc['score']:.4f}")

Note:

  • You must create a Vector Search Index in MongoDB Atlas first (via UI or CLI)
  • Use pip install pymongo to install the driver
  • Replace the placeholder vector with real 1536-dim embeddings (e.g., from OpenAI)

pgvector vs MongoDB Atlas Vector Search: Quick Comparison

Feature PostgreSQL + pgvector MongoDB Atlas Vector Search
Open source Yes No (cloud only)
Self-hostable Yes No
Works with existing data Yes Yes
Full-text + vector in one query Yes (with tsvector) Yes
Free tier Yes Yes (limited)
Best for Small–medium apps, full control Apps already on MongoDB, rapid prototyping

Rule of thumb:
Use pgvector if you want free, open, self-hosted
Use MongoDB Atlas if you’re already in the MongoDB ecosystem


Cheers,

Sim