Vector Databases Explained: The Foundation of AI Search
Vector databases power modern AI search. Here’s what you need to know.
What Is a Vector Database?
A vector database stores and searches embeddings—numerical representations of data (text, images, etc.) that capture meaning.
Traditional Database:
"The cat sat on the mat" → Stored as text string
Vector Database:
"The cat sat on the mat" → [0.23, -0.41, 0.87, ...] (1536 numbers)
Why Vectors Matter
Semantic Understanding
Traditional search: find exact matches.
Vector search: find similar meaning.
Query: “feline resting on carpet”
- Keyword search: No match
- Vector search: Finds “The cat sat on the mat” ✓
How It Works
Step 1: Generate Embeddings
Use an embedding model to convert data:
Text → Embedding Model → Vector
"Hello world" → [0.12, -0.34, 0.56, ...]
Step 2: Store Vectors
Vector database indexes vectors for fast search:
ID: 1, Vector: [0.12, -0.34, ...], Metadata: {source: "doc1"}
ID: 2, Vector: [0.45, -0.12, ...], Metadata: {source: "doc2"}
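Here's a minimal sketch of this step using Chroma (one of the self-hosted options listed below); the IDs and metadata mirror the example above, and the short 4-number vectors stand in for real embeddings:

```python
# Sketch: storing vectors plus metadata in Chroma (in-memory client).
# The 4-number vectors are stand-ins for real embeddings.
import chromadb

client = chromadb.Client()
collection = client.create_collection("docs")

collection.add(
    ids=["1", "2"],
    embeddings=[[0.12, -0.34, 0.56, 0.78], [0.45, -0.12, 0.33, -0.21]],
    metadatas=[{"source": "doc1"}, {"source": "doc2"}],
)
```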
Step 3: Search by Similarity
The query vector is compared against the stored vectors:
Query: "greeting" → [0.11, -0.32, ...]
Most similar: ID 1 (cosine similarity: 0.95)
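Under the hood, this comparison is just vector math. A brute-force NumPy sketch of the same search (real databases use indexes, covered below, to avoid scanning every vector):

```python
# Sketch: brute-force cosine-similarity search in NumPy.
import numpy as np

stored = np.array([[0.12, -0.34, 0.56], [0.45, -0.12, 0.33]])  # rows = stored vectors
query = np.array([0.11, -0.32, 0.50])                          # query vector

# Cosine similarity = dot product of L2-normalized vectors
stored_norm = stored / np.linalg.norm(stored, axis=1, keepdims=True)
query_norm = query / np.linalg.norm(query)
scores = stored_norm @ query_norm

best = int(np.argmax(scores))
print(f"Most similar: ID {best + 1} (cosine similarity: {scores[best]:.2f})")
```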
Key Concepts
Embeddings
Dense numerical representations capturing semantic meaning:
- Text embeddings (OpenAI, Cohere)
- Image embeddings (CLIP)
- Audio embeddings (e.g., Whisper's encoder)
Similarity Metrics
How to measure “closeness”:
| Metric | Use Case |
|---|---|
| Cosine | Text similarity; the most common default |
| Euclidean | General-purpose distance |
| Dot product | Fast; equal to cosine on normalized vectors |
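A quick NumPy illustration of all three metrics on the same pair of made-up vectors:

```python
# Sketch: the three metrics on one pair of made-up vectors.
import numpy as np

a = np.array([0.2, 0.4, 0.6])
b = np.array([0.1, 0.5, 0.7])

cosine = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))  # angle only, scale-invariant
euclidean = np.linalg.norm(a - b)                         # straight-line distance
dot = a @ b                                               # equals cosine when both are unit-length

print(cosine, euclidean, dot)
```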
Indexing
Index structures for fast search (an HNSW sketch follows this list):
- HNSW (common, balanced)
- IVF (good for large scale)
- Flat (exact, small datasets)
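As a concrete example of the first option, here's a sketch of building and querying an HNSW index with the hnswlib library; the dataset is random and the parameters (M, ef_construction, ef) are illustrative defaults to tune:

```python
# Sketch: approximate nearest-neighbor search with an HNSW index.
import numpy as np
import hnswlib

dim, num_items = 128, 10_000
data = np.random.rand(num_items, dim).astype(np.float32)  # stand-in embeddings

index = hnswlib.Index(space="cosine", dim=dim)
index.init_index(max_elements=num_items, ef_construction=200, M=16)
index.add_items(data, np.arange(num_items))

index.set_ef(50)  # query-time speed/accuracy knob
labels, distances = index.knn_query(data[0], k=5)
print(labels, distances)
```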
Popular Vector Databases
Managed Services
| Database | Strengths |
|---|---|
| Pinecone | Easy to use, fully managed |
| Weaviate | Hybrid search, open source |
| Qdrant | Performance, open source |
| Milvus | Scale, open source |
Cloud Provider Options
| Service | Provider |
|---|---|
| Azure AI Search | Microsoft |
| Vertex AI Vector Search | Google Cloud |
| Amazon OpenSearch | AWS |
Self-Hosted
| Option | Best For |
|---|---|
| Chroma | Local development |
| pgvector | PostgreSQL users |
| Elasticsearch | Existing ES users |
Use Cases
1. RAG (Retrieval-Augmented Generation)
Store a knowledge base and retrieve relevant context for an LLM:
User question → Vector search → Relevant docs → LLM → Answer
2. Semantic Search
Search by meaning, not keywords:
- Product search
- Documentation search
- Support ticket search
3. Recommendation
Find similar items (sketch after this list):
- Similar products
- Related content
- Matching profiles
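A minimal sketch of the item-to-item lookup behind these features, using the same hypothetical vector_db placeholder as the patterns below (its get() method is also assumed, not a real API):

```python
# Pseudocode: vector_db and its get() lookup are hypothetical placeholders.
# Search with an item's own embedding, then drop the item itself.
item = vector_db.get(id="product-42")
results = vector_db.search(item.embedding, top_k=6)
recommendations = [r for r in results if r.id != "product-42"][:5]
```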
4. Deduplication
Find near-duplicates (sketch after this list):
- Document deduplication
- Image matching
- Record linking
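A NumPy sketch of threshold-based near-duplicate detection; the random embeddings and the 0.95 cutoff are illustrative:

```python
# Sketch: flag pairs whose cosine similarity exceeds a threshold.
import numpy as np

embeddings = np.random.rand(100, 128).astype(np.float32)  # stand-in for real embeddings
normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
sims = normed @ normed.T

pairs = np.argwhere(np.triu(sims > 0.95, k=1))  # upper triangle: each pair once, no self-matches
for i, j in pairs:
    print(f"Items {i} and {j} look like near-duplicates ({sims[i, j]:.3f})")
```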
Building with Vector Databases
Basic Pattern
```python
# Pseudocode: embed() and vector_db stand in for your embedding model
# and vector database client.

# 1. Create an embedding for the query
embedding = embed("User query text")

# 2. Search for similar vectors, optionally filtered by metadata
results = vector_db.search(
    embedding,
    top_k=5,
    filter={"category": "products"},
)

# 3. Use the results
for result in results:
    print(result.text, result.score)
```
RAG Pattern
```python
# Pseudocode: embed(), vector_db, and llm are placeholders.

# 1. Embed the user query
query_embedding = embed(user_query)

# 2. Retrieve the most relevant documents
docs = vector_db.search(query_embedding, top_k=3)
context = "\n\n".join(doc.text for doc in docs)

# 3. Build a prompt that grounds the answer in the retrieved context
prompt = f"""Answer the question using only these documents:

{context}

Question: {user_query}"""

# 4. Generate the answer
answer = llm.generate(prompt)
```
Choosing a Vector Database
Consider:
| Factor | Questions |
|---|---|
| Scale | How many vectors? |
| Latency | Speed requirements? |
| Features | Filtering, metadata? |
| Operations | Self-host or managed? |
| Cost | Budget constraints? |
Quick Guide
- Prototype: Chroma (local, free)
- Startup: Pinecone (easy, managed)
- Enterprise: Weaviate or Qdrant (flexible, scalable)
- Existing DB: pgvector (PostgreSQL extension)
Performance Tips
- Choose the right embedding model: size vs. quality tradeoff
- Optimize chunk size: too small adds noise; too large pulls in irrelevant text
- Use metadata filters to narrow the search space
- Monitor and tune: index parameters matter
- Cache common queries to reduce latency (sketch below)
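For the last tip, a small sketch of caching query embeddings with Python's functools.lru_cache (embed() is the same placeholder as in the patterns above):

```python
# Pseudocode: embed() is your embedding function.
from functools import lru_cache

@lru_cache(maxsize=1024)
def embed_cached(text: str) -> tuple:
    # Return a tuple so the cached value is immutable
    return tuple(embed(text))

vector = embed_cached("What is your return policy?")  # repeat queries skip the model call
```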
Need help implementing vector search? Let’s discuss your architecture.