ChromaDB vs Qdrant vs Weaviate vs pgvector: vector database shootout 2026

A developer benchmarked ChromaDB, Qdrant, Weaviate, and pgvector for RAG pipelines, finding that most teams over-optimize for future scale while underestimating day-one operational costs. ChromaDB offers the fastest setup with no server or schema required but degrades beyond 2–5 million vectors due to post-search metadata filtering, while Qdrant applies filters before ANN search and supports native hybrid search, making it suitable for production RAG pipelines and multi-tenant datasets above 5 million vectors.

Every RAG pipeline I've reviewed this year hits the same decision point: which vector store do you actually ship? The wrong choice compounds: it shapes your architecture, your operational overhead, and how painful a future migration will be. I've run all four of these in production or near-production contexts. Here's what actually matters for the decision.

Before benchmarking anything, answer these:

Most teams over-optimize for a scale they won't reach for 18 months and under-weight the day-one operational cost of a new infrastructure component.

ChromaDB requires no server, no Docker, no schema definition upfront. It's embedded in Python, and you can have a working vector store in a few lines:

```python
import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")

ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"},
)

collection.add(
    documents=[
        "FastAPI is great for building REST APIs",
        "Go outperforms Python on CPU-bound tasks",
        "Vector databases enable semantic search at scale",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "blog", "year": 2026},
        {"source": "blog", "year": 2025},
        {"source": "docs", "year": 2026},
    ],
)

results = collection.query(
    query_texts=["which backend language is fastest?"],
    n_results=2,
    where={"year": {"$gte": 2025}},
)
print(results["documents"])
```

The critical limitation: ChromaDB applies metadata filters after the ANN search. It over-fetches internally to compensate, which degrades recall correctness at scale. Its distributed mode remains underdeveloped as of mid-2026. Scale ceiling is roughly 2–5M vectors before you start noticing.

Best for: local dev, internal tools, demos, early-stage products.

Qdrant is written in Rust and applies payload filters before the ANN search, which is the technically correct behavior.
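
To make the ordering difference concrete, here is a toy sketch. It is not how either engine is implemented: exact brute-force cosine search stands in for the ANN index, and the function names, the `fetch` over-fetch parameter, and the ten hand-built 2-D points are all made up for illustration.

```python
import numpy as np

def cosine_scores(vectors, query):
    """Cosine similarity of every row in `vectors` against `query`."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    return v @ q

def post_filter_search(vectors, years, query, k, min_year, fetch=None):
    """Take the `fetch` nearest points first, then drop the ones failing the filter."""
    fetch = fetch or k
    scores = cosine_scores(vectors, query)
    candidates = np.argsort(-scores)[:fetch]
    return [int(i) for i in candidates if years[i] >= min_year][:k]

def pre_filter_search(vectors, years, query, k, min_year):
    """Restrict to points passing the filter first, then rank only those."""
    allowed = np.flatnonzero(years >= min_year)
    scores = cosine_scores(vectors[allowed], query)
    return [int(i) for i in allowed[np.argsort(-scores)[:k]]]

# Ten unit vectors at 0, 10, ..., 90 degrees; point 0 is closest to the query.
angles = np.deg2rad(np.linspace(0, 90, 10))
vectors = np.column_stack([np.cos(angles), np.sin(angles)])
query = np.array([1.0, 0.0])

# Only the three points farthest from the query pass the filter (year >= 2025).
years = np.array([2020] * 7 + [2026] * 3)

post_plain = post_filter_search(vectors, years, query, k=3, min_year=2025)
post_overfetch = post_filter_search(vectors, years, query, k=3, min_year=2025, fetch=8)
pre = pre_filter_search(vectors, years, query, k=3, min_year=2025)

print("post-filter:", post_plain)
print("post-filter, over-fetch 8:", post_overfetch)
print("pre-filter:", pre)
```

With `fetch` equal to `k`, the post-filter path returns nothing, because the three nearest points all fail the filter. Over-fetching to 8 recovers one match, and the pre-filter path returns all three matching points.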
This matters when you have multi-tenant data or narrow filter conditions. A filter applied post-search means extra work and unpredictable recall whenever the filtered result set is smaller than your requested `top_k`.

```python
from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue,
)

client = QdrantClient(url="http://localhost:6333")

# Note: recent qdrant-client releases deprecate recreate_collection and search
# in favor of create_collection and query_points.
client.recreate_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)

points = [
    PointStruct(
        id=1,
        vector=[0.05] * 384,  # replace with real embeddings
        payload={"source": "blog", "year": 2026, "text": "FastAPI for REST APIs"},
    ),
    PointStruct(
        id=2,
        vector=[0.12] * 384,
        payload={"source": "docs", "year": 2025, "text": "HNSW index internals"},
    ),
]
client.upsert(collection_name="documents", points=points)

# The payload filter is part of the search request itself.
results = client.search(
    collection_name="documents",
    query_vector=[0.08] * 384,
    query_filter=Filter(
        must=[FieldCondition(key="year", match=MatchValue(value=2026))]
    ),
    limit=5,
)

for r in results:
    print(r.payload["text"], round(r.score, 4))
```

Qdrant also supports sparse + dense hybrid search natively, which is useful when you want BM25 recall blended with semantic similarity, a common pattern for RAG over heterogeneous corpora. It handles concurrent writes well, exposes both REST and gRPC, and its Python SDK is actively maintained. The managed cloud tier is straightforward to size.

Best for: production RAG pipelines, multi-tenant SaaS, datasets above 5M vectors.

Weaviate offers the largest feature set in this list: GraphQL querying, multi-tenancy, built-in hybrid search, modules for text and images, and a schema-based data model. If you genuinely need multi-modal search or a GraphQL interface over your vector data, it's the only option here that delivers it cleanly.

The operational cost is real. Weaviate ships frequent releases and requires careful memory tuning on self-hosted deployments.
Its schema-first approach adds friction during the exploration phase, when your embedding model is still changing. The managed tier (Weaviate Cloud) is generous at small scale, but cost climbs fast past 1M objects.

It's also the most complex to reason about internally: its ANN implementation is HNSW, and it layers BM25 on top for hybrid search. When things behave unexpectedly, the debugging surface is wide.

Best for: product search with image embeddings, teams that need GraphQL, complex multi-modal use cases.

If your application already runs on Postgres, pgvector eliminates an entire infrastructure dependency. Version 0.5 added HNSW index support, which closed most of the performance gap with dedicated solutions at moderate scale.

```python
import psycopg2
import numpy as np

conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        source TEXT,
        year INT,
        embedding vector(384)
    )
""")
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_doc_embedding "
    "ON documents USING hnsw (embedding vector_cosine_ops)"
)
conn.commit()

# Random vectors stand in for real embeddings.
embedding = np.random.rand(384).tolist()
cur.execute(
    "INSERT INTO documents (content, source, year, embedding) VALUES (%s, %s, %s, %s)",
    ("pgvector HNSW makes semantic search viable in Postgres", "blog", 2026, embedding),
)
conn.commit()

query_vec = np.random.rand(384).tolist()
# <=> is pgvector's cosine distance operator; 1 - distance gives similarity.
cur.execute("""
    SELECT content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    WHERE year >= 2025
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (query_vec, query_vec))

for row in cur.fetchall():
    print(f"{row[0]} -- similarity: {round(row[1], 4)}")

cur.close()
conn.close()
```

Your existing Postgres tooling (backups, monitoring, migrations, access control) carries over. No new service to operate, no new runbook to write.
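
For reference, pgvector's `<=>` operator returns cosine distance, which is why the query above subtracts it from 1 to report a similarity. A plain-Python sketch of that relationship follows; the helper names are illustrative and not part of pgvector.

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine of the angle between a and b (what `<=>` computes)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def cosine_similarity(a, b):
    """The `1 - distance` value the SELECT reports as `similarity`."""
    return 1.0 - cosine_distance(a, b)

same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 3.0])
opposite = cosine_distance([1.0, 0.0], [-1.0, 0.0])

print(same_direction, orthogonal, opposite)
```

Distance runs from 0 (same direction) to 2 (opposite direction), so the reported similarity runs from 1 down to -1, and ordering by ascending distance is the same as ordering by descending similarity.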
The tradeoffs: no native hybrid search yet (you can approximate with tsvector + cosine distance, but it's glue code), HNSW index builds are slower than Qdrant's, and at 10M+ vectors with high QPS, dedicated hardware starts to matter.

Best for: teams already on Postgres, datasets under 5M vectors, early-to-mid-stage RAG where operational simplicity matters.

| | ChromaDB | Qdrant | Weaviate | pgvector |
|---|---|---|---|---|
| Setup | Embedded | Docker / Cloud | Docker / Cloud | PG extension |
| Pre-filter ANN | No | Yes | Yes | Partial |
| Hybrid search | No | Yes | Yes | No |
| Scale ceiling | ~5M | 100M+ | 50M+ | ~10M |
| Operational cost | Very low | Low | High | Low (on existing PG) |
| Managed option | No | Yes | Yes | Via PG providers |

Default path: start with ChromaDB to ship fast, migrate to Qdrant when you need pre-filter correctness or hit scale, use pgvector if you're already on Postgres and your dataset stays under a few million vectors. Reach for Weaviate only when you specifically need its feature set.

The biggest mistake I see is teams optimizing for a scale they won't reach for 18 months while ignoring the operational burden of a new database they'll feel on day one. Pick the simplest option that fits your actual current requirements, and design a migration path before you need it.

For regulated deployments (healthcare, finance, government), verify encryption-at-rest guarantees and data residency options for each managed offering before committing. We track the right questions to ask in our [security evaluation checklists](https://ayinedjimi-consultants.fr/checklists).

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. We publish free security hardening checklists in PDF and Excel.