Source: dev.to · Topic: ai-infrastructure

ChromaDB vs Qdrant vs Weaviate vs pgvector: vector database shootout 2026

A developer benchmarked ChromaDB, Qdrant, Weaviate, and pgvector for RAG pipelines, finding that most teams over-optimize for future scale while underestimating day-one operational costs. ChromaDB offers the fastest setup with no server or schema required but degrades beyond 2–5 million vectors due to post-search metadata filtering, while Qdrant applies filters before ANN search and supports native hybrid search, making it suitable for production RAG pipelines and multi-tenant datasets above 5 million vectors.

5 min read · published May 28, 2026

Every RAG pipeline I've reviewed this year hits the same decision point: which vector store do you actually ship? The wrong choice compounds — it shapes your architecture, your operational overhead, and how painful a future migration will be. I've run all four of these in production or near-production contexts. Here's what actually matters for the decision.

Before benchmarking anything, answer these:

Most teams over-optimize for a scale they won't reach for 18 months and under-weight the day-one operational cost of a new infrastructure component.

ChromaDB requires no server, no Docker, no schema definition upfront. It's embedded in Python, and you can have a working vector store in a few lines:

import chromadb
from chromadb.utils import embedding_functions

client = chromadb.PersistentClient(path="./chroma_db")
ef = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2"
)

collection = client.get_or_create_collection(
    name="documents",
    embedding_function=ef,
    metadata={"hnsw:space": "cosine"}
)

collection.add(
    documents=[
        "FastAPI is great for building REST APIs",
        "Go outperforms Python on CPU-bound tasks",
        "Vector databases enable semantic search at scale",
    ],
    ids=["doc1", "doc2", "doc3"],
    metadatas=[
        {"source": "blog", "year": 2026},
        {"source": "blog", "year": 2025},
        {"source": "docs", "year": 2026},
    ]
)

results = collection.query(
    query_texts=["which backend language is fastest?"],
    n_results=2,
    where={"year": {"$gte": 2025}}
)
print(results["documents"])

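The critical limitation: ChromaDB applies metadata filters after the ANN search. It over-fetches internally to compensate, which degrades recall at scale: with a selective filter, most of the over-fetched candidates are discarded and true matches outside the candidate window are never seen. Its distributed mode remains underdeveloped as of mid-2026. The practical ceiling is roughly 2–5M vectors before the degradation becomes noticeable.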

Best for: local dev, internal tools, demos, early-stage products.

Qdrant is written in Rust and applies payload filters before the ANN search, which is the technically correct behavior. This matters when you have multi-tenant data or narrow filter conditions. A filter applied post-search means you're doing extra work and getting non-deterministic recall when the filtered result set is smaller than your requested top_k.
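To make the difference concrete, here is a minimal pure-Python sketch of the two strategies. It is not how ChromaDB or Qdrant are implemented: `top_k` is a brute-force stand-in for the ANN index, and `post_filter_search`, `pre_filter_search`, the `overfetch` factor, and the toy `tenant` field are all names made up for illustration.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, points, k):
    """Exact nearest neighbours by cosine similarity (stand-in for an ANN index)."""
    return sorted(points, key=lambda p: cosine(query, p["vector"]), reverse=True)[:k]


def post_filter_search(query, points, k, predicate, overfetch=3):
    # Search first, filter afterwards: only k * overfetch candidates are ever seen.
    candidates = top_k(query, points, k * overfetch)
    return [p for p in candidates if predicate(p)][:k]


def pre_filter_search(query, points, k, predicate):
    # Filter first, then rank only the points that are allowed to match.
    return top_k(query, [p for p in points if predicate(p)], k)


def is_acme(point):
    return point["tenant"] == "acme"


random.seed(0)
points = [
    {
        "id": i,
        "tenant": "acme" if i % 100 == 0 else "other",  # acme owns 1% of the data
        "vector": [random.gauss(0, 1) for _ in range(16)],
    }
    for i in range(2000)
]
query = [random.gauss(0, 1) for _ in range(16)]

post = post_filter_search(query, points, k=5, predicate=is_acme)
pre = pre_filter_search(query, points, k=5, predicate=is_acme)
print("post-filter hits:", len(post), "| pre-filter hits:", len(pre))
```

With a tenant that owns about 1% of the points, the post-filter variant usually comes back with fewer than five hits, because few of that tenant's points land inside the over-fetched candidate window. The pre-filter variant returns five whenever five matching points exist.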

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, VectorParams, PointStruct,
    Filter, FieldCondition, MatchValue
)

client = QdrantClient(url="http://localhost:6333")

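# recreate_collection() is deprecated and drops existing data; create only if missing
if not client.collection_exists("documents"):
    client.create_collection(
        collection_name="documents",
        vectors_config=VectorParams(size=384, distance=Distance.COSINE),
    )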

points = [
    PointStruct(
        id=1,
        vector=[0.05] * 384,  # replace with real embeddings
        payload={"source": "blog", "year": 2026, "text": "FastAPI for REST APIs"}
    ),
    PointStruct(
        id=2,
        vector=[0.12] * 384,
        payload={"source": "docs", "year": 2025, "text": "HNSW index internals"}
    ),
]
client.upsert(collection_name="documents", points=points)

# search() is deprecated in recent qdrant-client releases; query_points() replaces it
response = client.query_points(
    collection_name="documents",
    query=[0.08] * 384,
    query_filter=Filter(
        must=[FieldCondition(key="year", match=MatchValue(value=2026))]
    ),
    limit=5
)
for r in response.points:
    print(r.payload["text"], round(r.score, 4))

Qdrant also supports sparse + dense hybrid search natively, which is useful when you want BM25 recall blended with semantic similarity — a common pattern for RAG over heterogeneous corpora. It handles concurrent writes well, exposes both REST and gRPC, and its Python SDK is actively maintained. The managed cloud tier is straightforward to size.
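One common way to blend a keyword ranking with a dense ranking is reciprocal rank fusion (RRF). The sketch below is a plain-Python illustration of that idea, not Qdrant's internal implementation; the function name, the sample document ids, and the constant `k=60` (a conventional default) are assumptions for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document ids into a single ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. Ties are broken by document id for determinism.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from a sparse (BM25-style) and a dense retriever.
bm25_hits = ["doc3", "doc1", "doc7"]
dense_hits = ["doc1", "doc5", "doc3"]

fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
print(fused)  # ['doc1', 'doc3', 'doc5', 'doc7']
```

Documents that appear in both lists rise to the top even when neither retriever ranked them first, which is the behavior you want over heterogeneous corpora.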

Best for: production RAG pipelines, multi-tenant SaaS, datasets above 5M vectors.

Weaviate offers the largest feature set in this list: GraphQL querying, multi-tenancy, built-in hybrid search, modules for text and images, and a schema-based data model. If you genuinely need multi-modal search or a GraphQL interface over your vector data, it's the only option here that delivers it cleanly.

The operational cost is real. Weaviate ships frequent releases and requires careful memory tuning on self-hosted deployments. Its schema-first approach adds friction during the exploration phase, when your embedding model is still changing. The managed tier (Weaviate Cloud) is generous at small scale, but costs climb quickly past 1M objects.

It's also the most complex to reason about internally: its ANN implementation is HNSW, and it layers BM25 on top for hybrid search. When things behave unexpectedly, the debugging surface is wide.

Best for: product search with image embeddings, teams that need GraphQL, complex multi-modal use cases.

If your application already runs on Postgres, pgvector eliminates an entire infrastructure dependency. pgvector 0.5.0 added HNSW index support, which closed most of the performance gap with dedicated solutions at moderate scale.

import psycopg2
import numpy as np

conn = psycopg2.connect("dbname=mydb user=postgres host=localhost")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        content TEXT,
        source TEXT,
        year INT,
        embedding vector(384)
    )
""")
cur.execute(
    "CREATE INDEX IF NOT EXISTS idx_doc_embedding "
    "ON documents USING hnsw (embedding vector_cosine_ops)"
)
conn.commit()

embedding = np.random.rand(384).tolist()
cur.execute(
    "INSERT INTO documents (content, source, year, embedding) VALUES (%s, %s, %s, %s)",
    ("pgvector HNSW makes semantic search viable in Postgres", "blog", 2026, embedding)
)
conn.commit()

query_vec = np.random.rand(384).tolist()
cur.execute("""
    SELECT content, 1 - (embedding <=> %s::vector) AS similarity
    FROM documents
    WHERE year >= 2025
    ORDER BY embedding <=> %s::vector
    LIMIT 5
""", (query_vec, query_vec))

for row in cur.fetchall():
    print(f"{row[0]} -- similarity: {round(row[1], 4)}")

cur.close()
conn.close()

Your existing Postgres tooling (backups, monitoring, migrations, access control) carries over. No new service to operate, no new runbook to write. The tradeoffs: no native hybrid search yet (you can approximate it with tsvector + cosine distance, but it's glue code), HNSW index builds are slower than Qdrant's, and at 10M+ vectors with high QPS, dedicated hardware starts to matter.
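A quick back-of-the-envelope calculation shows why hardware starts to matter around that size. The sketch assumes float32 values (4 bytes each) and the 384-dimension embeddings used in the examples above; it counts raw vector data only and ignores HNSW graph links, row overhead, and metadata, so real usage will be higher. `raw_vector_bytes` is a helper name invented for this example.

```python
def raw_vector_bytes(n_vectors, dim, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 values by default."""
    return n_vectors * dim * bytes_per_value


for n in (1_000_000, 5_000_000, 10_000_000):
    gb = raw_vector_bytes(n, 384) / 1e9
    print(f"{n:>10,} vectors x 384 dims: about {gb:.2f} GB of raw vector data")
```

At 10M vectors that is roughly 15.4 GB before any index structures, which is already more than many shared Postgres instances can keep in memory alongside their regular workload.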

Best for: teams already on Postgres, datasets under 5M vectors, early-to-mid-stage RAG where operational simplicity matters.

                  ChromaDB   Qdrant          Weaviate        pgvector
Setup             Embedded   Docker / Cloud  Docker / Cloud  PG Extension
Pre-filter ANN    No         Yes             Yes             Partial
Hybrid search     No         Yes             Yes             No
Scale ceiling     ~5M        100M+           50M+            ~10M
Operational cost  Very low   Low             High            Low (on PG)
Managed option    No         Yes             Yes             Via PG providers

Default path: start with ChromaDB to ship fast, migrate to Qdrant when you need pre-filter correctness or hit scale, use pgvector if you're already on Postgres and your dataset stays under a few million vectors. Reach for Weaviate only when you specifically need its feature set.

The biggest mistake I see is teams optimizing for a scale they won't reach for 18 months while ignoring the operational burden of a new database they'll feel on day one. Pick the simplest option that fits your actual current requirements, and design a migration path before you need it.
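One way to keep that migration path cheap is to hide the store behind a small interface from day one. This is a minimal sketch under stated assumptions: the `VectorStore` protocol, the `InMemoryStore` test double, and the `retrieve` helper are hypothetical names, not part of any of the four libraries. A real adapter would wrap the ChromaDB, Qdrant, or pgvector client behind the same two methods.

```python
from typing import Optional, Protocol, Sequence


class VectorStore(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]],
               metadatas: Sequence[dict]) -> None: ...

    def query(self, vector: Sequence[float], k: int,
              where: Optional[dict] = None) -> list: ...


class InMemoryStore:
    """Brute-force stand-in for tests; replace with a real adapter later."""

    def __init__(self) -> None:
        self._rows = {}

    def upsert(self, ids, vectors, metadatas):
        for doc_id, vec, meta in zip(ids, vectors, metadatas):
            self._rows[doc_id] = (list(vec), dict(meta))

    def query(self, vector, k, where=None):
        where = where or {}
        scored = [
            (sum(q * x for q, x in zip(vector, vec)), doc_id)
            for doc_id, (vec, meta) in self._rows.items()
            # Exact-match metadata filter applied before ranking.
            if all(meta.get(key) == val for key, val in where.items())
        ]
        scored.sort(reverse=True)  # highest dot product first
        return [doc_id for _, doc_id in scored[:k]]


def retrieve(store: VectorStore, query_vec, k=2, where=None):
    # Application code depends only on the protocol, never on a vendor SDK.
    return store.query(query_vec, k, where)


store = InMemoryStore()
store.upsert(
    ["a", "b", "c"],
    [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]],
    [{"year": 2026}, {"year": 2025}, {"year": 2025}],
)
print(retrieve(store, [1.0, 0.0]))                        # ['a', 'c']
print(retrieve(store, [1.0, 0.0], where={"year": 2025}))  # ['c', 'b']
```

The design choice is that only the adapter classes import a vendor SDK, so moving from one store to another means writing one new adapter and re-indexing, not touching every call site.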

For regulated deployments — healthcare, finance, government — verify encryption-at-rest guarantees and data residency options for each managed offering before committing. We track the right questions to ask in our security evaluation checklists.

I run AYI NEDJIMI Consultants, a cybersecurity consulting firm. We publish free security hardening checklists — PDF and Excel.
