Vector Databases Compared: pgvector, Qdrant, Pinecone, Weaviate

wpnews.pro

There's a moment in almost every RAG project where someone asks the question that decides your next two years of ops work: "Do we actually need a vector database, or can Postgres just do this?"

It's a better question than it sounds, because the honest answer isn't "use Pinecone" or "use Postgres." It's "it depends on numbers you probably haven't measured yet": how many vectors, how aggressively you filter, how much you care about the absolute ceiling of queries per second. Most teams pick based on a blog post's leaderboard and then spend a quarter discovering that the leaderboard measured a workload nothing like theirs.

So let's not do that. Let's look at what these four (pgvector, Qdrant, Pinecone, and Weaviate) are actually doing under the hood when you ask them to find the closest vectors, why their filtering stories are wildly different, and where each one falls off a cliff. By the end you'll be able to answer the Postgres question for your workload, not a benchmark's.

First, the thing that unites all four: none of them are really finding the nearest vectors. They're finding probably the nearest vectors, fast.

If you wanted the true nearest neighbors to a query vector, you'd compare it against every single vector in your collection and sort by distance. That's exact, and it's also linear: a million vectors means a million distance calculations per query. At a few thousand rows you won't notice. At ten million you'll be timing out.

So every production vector store uses approximate nearest neighbor (ANN) search instead. You give up a small slice of accuracy (you might miss one of the true top-10 results occasionally) in exchange for queries that scale logarithmically instead of linearly. That accuracy slice has a name, recall: the fraction of the true nearest neighbors your index actually returns. Recall of 0.99 means you're getting 99 of every 100 true results. Tuning a vector database is, almost entirely, the art of trading recall against speed and memory.
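To make "exact" and "recall" concrete, here's a minimal brute-force search and a recall helper in plain Python. It's a sketch, not any engine's code. `exact_knn` is the linear scan described above, and `recall_at_k` compares whatever an approximate index returned against that ground truth. The function names and toy data are illustrative.

```python
import math

def exact_knn(vectors, query, k):
    """Exact search: one distance computation per stored vector, then a sort."""
    ranked = sorted(range(len(vectors)), key=lambda i: math.dist(vectors[i], query))
    return ranked[:k]

def recall_at_k(approx_ids, true_ids):
    """Fraction of the true nearest neighbors that the approximate search returned."""
    true_set = set(true_ids)
    return len(true_set.intersection(approx_ids)) / len(true_set)

vectors = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (0.5, 0.5)]
truth = exact_knn(vectors, (0.2, 0.1), k=3)
print(truth)                                   # [0, 4, 1]
approx = [0, 4, 3]                             # pretend an ANN index swapped one true neighbor for a wrong one
print(round(recall_at_k(approx, truth), 3))    # 0.667
```

The exact scan touches every vector on every query. That is the linear cost ANN indexes exist to avoid, and it's also how you produce the ground truth you measure recall against.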

And the dominant way all four do this is the same algorithm: HNSW. Understand HNSW once and three-quarters of every vendor's docs suddenly make sense.

HNSW stands for Hierarchical Navigable Small World, which is a lot of words for a fairly elegant idea: build a graph you can navigate the way you'd find a house in an unfamiliar city. Fly to the right country, drive to the right neighborhood, then walk the last block.

It borrows from two older ideas. The first is the skip list: a linked list with express lanes stacked on top, where each higher layer contains fewer elements, so you can skip across big distances up high and then drop down for precision. The second is a small-world graph, where every node has a handful of links and any two nodes are only a few hops apart.

HNSW stacks these into layers. Every vector lives in layer 0, the dense bottom layer. As you go up, each layer holds exponentially fewer vectors. A node's top layer is chosen randomly, and the probability of reaching each higher layer decays exponentially, so most vectors only exist at the bottom and a lucky few reach the top. The vectors up high have long-range links; the ones at the bottom have short, local ones.
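The "lucky few reach the top" rule is commonly implemented as in the original HNSW paper. Each new node draws a level from an exponentially decaying distribution, level = floor(-ln(U) * mL), with mL = 1/ln(M). The sketch below assumes that formula; individual engines may differ in the details. It counts how many of 100,000 simulated nodes end up with each layer as their top layer.

```python
import math
import random
from collections import Counter

def random_level(rng, m=16):
    """Top layer for a new node: floor(-ln(U) * mL) with mL = 1/ln(M), as in the HNSW paper."""
    ml = 1.0 / math.log(m)
    u = 1.0 - rng.random()            # in (0, 1], so log(u) is always defined
    return int(-math.log(u) * ml)

rng = random.Random(42)
levels = Counter(random_level(rng) for _ in range(100_000))
for layer in sorted(levels):
    print(f"layer {layer}: {levels[layer]} nodes stop here")
```

With M = 16, each extra layer is reached with probability 1/16. Roughly 94% of nodes therefore live only in layer 0, and only a few dozen per 100,000 reach layer 3.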

A search starts at a single entry point in the top layer and greedily walks toward the query vector, always hopping to the neighbor that's closest to the target. When it can't get any closer at that layer, it drops down a level and keeps going. Top layers cover huge distances in a few hops; the bottom layer does the fine-grained final approach. That's the "fly, drive, walk" pattern, and it's why search time grows roughly with the logarithm of your collection size instead of linearly.
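Here's the "fly, drive, walk" descent as a small sketch on a hand-built toy graph. It follows the search procedure from the original HNSW paper. Each upper layer gets a greedy walk that keeps a single candidate. Layer 0 then gets a wider best-first search that keeps a candidate list of size ef, the query-time knob. The graph, coordinates, and function names are illustrative, and building the graph (insertion and neighbor selection) is left out.

```python
import heapq
import math

def search_layer(graph, vectors, query, entry_points, ef):
    """Best-first search on one layer, keeping the ef closest nodes seen so far."""
    def dist(node):
        return math.dist(vectors[node], query)

    visited = set(entry_points)
    candidates = [(dist(n), n) for n in entry_points]    # min-heap: closest unexplored node first
    heapq.heapify(candidates)
    results = [(-d, n) for d, n in candidates]            # max-heap via negation: farthest kept result on top
    heapq.heapify(results)
    while len(results) > ef:
        heapq.heappop(results)
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:
            break                                          # closest candidate is worse than our worst result
        for nb in graph.get(node, []):
            if nb in visited:
                continue
            visited.add(nb)
            nd = dist(nb)
            if len(results) < ef or nd < -results[0][0]:
                heapq.heappush(candidates, (nd, nb))
                heapq.heappush(results, (-nd, nb))
                if len(results) > ef:
                    heapq.heappop(results)                 # drop the farthest kept result
    return [n for _, n in sorted((-neg_d, n) for neg_d, n in results)]

def hnsw_search(layers, vectors, entry, query, k, ef):
    """Greedy descent (ef=1) from the top layer to layer 1, then an ef-wide search on layer 0."""
    current = [entry]
    for graph in layers[:0:-1]:                            # layers[0] is the bottom layer
        current = search_layer(graph, vectors, query, current, ef=1)
    return search_layer(layers[0], vectors, query, current, ef=max(ef, k))[:k]

# Toy data: 10 points on a line. Layer 0 links neighbors; upper layers hold fewer nodes with longer links.
vectors = {i: (float(i), 0.0) for i in range(10)}
layer0 = {i: [j for j in (i - 1, i + 1) if 0 <= j < 10] for i in range(10)}
layer1 = {0: [3], 3: [0, 6], 6: [3, 9], 9: [6]}
layer2 = {0: [9], 9: [0]}

print(hnsw_search([layer0, layer1, layer2], vectors, entry=0, query=(7.2, 0.0), k=3, ef=4))  # [7, 8, 6]
```

The top layer jumps from node 0 straight to node 9, layer 1 steps back to node 6, and layer 0 does the short local search around it. Most of the distance is covered before the dense layer is ever touched.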

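Three parameters control the whole tradeoff, and they're named almost identically across every engine, so learn them once:

M: the maximum number of links each node keeps to its neighbors. A higher M gives a denser graph and better recall, at the cost of more memory and slower builds.

ef_construction: how many candidates the index considers when inserting a vector and choosing its links. Higher values build a better-connected graph, more slowly.

ef_search (hnsw_ef in Qdrant, or just ef): how many candidates the search explores at query time. This is your live recall-vs-latency dial. Crank it up for accuracy, drop it for speed. It's the one knob you'll actually tune in production.

Here's the part that matters for choosing a database: HNSW is greedy and memory-hungry. The whole graph wants to live in RAM, and its memory cost scales with both your vector count and M. Every one of these four engines is, underneath, managing the same HNSW tradeoffs. They just expose them differently and bolt very different things around them.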
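To see why "memory-hungry" bites, here's a back-of-envelope estimate in Python. It assumes float32 vectors (4 bytes per dimension). For the graph, it borrows an assumption from the hnswlib library: up to 2*M links per node on layer 0, each stored as a 4-byte id. Upper layers are ignored because they hold only a small fraction of nodes. Real engines add their own overhead, so treat the result as a floor, not a quote.

```python
def hnsw_memory_bytes(num_vectors, dims, m, bytes_per_dim=4, bytes_per_link=4):
    """Rough lower bound: raw float32 vectors plus a full layer-0 adjacency list of 2*M links per node."""
    vector_bytes = num_vectors * dims * bytes_per_dim
    graph_bytes = num_vectors * 2 * m * bytes_per_link
    return vector_bytes + graph_bytes

for n in (1_000_000, 10_000_000):
    for m in (16, 32):
        gib = hnsw_memory_bytes(n, 1536, m) / 2**30
        print(f"{n:>11,} vectors, dim 1536, M={m}: ~{gib:.1f} GiB")
```

At 1,536 dimensions the raw floats dominate the total. The M term carries more relative weight as dimensions shrink.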

pgvector is the odd one out, and that's its entire selling point. It's not a database. It's a Postgres extension. You run CREATE EXTENSION vector, you get a vector column type and a couple of index types, and suddenly the database you already run, back up, and monitor can do similarity search.

The appeal is real and it's mostly about ops surface. Your embeddings sit in the same table as the rows they describe. You can JOIN them against your actual data. You filter with plain WHERE clauses. You get transactions, foreign keys, and your existing backup story for free. For a huge number of apps, that "one less service to run" math wins before you even look at a benchmark.

A vector column and an HNSW index look like this:

schema.sql

CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        bigserial PRIMARY KEY,
  content   text,
  category  text,
  embedding vector(1536)          -- one embedding per row
);

-- HNSW index; m and ef_construction map straight to the algorithm above
CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

And a query is just SQL with a distance operator (<=> is cosine distance, <-> is L2, <#> is negative inner product):
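For reference, here's what those three operators compute, written out in Python. It follows pgvector's documented definitions. Cosine distance is 1 minus cosine similarity. <#> returns the inner product negated, so an ascending ORDER BY puts the most similar rows first. This is an illustration, not pgvector's C code.

```python
import math

def cosine_distance(a, b):          # what <=> computes
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def l2_distance(a, b):              # what <-> computes
    return math.dist(a, b)

def negative_inner_product(a, b):   # what <#> computes: smaller means more similar
    return -sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.6, 0.8]
print(cosine_distance(a, b), l2_distance(a, b), negative_inner_product(a, b))
```

For normalized embeddings (like OpenAI's), cosine distance and negative inner product rank results identically. That's why either operator class can work for them.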

search.sql

SELECT id, content
FROM documents
WHERE category = 'support'          -- ordinary filter, ordinary index
ORDER BY embedding <=> $1           -- nearest by cosine distance
LIMIT 10;

That WHERE category = 'support' line is doing something genuinely nice. Postgres can choose between a normal B-tree index on category and the vector index, because it's the same query planner that's been optimizing relational filtering for decades. Filtering, the thing that trips up purpose-built vector engines, is the thing Postgres has always been good at.
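There's one catch the planner can't hide. When Postgres does walk the HNSW index, pgvector applies the WHERE clause to the rows the index scan hands back, so a selective filter can return fewer rows than your LIMIT. pgvector's docs give the example of the default hnsw.ef_search of 40 with a filter that matches 10% of rows: about 4 rows survive. pgvector 0.8.0 added iterative index scans (the hnsw.iterative_scan setting), which keep scanning until enough rows match. The simulation below sketches both behaviors under simplifying assumptions: the "index" is a list of row ids already in approximate distance order, each pass yields ef_search candidates, and the filter is a Python predicate. The function names are hypothetical, not pgvector internals.

```python
def single_pass(candidates, predicate, ef_search, limit):
    """One index scan of ef_search candidates; the WHERE clause is applied afterwards."""
    return [row for row in candidates[:ef_search] if predicate(row)][:limit]

def iterative_scan(candidates, predicate, ef_search, limit):
    """Keep pulling ef_search-sized batches from the index until enough rows pass the filter."""
    results = []
    for start in range(0, len(candidates), ef_search):
        batch = candidates[start:start + ef_search]
        results.extend(row for row in batch if predicate(row))
        if len(results) >= limit:
            break
    return results[:limit]

def in_support_category(row_id):                   # hypothetical filter matching 10% of rows
    return row_id % 10 == 0

rows = list(range(1000))                           # row ids, assumed already in index (distance) order
print(len(single_pass(rows, in_support_category, ef_search=40, limit=10)))     # 4
print(len(iterative_scan(rows, in_support_category, ef_search=40, limit=10)))  # 10
```

Real iterative scans also cap how much of the index they'll read, so a filter that matches almost nothing can still come back short. For filters like that, the B-tree path is the better plan.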

pgvector also supports IVFFlat, the other classic ANN index, and the choice between the two is worth understanding because it bites people.

Warning

IVFFlat clusters your vectors with k-means and then only searches the nearest clusters. That means it needs representative data already in the table when you build the index. Build an IVFFlat index on an empty or barely populated table and you get meaningless cluster centroids and recall that quietly falls apart. HNSW has no such problem: it builds incrementally as rows arrive, so it works fine on a table you're still filling. IVFFlat builds faster and uses far less memory; HNSW gives a better speed-versus-recall tradeoff. For most people starting out, HNSW is the safer default.
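Here's a toy version of the IVFFlat idea in plain Python. It shows why the centroids depend on the data present at build time. k-means picks list centers from the rows that exist, each row is filed under its nearest center, and a query only scans the nprobe closest lists. pgvector's lists and probes settings play roughly the roles of k and nprobe here, but the function names are illustrative and this is not pgvector's implementation.

```python
import math
import random

def kmeans(points, k, iters=10, seed=0):
    """Plain Lloyd's k-means; the centroids only reflect whatever points exist right now."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            buckets[nearest].append(p)
        centroids = [
            tuple(sum(col) / len(b) for col in zip(*b)) if b else centroids[i]
            for i, b in enumerate(buckets)
        ]
    return centroids

def build_ivf(points, k):
    """File every point id under its nearest centroid (one inverted list per centroid)."""
    centroids = kmeans(points, k)
    lists = {c: [] for c in range(k)}
    for idx, p in enumerate(points):
        lists[min(range(k), key=lambda c: math.dist(p, centroids[c]))].append(idx)
    return centroids, lists

def ivf_search(points, centroids, lists, query, k, nprobe):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = sorted(range(len(centroids)), key=lambda c: math.dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    return sorted(candidates, key=lambda i: math.dist(points[i], query))[:k]

rng = random.Random(1)
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)] + \
         [(rng.gauss(8, 1), rng.gauss(8, 1)) for _ in range(100)]
centroids, lists = build_ivf(points, k=4)
print(ivf_search(points, centroids, lists, (8.0, 8.0), k=5, nprobe=1))
```

If you built the lists from only a handful of early rows, every later row would be filed under those unrepresentative centers. Probing one list would then miss true neighbors, which is exactly the quiet recall loss the warning describes.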

Now the gotcha nobody mentions until you hit it. pgvector's indexable vector type tops out at 2,000 dimensions. That sounds like plenty until you reach for OpenAI's text-embedding-3-large, which produces 3,072-dimensional vectors. You can store those in a vector column, but you can't build an HNSW or IVFFlat index on them: the index has the 2,000 ceiling, not the column. The fix arrived in pgvector 0.7.0 with halfvec, a half-precision (16-bit) float type that raises the indexable limit to 4,000 dimensions and roughly halves storage at the same time. So the modern move for big embeddings is a halfvec column with a halfvec_cosine_ops index. If you didn't know that, your first instinct (an index on a plain vector(3072) column) fails with an error, and you're left confused on day one.

When does pgvector run out of road? The rough consensus from real-world reports is that it stays competitive up to somewhere in the low tens of millions of vectors. Beyond that, the memory pressure of keeping HNSW graphs in a general-purpose database (one that's also juggling your relational workload) starts to tell. That's not a hard wall; it's the point where a dedicated engine starts to earn its keep.
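To see where halfvec's storage savings (and its small precision cost) come from, Python's struct module can pack the same 3,072 floats as 32-bit ('f') and 16-bit ('e') IEEE values. This only models the element encoding. pgvector's on-disk format adds a small per-value header on top, which isn't modeled here.

```python
import random
import struct

dims = 3072                                        # text-embedding-3-large output size
rng = random.Random(0)
embedding = [rng.uniform(-1.0, 1.0) for _ in range(dims)]

as_float32 = struct.pack(f"{dims}f", *embedding)   # element encoding of a vector column
as_float16 = struct.pack(f"{dims}e", *embedding)   # element encoding of a halfvec column
print(len(as_float32), len(as_float16))            # 12288 6144

# Round-trip through half precision to see how much precision is lost.
restored = struct.unpack(f"{dims}e", as_float16)
max_error = max(abs(a - b) for a, b in zip(embedding, restored))
print(f"max absolute error after float16 round-trip: {max_error:.6f}")
```

For values in this range the round-trip error stays below about 0.0003, which is small next to the distances that separate relevant results from irrelevant ones.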

If pgvector's pitch is "you already have it," Qdrant's pitch is "we made filtering actually work." It's an open-source database written in Rust, built from the ground up for vector search, and in published ANN benchmarks it tends to post some of the highest queries-per-second numbers of the bunch. But the speed isn't the interesting part. The filtering is.

Here's the problem every vector engine wrestles with. Say you want "the 10 most similar documents where tenant_id = 42." You have two obvious strategies and both are bad:

Pre-filtering: apply tenant_id = 42 first, then do similarity search over just those. Clean in theory, but it sidesteps the HNSW index entirely, and on a large dataset, restricting the candidate set first breaks so many links in the graph that recall collapses. Great for small, low-cardinality filters (ones that match few points); a disaster at scale.

Post-filtering: run the normal HNSW search, then throw away results that don't match tenant_id = 42. Fast and simple, but if tenant 42 owns 1% of your data, most of your top 10 get discarded, and you return a handful of results or none at all unless you over-fetch heavily.

Qdrant's answer is a third option it calls filterable HNSW. The trick is to fold the filter conditions into the graph traversal itself. Qdrant builds inverted indexes (payload indexes) on your metadata, and during the HNSW walk it skips over nodes that don't match the filter instead of pre-narrowing the set or post-discarding results. Even better, it has a query planner that picks a strategy based on filter cardinality. When a filter matches very few points, HNSW would shatter, so the planner abandons the graph and just scans the payload index directly, which for a tiny match set is cheaper anyway.
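Why does restricting the graph to matching points hurt so much? Here's a toy illustration, not Qdrant code. Take a graph where each node only links to its immediate neighbors, keep only the nodes that pass a filter, and see how much of the matching set a traversal from the entry point can still reach. Real HNSW graphs are far better connected than this chain, but a restrictive filter removes links in the same way.

```python
from collections import deque

def reachable(graph, start, allowed):
    """Breadth-first walk that may only step onto nodes passing the filter."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nb in graph[node]:
            if nb in allowed and nb not in seen:
                seen.add(nb)
                queue.append(nb)
    return seen

# A chain of 20 nodes where each node links to its two neighbors (a stand-in for short local links).
graph = {i: [j for j in (i - 1, i + 1) if 0 <= j < 20] for i in range(20)}

everything = set(graph)
tenant_42 = {i for i in graph if i % 2 == 0}      # hypothetical filter keeping every other node

print(len(reachable(graph, 0, everything)))       # 20: the unfiltered walk reaches every node
print(len(reachable(graph, 0, tenant_42)))        # 1: no surviving links between matching nodes
```

Half the nodes match, yet the filtered walk is stranded at its entry point, because every path between matching nodes ran through a node the filter removed.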

A filtered search looks like this:

qdrant_search.py

from qdrant_client import QdrantClient
from qdrant_client.models import Filter, FieldCondition, MatchValue

client = QdrantClient(url="http://localhost:6333")

results = client.query_points(
    collection_name="documents",
    query=query_vector,
    query_filter=Filter(
        must=[FieldCondition(key="tenant_id", match=MatchValue(value=42))]
    ),
    limit=10,
)

That query_filter

isn't a post-processing step bolted onto the results. It's threaded through the search. If you're building anything multi-tenant, or anything where "similar and matching these attributes" is the real query (which, in practice, it almost always is), this is the feature that matters more than raw QPS. Filtering badly is how vector search quietly returns wrong answers, and Qdrant treats that as the core problem rather than an afterthought.

Pinecone took the opposite bet from Qdrant. Where Qdrant hands you a powerful engine to operate, Pinecone hands you an endpoint and a bill. It's fully managed and serverless: there's no node to size, no index memory to worry about, no rebuild to schedule. You send vectors, you query them, you pay per usage, and the scaling is somebody else's pager.

For a team that wants to ship RAG this sprint and never think about vector infrastructure again, that's a legitimately strong offer. The mental model is closer to "S3 for vectors" than "a database you run."

pinecone_search.py

from pinecone import Pinecone

pc = Pinecone(api_key="...")
index = pc.Index("documents")

res = index.query(
    vector=query_vector,
    top_k=10,
    filter={"category": {"$eq": "support"}},
    include_metadata=True,
)

The tradeoffs are the usual managed-service ones, sharpened. You're renting, so at scale the bill grows in a way that self-hosting doesn't, and you can't tune the engine internals the way you can with an open-source store you control. Latency is the other thing to actually measure rather than assume: a managed service has network hops and shared infrastructure that a Qdrant instance sitting next to your app doesn't, and some published comparisons have shown Pinecone's tail latencies running well behind a self-hosted engine on comparable tiers. None of that makes it the wrong choice. For plenty of teams, "we never have to think about it" is worth more than a few milliseconds and a bigger invoice. Just don't pick it for speed; pick it for the operational silence.
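Since the advice is to measure rather than assume, here's a minimal harness for the numbers that matter: median (p50) and tail (p99) latency over many calls. `search_fn` is a placeholder for whatever query you're comparing, such as a Pinecone index.query call, a Qdrant client.query_points call, or a SQL query. The stand-in below just sleeps. Run it from where your application actually runs, since the network hop is part of what you're measuring.

```python
import statistics
import time

def measure_latency(search_fn, runs=200, warmup=10):
    """Call search_fn repeatedly and return (p50, p99) latency in milliseconds."""
    for _ in range(warmup):
        search_fn()                                   # warm caches and connections first
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        search_fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    cuts = statistics.quantiles(samples, n=100)       # 99 cut points: cuts[49] is p50, cuts[98] is p99
    return cuts[49], cuts[98]

def fake_search():                                    # stand-in; replace with a real query
    time.sleep(0.001)

p50, p99 = measure_latency(fake_search)
print(f"p50 {p50:.2f} ms, p99 {p99:.2f} ms")
```

Compare p99, not just the average. Tail latency is where shared, networked infrastructure tends to show up first.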

Weaviate is open-source with a managed cloud option, and its sharpest edge is hybrid search, combining semantic vector search with old-fashioned keyword (BM25) search in a single query.

This matters more than it sounds. Pure vector search is great at "find me things that mean roughly this," but it's surprisingly bad at exact terms: product SKUs, error codes, names, acronyms. Ask a vector index for "error TS2589" and it'll happily return things that are semantically near "TypeScript errors" while completely missing the document that literally contains TS2589. Keyword search nails exact terms but has no idea that "car" and "automobile" are the same thing. Hybrid search runs both and fuses the results.
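To see why keyword search catches TS2589 when embeddings don't, here's a compact BM25 scorer in Python. It uses the Okapi BM25 formula with Lucene-style idf, the common defaults k1 = 1.2 and b = 0.75, and a naive lowercase-and-split tokenizer. Weaviate's real tokenization and parameters may differ. A document that never contains a query term gets nothing for it, which is both the strength (exact matches) and the weakness (no synonyms) described above.

```python
import math
from collections import Counter

def bm25_scores(docs, query, k1=1.2, b=0.75):
    """Okapi BM25 (Lucene-style idf) over whitespace-tokenized, lowercased documents."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue                              # no exact token match, no contribution
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

docs = [
    "fixing typescript type errors in large projects",
    "build failed with error ts2589 type instantiation is excessively deep",
    "my car will not start in cold weather",
]
print(bm25_scores(docs, "ts2589"))        # only the second document scores above zero
print(bm25_scores(docs, "automobile"))    # all zeros: BM25 has no notion of synonyms
```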

Weaviate does the fusion with an algorithm like Reciprocal Rank Fusion (RRF): run the vector search and the keyword search in parallel, then combine their ranked lists by rewarding documents that score well in either. An alpha parameter from 0 to 1 lets you dial the balance: alpha = 1 is pure vector, alpha = 0 is pure keyword, and somewhere in between is usually where real retrieval quality lives.
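Here's a sketch of rank-based fusion with an alpha weight, in Python. The 1/(k + rank) term with k = 60 comes from the original Reciprocal Rank Fusion paper. Weighting the vector side by alpha and the keyword side by 1 - alpha is an assumption made to mirror the alpha dial, not Weaviate's exact scoring code; Weaviate's own fusion options may compute scores differently. The document ids are made up.

```python
def weighted_rrf(vector_ranked, keyword_ranked, alpha=0.5, k=60):
    """Fuse two ranked id lists; alpha=1 keeps only vector ranks, alpha=0 only keyword ranks."""
    scores = {}
    for weight, ranked in ((alpha, vector_ranked), (1 - alpha, keyword_ranked)):
        for rank, doc_id in enumerate(ranked, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["ts-errors-guide", "generic-build-tips", "ts2589-postmortem"]
keyword_hits = ["ts2589-postmortem", "changelog-4.1"]

print(weighted_rrf(vector_hits, keyword_hits, alpha=0.5))  # ts2589-postmortem first: it's in both lists
print(weighted_rrf(vector_hits, keyword_hits, alpha=1.0))  # pure vector ranking: ts-errors-guide first
```

Because RRF works on ranks rather than raw scores, it doesn't need cosine distances and BM25 scores to be on the same scale. That's a big part of why rank fusion is a popular default for hybrid search.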

weaviate_hybrid.py

import weaviate

client = weaviate.connect_to_local()
docs = client.collections.get("Documents")

res = docs.query.hybrid(
    query="error TS2589 in build pipeline",
    alpha=0.5,        # balance semantic vs keyword
    limit=10,
)

Weaviate has been investing heavily here: a 2025 rewrite of its hybrid engine moved from maintaining two separate indexes (HNSW for vectors, a separate BM25 keyword index) to a single unified index, cutting storage and speeding up the fused query. If your retrieval problem is genuinely "sometimes the user means a concept and sometimes they mean an exact string" (which describes most real search boxes), hybrid is the feature that lifts your results, and Weaviate has made it the center of the product rather than a checkbox.

Here's the honest decision tree, stripped of vendor marketing.

Start with pgvector if you already run Postgres and you're under roughly ten million vectors. This is most teams, and they don't realize it. The "we need a real vector database" instinct is usually premature. Keeping embeddings next to your relational data, filtering with plain SQL, and adding zero new services to your ops surface is worth a lot, and modern pgvector with HNSW and halfvec is genuinely production-grade. The most common mistake in this space isn't picking the wrong vector database. It's reaching for one at all when a Postgres extension would have carried you for two more years.

Reach for Qdrant when filtering is the actual problem: multi-tenant data, heavy metadata constraints, "similar and matching these attributes" as your bread-and-butter query, or when you genuinely need the top end of filtered-search throughput and you're happy to self-host. Its filterable HNSW is the best answer to the filtering trap that quietly wrecks recall everywhere else.

Reach for Pinecone when you want vector infrastructure to disappear. No nodes, no capacity planning, no rebuilds, at the cost of a usage bill and less control. Pick it for operational silence, not for raw latency.

Reach for Weaviate when exact terms and semantic meaning both matter in the same query. If your users sometimes type a concept and sometimes type a SKU or an error code, hybrid search is the difference between "close enough" and "correct," and Weaviate is built around it.

Underneath, they're all running the same HNSW graph, trading the same recall against the same speed and memory. The differences that should drive your choice aren't in the algorithm. They're in everything wrapped around it: how it filters, who operates it, what it costs, and whether it can search keywords as well as vectors. Measure your own vector count and your own filter patterns first. The benchmark that matters is yours.

Originally published at nazarboyko.com.

source & further reading

dev.to: original article
