{"slug": "lakebase-search-vector-and-bm25-on-neon", "title": "Lakebase Search: vector and BM25 on Neon", "summary": "Databricks launched Lakebase Search on Neon, a hybrid vector and BM25 full-text retrieval system built as two Postgres extensions optimized for Neon's storage-compute separation architecture, enabling scalable ANN and keyword search for over 1 billion vectors without leaving the database.", "body_md": "Now available to everyone: **Lakebase Search on Neon**. Hybrid vector + full-text retrieval via two Postgres extensions, `lakebase_vector`\n\nand `lakebase_text`\n\n.\n\nIf you're wondering about the naming: Lakebase refers to the Postgres database architecture where storage is separated from compute. The extensions are called `lakebase_vector`\n\nand `lakebase_text`\n\nbecause each runs industry-standard vector (ANN) and text (BM25) search algos in a way that is optimized for the lakebase storage-compute separation. The [Databricks announcement](https://www.databricks.com/blog/announcing-lakebase-search-agent-native-retrieval-built-lakebase-postgres) has more lakebase context, including benchmarks.\n\nThis post is the story of why and how we built it.\n\n## Limits of default search in Postgres\n\nWhen you're starting out, keeping search in Postgres is quick and easy compared to the alternatives. The architecture often looks something like this:\n\n`pgvector`\n\nwith HNSW for vectors- A generated\n`tsvector`\n\ncolumn with a GIN index for keyword search - One database, one connection string, one transaction\n\nThis is the pattern many of our own examples follow and recommend. It works well, but as you scale, you inevitably hit one of three issues:\n\n**HNSW eats your RAM.** Once you're past 5–10M vectors, you start sizing your Postgres instance around the vector index instead of your actual workload. Past 100M vectors, the working set stops fitting; query latency climbs, and your index takes hours to build.`pgvector`\n\n's`vector`\n\ntype caps HNSW at 2,000 dimensions. Current-generation embedding models like`text-embedding-3-large`\n\n(3072d) force you to cast to`halfvec`\n\n(giving up fp32 precision), use binary quantization, truncate dimensions via Matryoshka, or skip HNSW entirely.**GIN isn't really BM25.**`ts_rank`\n\ndoesn't use corpus-wide IDF, so your relevance scores quietly drift as the table grows. GIN also has no top-K pushdown, so every matching document gets scored before your`LIMIT`\n\nkicks in. The bigger the corpus, the slower the query, and the weaker the ranking.**You own the hybrid query.** Score normalization, tie-breaking, and per-tenant filtering: it's all SQL you're writing and maintaining yourself.\n\nIf you've outgrown that setup, your options haven't been great. You have to choose between running a new Postgres extension or doing search in a separate vector database. A separate database means another system to sync, state that drifts between them, and the [200ms-range search latencies](https://neon.com/blog/vecstore-replacing-pinecone-and-rds-with-neon) Vecstore wrote about before they consolidated onto Neon.\n\nLakebase Search takes the extension path, and optimizes each for Neon's unique architecture.\n\n## What is Lakebase Search\n\nLakebase Search is two Postgres extensions, `lakebase_vector`\n\nand `lakebase_text`\n\n, that bring scalable ANN and BM25 full-text search to Neon, designed for backends that need both semantic and keyword search in a single database.\n\n: adds the`lakebase_vector`\n\n`lakebase_ann`\n\nindex type for vector similarity search.**100% compatibility with**. Built on the same vector types, distance operators, and query syntax, so existing queries work unchanged. Scales to over 1 billion vectors on a single index.`pgvector`\n\n: adds the`lakebase_text`\n\n`lakebase_bm25`\n\nindex type for BM25 keyword search. No migration from PostgreSQL FTS required. Standard`tsvector`\n\ntypes and query operators work unchanged. Adds BM25 ranking and top-K pushdown that native GIN lacks.\n\n## Under the hood\n\nNeon separates compute from storage, with a hierarchy of caches in between. Storage on Neon is tiered from hot to cold: RAM, local NVMe, pageserver, object storage. Hot pages return at local-disk latency. Object storage only enters the read path when a read misses every tier above it.\n\nBoth `lakebase_ann`\n\nand `lakebase_bm25`\n\nare shaped for this hierarchy: footprints small enough to mostly live in the upper tiers, and layouts that turn deeper reads into big sequential block fetches.\n\n`lakebase_vector`\n\nuses IVF (inverted file) partitioning plus [RaBitQ](https://arxiv.org/abs/2405.12497) quantization.\n\nRaBitQ compresses vectors about 32x, so a 100M-vector index that would traditionally need ~300GB of RAM as HNSW fits in under 10GB. We keep `pgvector`\n\n's vector types and distance operators (`<->`\n\n, `<#>`\n\n, `<=>`\n\n); only the index type changes. Builds run up to 50–100x faster than HNSW on the same data.\n\n`lakebase_text`\n\nreplaces the GIN-on-`tsvector`\n\npattern with a BM25-native index. The index stores corpus-wide statistics (document frequencies, average document length) at build time. `<@>`\n\nreturns real BM25 scores. The scores are negative, so `ORDER BY score`\n\nascending puts the most relevant results first. The query path uses Block-Max WAND for top-K pushdown: the index returns the K most relevant documents without scoring every match. GIN can't do that. The layout is sequential, so each read pulls a contiguous block of pages off storage. Standard `tsvector`\n\nand `tsquery`\n\nstill work; the new pieces are the `<@>`\n\noperator and the `to_bm25query()`\n\nhelper.\n\nBecause both indexes live inside the same Postgres, hybrid search is just SQL. A single query can join against your operational tables, filtered by tenant or any other column, and run in one transaction. No sync, no second connection string, no state drift between two systems.\n\n## Lakebase Search is the solution to a problem we've been trying to solve for a long time\n\nThis isn't our first attempt at native vector search on Neon. Back in 2023 we shipped [pg_embedding](https://neon.com/blog/pg-embedding-extension-for-vector-search), an HNSW extension. The truth is it didn't work as we hoped, because HNSW is a graph index designed for a traditional server: local NVMe, and a long‑lived process that keeps the entire graph pinned in RAM between queries. Every search walks the graph through a bunch of small, random reads, and assumes those reads land in hot memory rather than going back to storage.\n\nOn Neon, the \"disk\" is object storage and the compute can scale to zero. Random reads against S3 take tens of milliseconds, and a cold start means rehydrating the whole graph before the first query can return. Swapping which extension ships in the box was never going to fix that, and it's why first-class search on Neon has been unfinished business for us for a while.\n\nLakebase Search indexes are durable on object storage, the same layer the rest of your data already lives on. That's why the extensions are called `lakebase_vector`\n\nand `lakebase_text`\n\n: indexes living on a data lake. Putting them behind the same separation of compute and storage that made data lakes work for analytics is what makes scale-to-zero possible in the first place. When the compute shuts down, the index keeps existing where it sits, ready for the next compute to attach to. That single design choice is what makes the rest of this work.\n\nScale-to-zero stops requiring a warmup, because the index is durable on its own and the compute on top is just a cache that can be rebuilt on demand from the bytes already sitting in object storage. The cache itself is empty after a cold start, so the first few queries pay object-storage latency while it fills. For latency-sensitive workloads, `lakebase_ann_prewarm()`\n\nloads the index into memory before the first query.\n\nBranching lets you tune search without touching prod. Because indexes live on object storage and branches are copy-on-write, you can branch a production database in seconds and inherit the same `lakebase_ann`\n\nand `lakebase_bm25`\n\nindexes. On that branch, you can try different fusion strategies (RRF with varying k, or weighted blends of vector and BM25 scores), run your eval suite against real production data, and compare recall and latency side by side. If the new configuration wins, apply it to prod with confidence. If it loses, delete the branch. Prod kept serving the whole time.\n\n## Hybrid retrieval, on the same foundation as your Neon data\n\nWith Lakebase Search, you stop needing to wire together a vector store, a search cluster, and a transactional database for the same application. The whole retrieval lifecycle lives inside one Postgres: the scale and economics of tiered cloud object storage underneath, the real-time read/write ergonomics on top that agentic workflows depend on. Indexes on a data lake, queried like a database.\n\nTo get started, the [Lakebase Search quickstart](https://neon.com/docs/ai/lakebase-search-get-started) walks you through enabling the feature on a Neon project and shipping your first hybrid query. For the full reference, including index parameters and tuning knobs, see the [ lakebase_vector docs](https://neon.com/docs/extensions/lakebase-vector).\n\nStay tuned for a follow-up post with the full benchmarks of `lakebase_vector`\n\nagainst `pgvector`\n\non Neon.", "url": "https://wpnews.pro/news/lakebase-search-vector-and-bm25-on-neon", "canonical_source": "https://neon.com/blog/lakebase-search-on-neon", "published_at": "2026-07-02 08:49:00+00:00", "updated_at": "2026-07-03 20:51:57.549462+00:00", "lang": "en", "topics": ["ai-infrastructure"], "entities": ["Databricks", "Neon", "Lakebase Search", "pgvector", "BM25", "HNSW", "GIN", "Vecstore"], "alternates": {"html": "https://wpnews.pro/news/lakebase-search-vector-and-bm25-on-neon", "markdown": "https://wpnews.pro/news/lakebase-search-vector-and-bm25-on-neon.md", "text": "https://wpnews.pro/news/lakebase-search-vector-and-bm25-on-neon.txt", "jsonld": "https://wpnews.pro/news/lakebase-search-vector-and-bm25-on-neon.jsonld"}}