# Announcing Lakebase Search: agent-native retrieval built into Lakebase Postgres

> Source: <https://www.databricks.com/blog/announcing-lakebase-search-agent-native-retrieval-built-lakebase-postgres>
> Published: 2026-06-16 12:45:00+00:00

Agents need search, and search needs a lakebase

by [Pranav Aurora](/blog/author/pranav-aurora), [Zhou Sun](/blog/author/zhou-sun) and [Jinjing Zhou](/blog/author/jinjing-zhou)

Today, we're introducing Lakebase Search: hybrid vector and full-text retrieval built into Lakebase, available now in beta on AWS and Azure. Powered by two native Postgres extensions, lakebase_vector and lakebase_text, it allows your entire agent loop to rely on a single data backend, a lakebase.

This brings next-level scale, next-level economics, and agent-first ergonomics.

Agents transform search into an operational workflow: they retrieve context, reason, act, and remember. This deeply connects the read path (retrieval) with the write path (memory), making instant retrieval essential to access freshly generated insights in real time.

Until now, that loop had no Postgres-native home built for the scale and economics that search demands.

Agents now operate [4x more databases on Lakebase](https://www.databricks.com/blog/how-agentic-software-development-will-change-databases) than human users do, and their primary requirement is entirely different from a human's. Traditional search engines assume a read-only snapshot of stale data. Agents, however, treat search like a live operational database.

Look at a typical agent schema: chunked documents and embeddings live directly alongside an active conversational memory log. This creates a continuous read/write loop. Agents write new learnings to memory on one turn, and need that exact data fully indexed and searchable on the next. They don't just need fast retrieval; they need instant search on the absolute latest writes.

Search is a unique workload with two defining properties.

First, you store vastly more data than you actually query, leaving the majority of it cold.

Second, vector search causes severe data bloat. A 1 KB text file can occupy several times that once vectorized: the document is split into multiple chunks, and each chunk generates its own high-dimensional embedding. That is before accounting for index overhead.
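To make the bloat concrete, here is a rough back-of-the-envelope sketch. The chunk size and the use of non-overlapping chunks are assumptions for illustration; 768 dimensions and 4-byte floats are common choices but are not prescribed by Lakebase Search.

```python
import math


def embedded_size_bytes(text_bytes: int, chunk_bytes: int, dims: int,
                        bytes_per_dim: int = 4) -> int:
    """Raw embedding bytes produced by chunking a document (index overhead excluded)."""
    chunks = math.ceil(text_bytes / chunk_bytes)
    return chunks * dims * bytes_per_dim


# Assumed inputs: a 1 KB document, 512-byte chunks, 768-dim float32 embeddings.
size = embedded_size_bytes(text_bytes=1024, chunk_bytes=512, dims=768)
expansion = size / 1024  # how many times larger than the source text

print(f"{size} bytes of embeddings, {expansion:.1f}x the original text")
```

Under these assumptions the embeddings alone are several times larger than the text they describe, and overlapping chunks or higher-dimensional models would push the ratio higher.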

When multiplied across thousands of mostly idle tenants, traditional search architectures break down. Industry-standard vector indexes like HNSW are fundamentally memory-bound. Because fast graph traversal relies heavily on the index remaining resident in RAM, hosting cold multi-tenant data is expensive.

Last year, we introduced Lakebase: a serverless Postgres OLTP architecture where data lives in cheap cloud object storage, but a tiered cache (RAM, local NVMe, pageserver) ensures hot pages read at local-disk latency.

We realized this is the exact architecture modern search needs. But there was a catch: to actually unlock these economics without destroying query speed, you need an index layout explicitly designed to live in a tiered storage hierarchy. Lakebase didn't have one. So, we built it.

By pairing a tiered architecture with a purpose-built tiered index, we achieve:

The economics are easiest to see as a table. Per terabyte per month, at cloud list prices:

| Storage tier | Approximate cost |
|---|---|
| RAM | ~$3,000 / TB / month |
| Local NVMe (cache) | ~$100 / TB / month |
| Object storage | ~$20 / TB / month |

Our indexing method lets Lakebase keep only the active working set in RAM. The cold majority rests in object storage, which costs roughly two orders of magnitude less than RAM ($20 versus $3,000 per TB per month), while still delivering the high-performance search your application requires.
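As a rough illustration of how the tier prices combine, the sketch below compares keeping a dataset fully in RAM with a tiered layout. The prices come from the list above; the dataset size and the hot and warm fractions are made-up inputs, and the model (object storage holds everything, RAM and NVMe hold cached copies) is a simplifying assumption rather than a description of Lakebase billing.

```python
# Approximate list prices per TB per month, as quoted in the post.
PRICE_PER_TB_MONTH = {"ram": 3000.0, "nvme": 100.0, "object": 20.0}


def all_in_ram_cost(dataset_tb: float) -> float:
    """Monthly cost when the whole dataset must stay resident in RAM."""
    return dataset_tb * PRICE_PER_TB_MONTH["ram"]


def tiered_cost(dataset_tb: float, ram_fraction: float, nvme_fraction: float) -> float:
    """Monthly cost when object storage holds everything and the caches hold copies."""
    per_tb = (
        PRICE_PER_TB_MONTH["object"]
        + ram_fraction * PRICE_PER_TB_MONTH["ram"]
        + nvme_fraction * PRICE_PER_TB_MONTH["nvme"]
    )
    return dataset_tb * per_tb


# Hypothetical workload: 10 TB total, 1% hot in RAM, 10% warm on local NVMe.
ram_only = all_in_ram_cost(10)
tiered = tiered_cost(10, ram_fraction=0.01, nvme_fraction=0.10)
print(f"all-in-RAM: ${ram_only:,.0f}/month, tiered: ${tiered:,.0f}/month")
```

The saving depends almost entirely on how small the hot fraction is, which is why the workload property that most stored data stays cold matters so much.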

When building Lakebase Search, we had two non-negotiable requirements: it had to be 100% Postgres-native (reusing standard pgvector/tsvector types and ecosystem tools), and the indexing had to be built from the ground up for tiered cloud object storage.

To achieve this, we are launching two new Postgres extensions in Beta today. Both share the same goal: deliver state-of-the-art search relevance without forcing you to over-provision RAM.

lakebase_vector retains standard pgvector data types and operators but changes the underlying index type. Because the data remains in native pgvector format, it maintains compatibility and can be exported to other systems. By clustering and compressing vectors using [RaBitQ](https://arxiv.org/abs/2405.12497) (Randomized Binary Quantization), we shrink the index footprint 32x while maintaining high recall. A 100-million-vector index that previously required 300 GB of RAM fits into under 10 GB. This reduced memory footprint allows a single index to scale to over 1 billion vectors. The active working set is cached on local NVMe, while the cold tail resides in object storage.
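The 32x figure lines up with simple arithmetic if each 32-bit float dimension is reduced to a single bit. The sketch below assumes 768-dimensional float32 vectors (the dimensionality used in the benchmark) and ignores any per-vector correction values, centroids, or other index metadata, so it is an estimate of the raw code size only, not of the actual lakebase_vector layout.

```python
def vector_bytes(num_vectors: int, dims: int, bits_per_dim: int) -> int:
    """Bytes needed to store the vectors at a given precision (metadata excluded)."""
    return num_vectors * dims * bits_per_dim // 8


NUM_VECTORS = 100_000_000
DIMS = 768  # assumed dimensionality

full_precision = vector_bytes(NUM_VECTORS, DIMS, bits_per_dim=32)  # float32
one_bit = vector_bytes(NUM_VECTORS, DIMS, bits_per_dim=1)          # binary codes

print(f"float32: {full_precision / 1e9:.1f} GB")
print(f"1-bit:   {one_bit / 1e9:.1f} GB")
print(f"ratio:   {full_precision // one_bit}x")
```

That gives roughly 307 GB at full precision and about 9.6 GB for the binary codes, consistent with the "300 GB" and "under 10 GB" figures quoted above.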

Postgres handles exact keyword matching via GIN indexes, which must remain resident in RAM to maintain performance. This architecture causes memory costs to scale linearly with dataset size.

lakebase_text replaces GIN with an index optimized for sequential reads from cloud object storage. It introduces native BM25 relevance ranking to Postgres without the associated RAM footprint.
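For readers unfamiliar with BM25, the following is a minimal textbook sketch of the scoring function in plain Python. It is not the lakebase_text implementation: the whitespace tokenizer, the `k1` and `b` defaults, and the smoothed IDF variant are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score every document against the query with the classic BM25 formula."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores.append(score)
    return scores


docs = [
    "postgres vector index on object storage",
    "agent memory log",
    "full text search in postgres with bm25 ranking",
]
scores = bm25_scores("postgres bm25", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```

The third document matches both query terms, including the rarer one, so it ranks first; the second matches neither and scores zero.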

Because both extensions execute within the same engine, hybrid search runs in a single SQL query. Vector similarity and keyword relevance are combined via reciprocal rank fusion (RRF), allowing results to be joined and filtered against operational tables.
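Reciprocal rank fusion itself is simple enough to sketch in a few lines. The version below operates on plain Python lists rather than SQL result sets, and the constant `k = 60` is the value commonly used in the literature, not a documented Lakebase Search default; the document IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked ID lists: each list contributes 1 / (k + rank) per item."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical top-3 results from a vector query and a keyword query.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only ranks, it needs no score normalization between cosine distances and BM25 values, which is a large part of why it is a popular default for hybrid search.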

We benchmarked Lakebase Search on LAION-100M (100 million 768-dimensional vectors, top-10 retrieval) on a single instance. With a warm cache and a single connection, it delivers the following recall, measured against exact nearest neighbors, with zero bloat:

| Recall@10 | Latency | QPS |
|---|---|---|
| 0.955 | 30 ms | 51 |
| 0.942 | 18 ms | 104 |
| 0.926 | 14 ms | 142 |
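For context on how a recall figure like those above is typically computed, here is a small sketch. It assumes the usual definition for top-10 retrieval (the share of the true 10 nearest neighbors that the approximate index returns); the post does not spell out its exact methodology, and the IDs below are invented.

```python
def recall_at_k(approx_ids: list[int], exact_ids: list[int], k: int = 10) -> float:
    """Fraction of the exact top-k neighbors that the approximate search returned."""
    return len(set(approx_ids[:k]) & set(exact_ids[:k])) / k


# Hypothetical query: the index misses one true neighbor and returns ID 42 instead.
exact = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]

recall = recall_at_k(approx, exact)
print(recall)
```

Benchmark numbers are normally the average of this value over many queries, so a recall of 0.955 means that on average fewer than one of the ten true neighbors is missed.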

Achieving this scale traditionally requires a memory-bound architecture. A standard pgvector HNSW index requires the neighbor graph and its target heap pages to remain resident in RAM for performant traversal. At 100 million vectors:

This architecture changes how to approach total cost of ownership. Legacy search requires a fixed baseline cost regardless of query volume, while Lakebase tracks actual usage:

| Workload | Legacy search | Lakebase Search |
|---|---|---|
| Large Knowledge Bases (Mostly idle) | Fixed baseline costs to keep idle datasets resident in RAM. | Scales compute to zero. You pay only for object storage. |
| Agent Memory & Chat (Bursty) | Over-provisioned RAM and compute to handle traffic spikes. | Dynamically scales compute for spikes, then scales down to zero. |
| Search Bars (Sustained) | Massive instances sized to fit the entire dataset in RAM. | Smaller, cheaper instances because the dataset bypasses RAM residency. |

**A single backend for memory and context**

Agents shouldn't have to stitch together a vector database for context and a transactional database for memory. By pushing your retrieval logic directly into the database, your entire agent loop runs on one backend. Because Lakebase Search is Postgres—fully reusing standard pgvector and tsvector types—it plugs natively into your existing MCPs, standard drivers, and connectors. More importantly, because search lives right next to your operational data, you can execute a hybrid search, join against your application's tables, and safely filter by tenant, all in a single SQL query.

**Continuous search experimentation**

Optimizing chunking strategies or hybrid weights requires trial and error. Instead of exporting data to external batch systems for reprocessing, Lakebase Search connects with the Lakehouse to create a tight feedback loop. You can branch multi-terabyte datasets instantly at zero cost, build indexes out-of-band using parallel compute, and route agent feedback back to the Lakehouse for offline evaluation.

**A dedicated retrieval engine per agent**

Traditional architectures require sharing a single search cluster across all agents. Because idle indexes in Lakebase incur near-zero storage costs, you can provision thousands of isolated corpora dedicated to specific agents, users, or sessions. This shifts search from a stale snapshot into an operational read/write loop; data an agent writes on one turn is committed and retrievable on the next with full transactional guarantees.

Lakebase eliminates the need to wire together separate vector stores, search clusters, and transactional databases. By consolidating the entire lifecycle inside a single Postgres system, it delivers the scale and low cost of tiered cloud object storage alongside the real-time read/write ergonomics required for agentic workflows.

Lakebase Search is available today in Beta on AWS and Azure. What will your agents build?
