Every enterprise AI conversation right now starts in the same place: "connect the model to our data." Then it stalls in the same place: which data, copied where, governed by whom.
I build retrieval for a living (I wrote the original open-source SWIRL), so let me make an argument that runs against the current default - and then show the architecture it implies.
The standard RAG recipe is: crawl your sources, chunk them, embed them, and load the vectors into a database. Now your model can retrieve. But you also now have a second copy of your content living in an index you have to secure, keep in sync, and explain to whoever owns compliance. You've recreated every permission boundary by hand, and you'll eventually get one wrong.
For a lot of teams that copy is simply not allowed. Regulated content, client-confidential material, anything privileged - copying it into a vendor store is exposure you don't get paid to take on.
Here's the part people don't want to hear. The XetHub team (since acquired by Hugging Face) benchmarked three retrieval strategies: keyword-only (BM25), vector-only, and hybrid (keyword to pull candidates, then vectors to re-rank). Keyword-only came last. Vector-only did better.
Hybrid won - and their conclusion was blunt: "No vector database necessary."
That matches what we see in production. Vector similarity is a great high-precision filter, not a great first pass. Lead with exact matches and quoted terms, then let embeddings and a cross-encoder re-rank what's left.
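To make the "keyword first, vectors second" ordering concrete, here is a minimal sketch in plain Python. It is not SWIRL's code: the function names are made up for illustration, the first pass is a simple term-overlap count rather than BM25, and `toy_embed` is a character-trigram stand-in for a real embedding model.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; a real system would do more."""
    return text.lower().split()


def keyword_candidates(query, docs, limit=10):
    """First pass: keep docs sharing at least one query term, by overlap count."""
    q_terms = set(tokenize(query))
    scored = []
    for doc_id, text in docs.items():
        overlap = sum(1 for t in tokenize(text) if t in q_terms)
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def toy_embed(text):
    """Assumption: stand-in for an embedding model (character-trigram counts)."""
    padded = f"  {text.lower()}  "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(query, docs, limit=10):
    """Second pass: re-rank only the keyword candidates by vector similarity."""
    candidates = keyword_candidates(query, docs, limit)
    q_vec = toy_embed(query)
    return sorted(
        candidates,
        key=lambda doc_id: cosine(q_vec, toy_embed(docs[doc_id])),
        reverse=True,
    )


docs = {
    "a": "indemnification clause for vendor contracts",
    "b": "holiday party planning notes",
    "c": "vendor onboarding checklist",
}
results = hybrid_search("vendor indemnification clause", docs)
print(results)
```

The vectors are computed only for the handful of candidates the keyword pass returns, at query time, so nothing has to be embedded and stored ahead of time.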
It's not a slogan; it's a pipeline. In SWIRL, relevance is three passes, and both models run locally:
E5-large-v2, using title-aware chunking and hybrid keyword+vector fusion (RRF). No vector database to build or secure.
MS-MARCO cross-encoder reads the query and document together.
Feed that to your LLM - whatever model you've chosen, including an on-prem one - and the answer gets better, because the context got better. Same model, sharper input.
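For readers who haven't met RRF (reciprocal rank fusion), here is a minimal sketch of the general technique: each document scores the sum of `1 / (k + rank)` over every ranked list it appears in. The function name and sample IDs are illustrative, and `k=60` is the constant commonly used in the RRF literature; I'm not claiming it is the value SWIRL ships with.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    rankings: iterable of lists, each ordered best-first.
    k: damping constant (assumed default of 60).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_ranking = ["a", "b", "c"]  # e.g. from the keyword pass
vector_ranking = ["b", "d", "a"]   # e.g. from embedding similarity
fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['b', 'a', 'd', 'c']
```

RRF uses only rank positions, never raw scores, which is why it can combine a keyword ranker and a vector ranker whose scores live on unrelated scales.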
The stack is settling: foundation models orchestrate, MCP is the retrieval interface, the chat UI is a commodity. The piece none of them provide from outside your walls is knowledge authority - which document is official, which clause your org actually uses, which answer carries approval.
So we made it a first-class layer. SWIRL 5 exposes an MCP server. Any agent - Claude, Copilot, ChatGPT, your own - calls SWIRL and gets ranked, permissioned, organization-approved answers. A team pins the canonical result for a query once; every agent gets it after that. And no copy of your data leaves your tenant.
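The "pin once, every agent gets it" idea can be pictured as a small lookup that sits in front of the ranker. This is a hypothetical sketch, not SWIRL's API: `PinStore`, `normalize`, and the document IDs are invented for illustration, and a real system would also have to check permissions before returning a pinned result.

```python
def normalize(query):
    """Collapse case and whitespace so trivially different queries share a pin."""
    return " ".join(query.lower().split())


class PinStore:
    """Hypothetical store mapping a normalized query to its canonical doc ID."""

    def __init__(self):
        self._pins = {}

    def pin(self, query, doc_id):
        """Record the organization-approved document for this query."""
        self._pins[normalize(query)] = doc_id

    def apply(self, query, ranked_ids):
        """Return ranked_ids with the pinned doc (if any) moved to the front."""
        pinned = self._pins.get(normalize(query))
        if pinned is None:
            return list(ranked_ids)
        return [pinned] + [d for d in ranked_ids if d != pinned]


pins = PinStore()
pins.pin("Standard NDA  clause", "nda-v3")

ranked = ["nda-v1", "nda-v3", "memo"]
pinned_first = pins.apply("standard nda clause", ranked)
unpinned = pins.apply("travel policy", ranked)
print(pinned_first)
```

Because the pin lives in the retrieval layer rather than in any one agent's prompt, every caller sees the same canonical answer without being reconfigured.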
Three properties fall out of it, and they're the whole reason to build it this way:
If you're wiring agents into enterprise data and the "just copy everything into a vector store" step is making your security team twitch, there's another shape available. SWIRL 5 goes GA July 15; the preview is open if you want to point it at your own stack. Either way - I'd genuinely like to hear how you're handling the authority problem, because I don't think the industry has it figured out yet.
Sid Probstein is the creator of SWIRL and CEO of SWIRL AI.