AI Coding Agents Search Like It's 2009. Provenant Cuts Tokens by 65×.


Here's what happens every time you ask an AI coding agent a question:

This is BM25 keyword search on raw source code. It's the same algorithm that powered web search in 2009. And it's still the shape of most coding-agent retrieval systems: keyword search, grep, file search, context stuffing.

I spent the last few months building something better. Here's what I found.

When you ask "how does Flask handle URL routing?", you're writing in English. The answer lives in scaffold.py, app.py, and wrappers.py: files full of Python syntax, decorator patterns, and Werkzeug internals.

BM25 tries to match your words against those files. It mostly fails.

The word "routing" appears 4 times in Flask's source. "URL" appears 31 times — mostly in docstrings and variable names scattered across 70+ files. BM25 retrieves 15 of them and hopes for the best.

The agent doesn't just have a retrieval problem. It has a vocabulary problem.

Natural language queries describe behavior. Source code implements syntax. These are different vocabularies, and no amount of BM25 tuning bridges that gap.

Generate a human-readable wiki page for every file and module, then search the wiki.

A wiki page for flask/sansio/scaffold.py reads like this:

Scaffold is the shared base class for Flask and Blueprint. @route() calls add_url_rule(), which creates a Werkzeug Rule and inserts it into url_map. View callables are stored in view_functions keyed by endpoint name.

Search that for "how does Flask handle URL routing?" — the query and the document speak the same language. No vocabulary gap.
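The vocabulary gap can be illustrated with a toy BM25 scorer. This is a minimal sketch, not Provenant's code: the two documents are made-up stand-ins (a code-like line and a wiki-style summary), and the tokenizer and the parameters k1=1.5, b=0.75 are common defaults that I'm assuming here.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep identifiers whole, so "add_url_rule" is ONE token
    # and does not match the query word "url".
    return re.findall(r"[a-z0-9_]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with plain Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1)
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical stand-ins for a raw source line and a wiki summary.
code_doc = (
    "def add_url_rule(self, rule, endpoint=None, view_func=None): "
    "self.url_map.add(Rule(rule, endpoint=endpoint))"
)
wiki_doc = (
    "Scaffold handles URL routing for Flask. The route decorator calls "
    "add_url_rule, which creates a Werkzeug Rule and inserts it into url_map."
)

scores = bm25_scores("how does Flask handle URL routing?", [code_doc, wiki_doc])
print(scores)
```

With this tokenizer the code line shares no terms with the question, while the wiki text shares "flask", "url", and "routing", so only the wiki document gets a non-zero score.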

That's Provenant. Index once, search a wiki forever.

I ran this against SWE-bench Verified — 500 real GitHub issues across 12 major Python repos. The metric is Coverage@5: does the correct file appear in the top 5 retrieved results?
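For concreteness, Coverage@5 can be computed roughly as below. This is a sketch under my own assumptions: a task counts as a hit when at least one gold file appears in the top k, and the task IDs and paths are hypothetical examples, not benchmark data.

```python
def coverage_at_k(retrieved, gold, k=5):
    """Fraction of tasks whose top-k retrieved paths contain a gold file.

    retrieved: dict mapping task id -> ranked list of file paths
    gold:      dict mapping task id -> list of files that the fix touched
    """
    hits = 0
    for task_id, ranked_paths in retrieved.items():
        if set(ranked_paths[:k]) & set(gold[task_id]):
            hits += 1
    return hits / len(retrieved)


# Hypothetical two-task example.
retrieved = {
    "task-1": ["a.py", "b.py", "c.py", "d.py", "e.py", "target.py"],
    "task-2": ["x.py", "target.py"],
}
gold = {"task-1": ["target.py"], "task-2": ["target.py"]}

coverage = coverage_at_k(retrieved, gold, k=5)
print(coverage)
```

In task-1 the correct file sits at rank 6, so it misses the top-5 cutoff; task-2 is a hit.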

Method	Coverage@5	Tokens/query	Delta
Raw BM25 (baseline)	~40%	~65,000	—
Provenant (wiki + BM25)	63.8%	~1,030	+24pp
Provenant + HyDE	66.2%	~1,030	+26pp

+24 percentage points. From 40% to 63.8%. On 500 tasks. Across 12 repos.

And the token numbers aren't rounding errors:

Repo	Naive tokens	Provenant tokens	Reduction
Flask (30 queries)	69,044	1,070	64.5×
Django (20 queries)	59,634	994	60.0×

Answer quality delta: −0.15 on a 5-point blind-judge scale. In this sample, that was not a meaningful drop. The model answers just as well with 1k tokens as it does with 69k — it just wasn't using the other 68k anyway.
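The reduction column in the table above is just the ratio of the two token counts, which is easy to check:

```python
# Token counts copied from the table: (naive tokens, Provenant tokens).
rows = {"Flask": (69_044, 1_070), "Django": (59_634, 994)}

reductions = {repo: naive / prov for repo, (naive, prov) in rows.items()}
for repo, factor in reductions.items():
    print(f"{repo}: {factor:.1f}x")
```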

Step 1: Index your repo once.

provenant init /path/to/your/repo

Provenant parses every file with tree-sitter, generates a wiki page per module via LLM, and stores everything in SQLite/FTS5 + LanceDB. 6,122 pages across 12 repos. Done in minutes.
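To make the storage side concrete, here is a minimal sketch of keeping wiki pages in SQLite FTS5 and ranking them with its built-in BM25. The table name, columns, and sample pages are my assumptions for illustration; the article does not show Provenant's actual schema. It also assumes your Python's sqlite3 build includes FTS5, which most do.

```python
import re
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 table and load (path, body) wiki pages."""
    conn = sqlite3.connect(":memory:")
    # path is stored but not searched; body is the full-text column.
    conn.execute("CREATE VIRTUAL TABLE wiki USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO wiki (path, body) VALUES (?, ?)", pages)
    return conn


def search(conn, question, k=5):
    """Return up to k page paths for a plain-English question, best first."""
    terms = re.findall(r"[A-Za-z0-9_]+", question)
    if not terms:
        return []
    # Quote each word so punctuation like "?" cannot break FTS5 query syntax.
    match = " OR ".join(f'"{t}"' for t in terms)
    rows = conn.execute(
        "SELECT path FROM wiki WHERE wiki MATCH ? ORDER BY rank LIMIT ?",
        (match, k),
    )
    return [path for (path,) in rows]


# Hypothetical wiki pages.
conn = build_index([
    ("flask/sansio/scaffold.py",
     "Scaffold is the shared base class for Flask and Blueprint. "
     "The route decorator registers URL routing rules."),
    ("flask/wrappers.py",
     "Request and Response wrappers built on Werkzeug."),
])

results = search(conn, "how does Flask handle URL routing?")
print(results)
```

Ordering by FTS5's `rank` column sorts by its BM25 score, so the best match comes first.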

Step 2: Start the MCP server.

provenant serve --repo /path/to/your/repo

That's it. Provenant is now a local MCP server exposing tools your agent can call natively.

Step 3: Just use Claude. No special commands.

Add it to your claude_desktop_config.json:

{
  "mcpServers": {
    "provenant": {
      "command": "provenant",
      "args": ["serve", "--repo", "/path/to/your/repo"]
    }
  }
}

Now when you ask Claude "how does authentication work?", it doesn't grep your codebase. It calls provenant_ask, gets 3 wiki pages (~1k tokens), and answers. You never change how you work. The retrieval layer is just better.

You ask Claude a question
         ↓
Claude calls provenant_ask (MCP tool)
         ↓
Provenant: BM25 over wiki pages → top-k results
         ↓
Claude synthesizes answer from ~1,030 tokens
         ↓
Attribution confidence logged → weak pages auto-repaired

I tried it on a fresh repo, a Java Android music player it had never seen, and asked "How does this app play music?" Here's the actual response after calling provenant_ask:

Screenshot: Claude's unedited response after Provenant retrieved 3 wiki pages (~1k tokens). Discovery phase: ~30 seconds.

"Provenant compressed the discovery phase from ~5–10 minutes of grepping/reading to ~30 seconds. It's like having an experienced teammate say 'here's the 3 files you need and what they do' before you dive in."

— Claude, unprompted

That's on a Java codebase. Provenant indexes Python — but the wiki pages are plain English, and Claude reads English just fine.

Nobody measures when a retrieval index is wrong. BM25 returns 5 results and acts confident. The model uses 2. The other 3 were noise. The index degrades silently as your codebase changes.

I built a metric for this:

attribution confidence = pages actually cited / pages retrieved

Zero extra LLM calls. Derived from the citation structure already in the answer. It correlates with answer quality (r = 0.415 against a blind LLM judge) — high-confidence retrievals score 5.0/5 on average; low-confidence score 4.5.
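A sketch of how the metric could be derived from an answer's citations. The `[source: path]` marker format and the function name are hypothetical; the article doesn't specify how Provenant encodes citations.

```python
import re

# Assumed citation format inside the answer text: [source: some/path.py]
CITATION = re.compile(r"\[source:\s*([^\]]+)\]")


def attribution_confidence(answer, retrieved_paths):
    """Pages actually cited divided by pages retrieved (0.0 if none retrieved)."""
    if not retrieved_paths:
        return 0.0
    cited = {m.strip() for m in CITATION.findall(answer)}
    used = [p for p in retrieved_paths if p in cited]
    return len(used) / len(retrieved_paths)


# Hypothetical case: 5 pages retrieved, the answer cites 2 of them.
retrieved_paths = ["a.py", "b.py", "c.py", "d.py", "e.py"]
answer = (
    "Routes are registered in the base class [source: a.py] "
    "and dispatched by the app object [source: c.py]."
)

confidence = attribution_confidence(answer, retrieved_paths)
print(confidence)
```

Because it only parses text the model already produced, this costs no additional LLM call.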

When a page's confidence drops below 0.35, Provenant queues a background repair:

asyncio.create_task(_background_repair(uncited_pages))

75% of low-confidence queries improved after one repair cycle. Cost: ~$0.02. Touches only 0.7% of pages.

The index improves the more you use it. Without you doing anything.

Some repos benefit more than others. The pattern: small, well-documented repos see the biggest gains. Large monoliths still improve, just from a harder baseline.

Repo	Coverage@5	Improvement	Wiki pages
requests	78%	+38pp	58
pytest	72%	+32pp	186
seaborn	71%	+31pp	94
flask	69%	+29pp	74
xarray	66%	+26pp	218
sphinx	63%	+23pp	412
django	61%	+21pp	1,393
scikit-learn	57%	+17pp	1,124
matplotlib	55%	+15pp	634

requests at 78% makes sense — it's a small, well-structured library with clean module boundaries. Each file does one thing. The wiki pages are precise. The retrieval is nearly perfect.

Django at 61% is still a +21pp improvement on a 1,393-page codebase. That's not nothing.

For the ~3% of queries where even wiki vocabulary doesn't match, Provenant generates a hypothetical wiki snippet that would answer the question, then searches against that. Merged with BM25 via Reciprocal Rank Fusion.
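Reciprocal Rank Fusion itself is small enough to show in full. This is the standard formulation (each document scores the sum of 1 / (k + rank) over the lists it appears in); the constant k=60 is the commonly used default and an assumption on my part, since the article doesn't say which value Provenant uses. The file lists are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists into one, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort by fused score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical rankings from plain BM25 and from the HyDE snippet search.
bm25_ranking = ["app.py", "scaffold.py", "wrappers.py"]
hyde_ranking = ["scaffold.py", "helpers.py", "app.py"]

fused = reciprocal_rank_fusion([bm25_ranking, hyde_ranking])
print(fused)
```

Documents that rank well in both lists rise to the top, without needing the two scorers to produce comparable scores.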

+2.4pp Coverage@5. One extra LLM call. Not the headline — but it's there when it helps. The fact that it only fires 3% of the time is the point: the wiki handles the rest.

Speculative prefetching — I built a hook that pre-fetches wiki context whenever your agent greps a file, warming the cache. Median speedup: 1.0×. The DB reads were already fast enough. Keeping the code, not claiming a win.

Compression/pruning — removing low-attribution pages before synthesis. Firing rate on test set: 0%. The threshold was too conservative. Needs tuning before it's useful.

Self-healing at scale — the repair loop is only evaluated on Django (20 questions). I can't claim it generalises yet. It's early evidence, not a proven result.

pip install provenant

provenant init /path/to/your/repo

provenant serve --repo /path/to/your/repo

Works with Claude Code, Cursor, or anything MCP-compatible. Your agent gets provenant_ask, provenant_search, provenant_context, and provenant_risk as native tools. It stops grepping. It starts reading the wiki.

⭐ GitHub: github.com/shreyashsharma/provenant

The retrieval problem in AI coding tools is real and under-measured. BM25 on raw source code is the floor, not the ceiling.

If you try Provenant on your repo, I'm especially interested in two numbers:

Those two data points are more honest than any eval I can run on my own repos. Happy to compare notes.

Benchmarked with DeepSeek-V3.2 · nomic-embed-text-v1.5 · SWE-bench Verified (500 tasks) · 12 Python OSS repos

source & further reading

dev.to (original article)