Source: dev.to

AI Coding Agents Search Like It's 2009. Provenant Cuts Tokens by 65×.

A developer built Provenant, a system that generates human-readable wiki pages for every file and module in a codebase, then searches those pages instead of raw source code. On SWE-bench Verified (500 GitHub issues across 12 Python repos), Provenant achieved 63.8% Coverage@5, a 24-percentage-point improvement over raw BM25 keyword search, while reducing tokens per query from approximately 65,000 to about 1,030. The system runs as a local MCP server and integrates with Claude, letting developers ask natural-language questions about their code without changing their workflow.

Published May 28, 2026 · 6 min read

Here's what happens every time you ask an AI coding agent a question:

This is BM25 keyword search on raw source code. It's the same algorithm that powered web search in 2009. And it's still the shape of most coding-agent retrieval systems: keyword search, grep, file search, context stuffing.
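For reference, here is a minimal sketch of Okapi BM25 scoring over raw source files. The defaults k1 = 1.5 and b = 0.75 and the naive regex tokenizer are assumptions for illustration; real agents and search engines differ in tokenization and tuning.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Naive tokenizer: lowercase runs of letters, digits, and underscores.
    # "add_url_rule" stays one token, so it never matches the query word "url".
    return re.findall(r"\w+", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Rank docs ({doc_id: raw_text}) against query with Okapi BM25, best first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized.values()) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # BM25 only rewards exact term matches
            # Smoothed IDF; the "1 +" inside the log keeps it positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            length_norm = k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / (tf[term] + length_norm)
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Toy "repo": the routing logic lives in app.py, but only wrappers.py
    # literally contains the word "url".
    files = {
        "app.py": "def route(rule): return add_url_rule(rule)",
        "wrappers.py": "class Request: url = None  # the request URL",
    }
    print(bm25_rank("how does flask handle url routing", files))
```

In the toy example, wrappers.py outranks app.py even though app.py holds the routing code: "route" is not "routing", and "url" is buried inside an identifier.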

I spent the last few months building something better. Here's what I found.

When you ask "how does Flask handle URL routing?", you're writing in English. The answer lives in scaffold.py, app.py, and wrappers.py: files full of Python syntax, decorator patterns, and Werkzeug internals.

BM25 tries to match your words against those files. It mostly fails.

The word "routing" appears 4 times in Flask's source. "URL" appears 31 times, mostly in docstrings and variable names scattered across 70+ files. BM25 retrieves 15 of them and hopes for the best.
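Counts like these depend on the counting rules (case, whole words versus substrings, which directories). A minimal sketch for reproducing this kind of measurement on a checkout, assuming whole-word, case-sensitive matching over .py files; the function name count_word is hypothetical.

```python
import re
from pathlib import Path


def count_word(repo_root, word, case_sensitive=True, suffix=".py"):
    """Count whole-word occurrences of `word` in source files under repo_root.

    Returns (total_occurrences, files_containing_word).
    """
    flags = 0 if case_sensitive else re.IGNORECASE
    # \b boundaries: "URL" matches "the URL" but not "URLs"; since "_" is a
    # word character, "url" also won't match inside "url_map".
    pattern = re.compile(rf"\b{re.escape(word)}\b", flags)
    total, files_with_hits = 0, 0
    for path in Path(repo_root).rglob(f"*{suffix}"):
        text = path.read_text(encoding="utf-8", errors="ignore")
        hits = len(pattern.findall(text))
        total += hits
        if hits:
            files_with_hits += 1
    return total, files_with_hits
```

Calling count_word("path/to/flask", "URL") on a checkout gives numbers you can compare with the ones above; expect them to shift with the Flask version and with the counting rules.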

The agent doesn't just have a retrieval problem. It has a vocabulary problem.

Natural language queries describe behavior. Source code implements syntax. These are different vocabularies, and no amount of BM25 tuning bridges that gap.

Generate a human-readable wiki page for every file and module, then search the wiki.

A wiki page for flask/sansio/scaffold.py reads like this:

Scaffold is the shared base class for Flask and Blueprint. @route() calls add_url_rule(), which creates a Werkzeug Rule and inserts it into url_map. View callables are stored in view_functions keyed by endpoint name.

Search that for "how does Flask handle URL routing?" β€” the query and the document speak the same language. No vocabulary gap.

That's Provenant. Index once, search a wiki forever.

I ran this against SWE-bench Verified β€” 500 real GitHub issues across 12 major Python repos. The metric is Coverage@5: does the correct file appear in the top 5 retrieved results?
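A sketch of the metric as described, assuming a task counts as covered when any file touched by the reference fix appears in the top 5 (the post doesn't say how multi-file patches are scored). The function and argument names are hypothetical.

```python
def coverage_at_k(results, gold, k=5):
    """Share of tasks whose gold file shows up in the top-k retrieved files.

    results: {task_id: [file paths, best match first]}
    gold:    {task_id: set of file paths changed by the reference fix}
    A task with no retrieval results counts as a miss.
    """
    if not gold:
        return 0.0
    hits = 0
    for task_id, gold_files in gold.items():
        top_k = results.get(task_id, [])[:k]
        if any(path in gold_files for path in top_k):
            hits += 1
    return hits / len(gold)
```

Requiring all gold files in the top 5 instead of any would be a stricter variant and would produce lower numbers.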

| Method | Coverage@5 | Tokens/query | Delta vs. baseline |
|---|---|---|---|
| Raw BM25 (baseline) | ~40% | ~65,000 | n/a |
| Provenant (wiki + BM25) | 63.8% | ~1,030 | +24pp |
| Provenant + HyDE | 66.2% | ~1,030 | +26pp |

+24 percentage points. From 40% to 63.8%. On 500 tasks. Across 12 repos.

And the token numbers aren't rounding errors:

| Repo | Naive tokens | Provenant tokens | Reduction |
|---|---|---|---|
| Flask (30 queries) | 69,044 | 1,070 | 64.5× |
| Django (20 queries) | 59,634 | 994 | 60.0× |

Answer quality delta: −0.15 on a 5-point blind-judge scale. In this sample, that was not a meaningful drop. The model answers nearly as well with 1k tokens as it does with 69k; it just wasn't using the other 68k anyway.
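A quick check of the Reduction column in the token table: divide naive tokens by Provenant tokens. The same figures expressed as a share of tokens saved come out at roughly 98%.

```python
def reduction(naive_tokens, provenant_tokens):
    """Return (times fewer tokens, fraction of tokens saved)."""
    ratio = naive_tokens / provenant_tokens
    saved = 1 - provenant_tokens / naive_tokens
    return ratio, saved


# Figures copied from the token table above.
for repo, naive, provenant in [("Flask", 69_044, 1_070), ("Django", 59_634, 994)]:
    ratio, saved = reduction(naive, provenant)
    print(f"{repo}: {ratio:.1f}x fewer tokens, {saved:.1%} saved")
# Flask: 64.5x fewer tokens, 98.5% saved
# Django: 60.0x fewer tokens, 98.3% saved
```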

Step 1: Index your repo once.

provenant init /path/to/your/repo

Provenant parses every file with tree-sitter, generates a wiki page per module via an LLM, and stores everything in SQLite/FTS5 + LanceDB. For the 12 benchmark repos, that came to 6,122 pages. Done in minutes.
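The post doesn't show Provenant's storage code. As a rough illustration of the SQLite/FTS5 half only, this sketch stores wiki pages in an FTS5 table and ranks them with SQLite's built-in bm25() function. The table name, columns, and query handling are assumptions; it needs an SQLite build with FTS5 enabled (most current builds have it) and leaves out tree-sitter parsing, LLM page generation, and LanceDB.

```python
import re
import sqlite3


def build_index(pages):
    """Create an in-memory FTS5 index from {path: wiki_page_text}."""
    conn = sqlite3.connect(":memory:")
    # path is stored but not tokenized; only the page body is searchable.
    conn.execute("CREATE VIRTUAL TABLE wiki USING fts5(path UNINDEXED, body)")
    conn.executemany("INSERT INTO wiki (path, body) VALUES (?, ?)", pages.items())
    return conn


def search(conn, question, k=3):
    """Return the top-k page paths for a natural-language question."""
    # Quote each word so FTS5 doesn't parse "?" or words like OR/NOT as syntax.
    terms = re.findall(r"\w+", question.lower())
    if not terms:
        return []
    match = " OR ".join(f'"{t}"' for t in terms)
    rows = conn.execute(
        # FTS5's bm25() returns lower (more negative) values for better matches.
        "SELECT path FROM wiki WHERE wiki MATCH ? ORDER BY bm25(wiki) LIMIT ?",
        (match, k),
    ).fetchall()
    return [path for (path,) in rows]
```

OR-joining the words keeps recall high for conversational questions; BM25 weighting then pushes pages that share the rarer words to the top.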

Step 2: Start the MCP server.

provenant serve --repo /path/to/your/repo

That's it. Provenant is now a local MCP server exposing tools your agent can call natively.

Step 3: Just use Claude. No special commands.

Add it to your claude_desktop_config.json:

{
  "mcpServers": {
    "provenant": {
      "command": "provenant",
      "args": ["serve", "--repo", "/path/to/your/repo"]
    }
  }
}

Now when you ask Claude "how does authentication work?", it doesn't grep your codebase. It calls provenant_ask, gets 3 wiki pages (~1k tokens), and answers. You never change how you work. The retrieval layer is just better.
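If claude_desktop_config.json already lists other MCP servers, pasting the snippet over the file would drop them. A small helper can merge the provenant entry in instead. The helper name is hypothetical, and the config path is passed in explicitly because its location varies by OS.

```python
import json
from pathlib import Path


def add_provenant_server(config_path, repo_path):
    """Add or update the "provenant" MCP entry, leaving other servers intact."""
    path = Path(config_path)
    text = path.read_text() if path.exists() else ""
    config = json.loads(text) if text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    servers["provenant"] = {
        "command": "provenant",
        "args": ["serve", "--repo", str(repo_path)],
    }
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```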

You ask Claude a question
         ↓
Claude calls provenant_ask (MCP tool)
         ↓
Provenant: BM25 over wiki pages → top-k results
         ↓
Claude synthesizes answer from ~1,030 tokens
         ↓
Attribution confidence logged → weak pages auto-repaired
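The step between "top-k results" and "~1,030 tokens" amounts to packing a few pages into a small context. A sketch of that packing under stated assumptions: k = 3 mirrors the "3 wiki pages" above, while the token budget, the 4-characters-per-token estimate, and the function names are hypothetical, not Provenant's code.

```python
def estimate_tokens(text):
    """Very rough token estimate: ~4 characters per token for English prose.

    This heuristic is an assumption; a real system would use the model's tokenizer.
    """
    return max(1, len(text) // 4)


def build_context(ranked_pages, k=3, max_tokens=1_500):
    """Pack up to k best-ranked wiki pages into one context string.

    ranked_pages: list of (path, page_text) pairs, best match first.
    Always keeps the first page, then stops before exceeding max_tokens.
    Returns (context, paths_used, estimated_tokens); the estimate counts
    page text only, not the path headers.
    """
    parts, used, total = [], [], 0
    for path, text in ranked_pages[:k]:
        cost = estimate_tokens(text)
        if used and total + cost > max_tokens:
            break
        parts.append(f"## {path}\n{text}")
        used.append(path)
        total += cost
    return "\n\n".join(parts), used, total
```

Stopping at the first page that doesn't fit, rather than skipping ahead to smaller lower-ranked pages, keeps the context strictly in rank order.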

I pointed Claude at a fresh repo, a Java Android music player it had never seen, and asked: "How does this app play music?" Here's the actual response after it called provenant_ask:

Screenshot: Claude's unedited response after Provenant retrieved 3 wiki pages (~1k tokens). Discovery phase: ~30 seconds.

"Provenant compressed the discovery phase from ~5–10 minutes of grepping/reading to ~30 seconds. It's like having an experienced teammate say 'here's the 3 files you need and what they do' before you dive in."

(Claude, unprompted)

That's on a Java codebase. Provenant indexes Python, but the wiki pages are plain English, and Claude reads English just fine.

Nobody measures when a retrieval index is wrong. BM25 returns 5 results and acts confident. The model uses 2. The other 3 were noise. The index degrades silently as your codebase changes.

I built a metric for this:

attribution confidence = pages actually cited / pages retrieved

Zero extra LLM calls: it's derived from the citation structure already in the answer. It correlates with answer quality (r = 0.415 against a blind LLM judge): high-confidence retrievals average 5.0/5, while low-confidence retrievals average 4.5.
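Because the metric comes from citations already in the answer, computing it is just string parsing. A minimal sketch, assuming the answer cites pages with a [[path]] marker; the post doesn't show Provenant's actual citation format, and the names here are hypothetical.

```python
import re

# Assumed citation marker: [[path/to/page]]. Provenant's real format may differ.
CITATION = re.compile(r"\[\[([^\]]+)\]\]")


def attribution_confidence(answer, retrieved_paths):
    """Return (pages actually cited / pages retrieved, list of uncited pages)."""
    retrieved = list(dict.fromkeys(retrieved_paths))  # dedupe, keep rank order
    if not retrieved:
        return 0.0, []
    cited = set(CITATION.findall(answer))
    uncited = [p for p in retrieved if p not in cited]
    confidence = (len(retrieved) - len(uncited)) / len(retrieved)
    return confidence, uncited
```

Citations to pages that weren't retrieved are ignored, which keeps the ratio between 0 and 1, and the uncited list is exactly what a repair step would need.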

When a page's confidence drops below 0.35, Provenant queues a background repair:

asyncio.create_task(_background_repair(uncited_pages))

75% of low-confidence queries improved after one repair cycle. Cost: ~$0.02. Touches only 0.7% of pages.

The index improves the more you use it. Without you doing anything.

Some repos benefit more than others. The pattern: small, well-documented repos see the biggest gains. Large monoliths still improve, just from a harder baseline.

| Repo | Coverage@5 | Improvement | Wiki pages |
|---|---|---|---|
| requests | 78% | +38pp | 58 |
| pytest | 72% | +32pp | 186 |
| seaborn | 71% | +31pp | 94 |
| flask | 69% | +29pp | 74 |
| xarray | 66% | +26pp | 218 |
| sphinx | 63% | +23pp | 412 |
| django | 61% | +21pp | 1,393 |
| scikit-learn | 57% | +17pp | 1,124 |
| matplotlib | 55% | +15pp | 634 |

requests at 78% makes sense: it's a small, well-structured library with clean module boundaries. Each file does one thing, so the wiki pages are precise and retrieval is the strongest of any repo in the table.

Django at 61% is still a +21pp improvement on a 1,393-page codebase. That's not nothing.

For the ~3% of queries where even the wiki vocabulary doesn't match, Provenant uses HyDE (Hypothetical Document Embeddings): it generates a hypothetical wiki snippet that would answer the question, then searches with that snippet. Those results are merged with the BM25 results via Reciprocal Rank Fusion.

+2.4pp Coverage@5. One extra LLM call. Not the headline, but it's there when it helps. The fact that it only fires 3% of the time is the point: the wiki handles the rest.
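Reciprocal Rank Fusion itself is simple: each document scores the sum of 1/(k + rank) over every ranked list it appears in, with k = 60 as the constant from the original RRF paper. A sketch, assuming two best-first lists of page paths (plain-query BM25 and HyDE-snippet search); Provenant's actual k and tie-breaking aren't stated in the post.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several best-first ranked lists into one.

    score(doc) = sum over lists of 1 / (k + rank), with rank starting at 1.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document id for determinism.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))
```

Because RRF uses only ranks, it can merge BM25 scores and embedding similarities without normalizing them onto a common scale, and pages that appear in both lists tend to rise to the top.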

Speculative prefetching: I built a hook that pre-fetches wiki context whenever your agent greps a file, warming the cache. Median speedup: 1.0×. The DB reads were already fast enough. Keeping the code, not claiming a win.

Compression/pruning: removing low-attribution pages before synthesis. Firing rate on the test set: 0%. The threshold was too conservative and needs tuning before it's useful.

Self-healing at scale: the repair loop has only been evaluated on Django (20 questions). I can't claim it generalises yet. It's early evidence, not a proven result.

pip install provenant

provenant init /path/to/your/repo

provenant serve --repo /path/to/your/repo

Works with Claude Code, Cursor, or anything MCP-compatible. Your agent gets provenant_ask, provenant_search, provenant_context, and provenant_risk as native tools. It stops grepping. It starts reading the wiki.

⭐ GitHub: github.com/shreyashsharma/provenant

The retrieval problem in AI coding tools is real and under-measured. BM25 on raw source code is the floor, not the ceiling.

If you try Provenant on your repo, I'm especially interested in two numbers:

Those two data points are more honest than any eval I can run on my own repos. Happy to compare notes.

Benchmarked with DeepSeek-V3.2 · nomic-embed-text-v1.5 · SWE-bench Verified (500 tasks) · 12 Python OSS repos
