{"slug": "ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65", "title": "AI Coding Agents Search Like It's 2009. Provenant Cuts Tokens by 65×.", "summary": "A developer built Provenant, a system that generates human-readable wiki pages for every file and module in a codebase, then searches those pages instead of raw source code. On SWE-bench Verified across 500 GitHub issues and 12 Python repos, Provenant achieved 63.8% Coverage@5, a 24-percentage-point improvement over raw BM25 keyword search, while reducing tokens per query from approximately 65,000 to just 1,030. The system runs as a local MCP server and integrates with Claude, allowing developers to ask natural language questions about their code without changing their workflow.", "body_md": "Here's what happens every time you ask an AI coding agent a question:\n\nThis is BM25 keyword search on raw source code. It's the same algorithm that powered web search in 2009. And it's still the shape of most coding-agent retrieval systems: keyword search, grep, file search, context stuffing.\n\nI spent the last few months building something better. Here's what I found.\n\nWhen you ask *\"how does Flask handle URL routing?\"*, you're writing in English. The answer lives in `scaffold.py`, `app.py`, and `wrappers.py`: files full of Python syntax, decorator patterns, and Werkzeug internals.\n\nBM25 tries to match your words against those files. It mostly fails.\n\nThe word \"routing\" appears 4 times in Flask's source. \"URL\" appears 31 times, mostly in docstrings and variable names scattered across 70+ files. BM25 retrieves 15 of them and hopes for the best.\n\nThe agent doesn't just have a retrieval problem. It has a vocabulary problem.\n\nNatural language queries describe *behavior*. Source code implements *syntax*. 
These are different vocabularies, and no amount of BM25 tuning bridges that gap.\n\n**Generate a human-readable wiki page for every file and module, then search the wiki.**\n\nA wiki page for `flask/sansio/scaffold.py` reads like this:\n\nScaffold is the shared base class for Flask and Blueprint. `@route()` calls `add_url_rule()`, which creates a Werkzeug Rule and inserts it into `url_map`. View callables are stored in `view_functions` keyed by endpoint name.\n\nSearch that for *\"how does Flask handle URL routing?\"* and the query and the document speak the same language. No vocabulary gap.\n\nThat's **Provenant**. Index once, search a wiki forever.\n\nI ran this against **SWE-bench Verified**: 500 real GitHub issues across 12 major Python repos. The metric is **Coverage@5**: does the correct file appear in the top 5 retrieved results?\n\n| Method | Coverage@5 | Tokens/query | Delta |\n|---|---|---|---|\n| Raw BM25 (baseline) | ~40% | ~65,000 | n/a |\n| Provenant (wiki + BM25) | 63.8% | ~1,030 | +24pp |\n| Provenant + HyDE | 66.2% | ~1,030 | +26pp |\n\n**+24 percentage points.** From ~40% to 63.8%. On 500 tasks. Across 12 repos.\n\nAnd the token numbers aren't rounding errors:\n\n| Repo | Naive tokens | Provenant tokens | Reduction |\n|---|---|---|---|\n| Flask (30 queries) | 69,044 | 1,070 | 64.5× |\n| Django (20 queries) | 59,634 | 994 | 60.0× |\n\nAnswer quality delta: **−0.15 on a 5-point blind-judge scale.** In this sample, that was not a meaningful drop. The model answers about as well with 1k tokens as it does with 69k; it just wasn't using the other 68k anyway.\n\n**Step 1: Index your repo once.**\n\n```\nprovenant init /path/to/your/repo\n```\n\nProvenant parses every file with tree-sitter, generates a wiki page per module via LLM, and stores everything in SQLite/FTS5 + LanceDB. 6,122 pages across 12 repos. Done in minutes.\n\n**Step 2: Start the MCP server.**\n\n```\nprovenant serve --repo /path/to/your/repo\n```\n\nThat's it. 
Provenant is now a local MCP server exposing tools your agent can call natively.\n\n**Step 3: Just use Claude. No special commands.**\n\nAdd it to your `claude_desktop_config.json`:\n\n```\n{\n  \"mcpServers\": {\n    \"provenant\": {\n      \"command\": \"provenant\",\n      \"args\": [\"serve\", \"--repo\", \"/path/to/your/repo\"]\n    }\n  }\n}\n```\n\nNow when you ask Claude *\"how does authentication work?\"*, it doesn't grep your codebase. It calls `provenant_ask`, gets 3 wiki pages (~1k tokens), and answers. You never change how you work. The retrieval layer is just better.\n\n```\nYou ask Claude a question\n         ↓\nClaude calls provenant_ask (MCP tool)\n         ↓\nProvenant: BM25 over wiki pages → top-k results\n         ↓\nClaude synthesizes answer from ~1,030 tokens\n         ↓\nAttribution confidence logged → weak pages auto-repaired\n```\n\nI pointed Claude at a fresh repo, a Java Android music player it had never seen, and asked *\"How does this app play music?\"* Here's the actual response after calling `provenant_ask`:\n\n*Screenshot: Claude's unedited response after Provenant retrieved 3 wiki pages (~1k tokens). Discovery phase: ~30 seconds.*\n\n\"Provenant compressed the discovery phase from ~5–10 minutes of grepping/reading to ~30 seconds. It's like having an experienced teammate say 'here's the 3 files you need and what they do' before you dive in.\"\n\n(Claude, unprompted)\n\nThat's on a Java codebase. Provenant indexes Python, but the wiki pages are plain English, and Claude reads English just fine.\n\nNobody measures when a retrieval index is wrong. BM25 returns 5 results and acts confident. The model uses 2. The other 3 were noise. The index degrades silently as your codebase changes.\n\nI built a metric for this:\n\n```\nattribution confidence = pages actually cited / pages retrieved\n```\n\nZero extra LLM calls. Derived from the citation structure already in the answer. 
It correlates with answer quality (r = 0.415 against a blind LLM judge): high-confidence retrievals score 5.0/5 on average; low-confidence ones score 4.5.\n\nWhen a page's confidence drops below 0.35, Provenant queues a background repair:\n\n```\n# Fires silently after low-confidence answers (needs a running event loop)\nasyncio.create_task(_background_repair(uncited_pages))\n```\n\n**75% of low-confidence queries improved after one repair cycle.** Cost: ~$0.02. Touches only 0.7% of pages.\n\n**The index improves the more you use it.** Without you doing anything.\n\nSome repos benefit more than others. The pattern: **small, well-documented repos see the biggest gains.** Large monoliths still improve, just from a harder baseline.\n\n| Repo | Coverage@5 | Improvement | Wiki pages |\n|---|---|---|---|\n| requests | 78% | +38pp | 58 |\n| pytest | 72% | +32pp | 186 |\n| seaborn | 71% | +31pp | 94 |\n| flask | 69% | +29pp | 74 |\n| xarray | 66% | +26pp | 218 |\n| sphinx | 63% | +23pp | 412 |\n| django | 61% | +21pp | 1,393 |\n| scikit-learn | 57% | +17pp | 1,124 |\n| matplotlib | 55% | +15pp | 634 |\n\nrequests at 78% makes sense: it's a small, well-structured library with clean module boundaries. Each file does one thing. The wiki pages are precise. Retrieval is at its strongest here.\n\nDjango at 61% is still a +21pp improvement on a 1,393-page codebase. That's not nothing.\n\nFor the ~3% of queries where even wiki vocabulary doesn't match, Provenant generates a hypothetical wiki snippet that *would* answer the question, then searches against that. The results are merged with BM25 via Reciprocal Rank Fusion.\n\n+2.4pp Coverage@5. One extra LLM call. Not the headline, but it's there when it helps. The fact that it only fires 3% of the time is the point: the wiki handles the rest.\n\n**Speculative prefetching**: I built a hook that pre-fetches wiki context whenever your agent greps a file, warming the cache. Median speedup: 1.0×. The DB reads were already fast enough. 
Keeping the code, not claiming a win.\n\n**Compression/pruning**: removing low-attribution pages before synthesis. Firing rate on test set: 0%. The threshold was too conservative. Needs tuning before it's useful.\n\n**Self-healing at scale**: the repair loop is only evaluated on Django (20 questions). I can't claim it generalises yet. It's early evidence, not a proven result.\n\n```\npip install provenant\n\n# Index\nprovenant init /path/to/your/repo\n\n# Serve (MCP)\nprovenant serve --repo /path/to/your/repo\n```\n\nWorks with Claude Code, Cursor, or anything MCP-compatible. Your agent gets `provenant_ask`, `provenant_search`, `provenant_context`, and `provenant_risk` as native tools. It stops grepping. It starts reading the wiki.\n\n⭐ **GitHub: github.com/shreyashsharma/provenant**\n\nThe retrieval problem in AI coding tools is real and under-measured. BM25 on raw source code is the floor, not the ceiling.\n\nIf you try Provenant on your repo, I'm especially interested in two numbers:\n\nThose two data points are more honest than any eval I can run on my own repos. 
Happy to compare notes.\n\n*Benchmarked with DeepSeek-V3.2 · nomic-embed-text-v1.5 · SWE-bench Verified (500 tasks) · 12 Python OSS repos*", "url": "https://wpnews.pro/news/ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65", "canonical_source": "https://dev.to/corpsekiller/ai-coding-agents-search-like-its-2009-provenant-cuts-tokens-by-65x-3jg9", "published_at": "2026-05-28 14:40:02+00:00", "updated_at": "2026-05-28 14:53:59.949362+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-research", "natural-language-processing", "large-language-models"], "entities": ["BM25", "Flask", "Werkzeug", "Provenant"], "alternates": {"html": "https://wpnews.pro/news/ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65", "markdown": "https://wpnews.pro/news/ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65.md", "text": "https://wpnews.pro/news/ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65.txt", "jsonld": "https://wpnews.pro/news/ai-coding-agents-search-like-it-s-2009-provenant-cuts-tokens-by-65.jsonld"}}