{"slug": "one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35", "title": "One Open Source Project a Day (No. 71): CodeGraph — Pre-Index Your Codebase for AI Agents, Save 35% Cost and 70% Tool Calls", "summary": "CodeGraph is an open-source tool that pre-indexes codebases into a local semantic graph using tree-sitter and SQLite, allowing AI coding agents to access structured code knowledge with a single tool call instead of performing multiple file scans and searches. Benchmarks across seven real projects show it reduces tool calls by 70% and costs by 35%, with one architecture query on VS Code's TypeScript repository dropping from 1.4 million tokens to 393,000 tokens. The tool exposes eight query tools via the Model Context Protocol (MCP) and supports live file synchronization through native OS events.", "body_md": "## Introduction\n\n\"~35% cheaper · ~70% fewer tool calls · 100% local\"\n\nThis is article No. 71 in the \"One Open Source Project a Day\" series. Today we are exploring **CodeGraph**.\n\nStart with a scenario: you ask Claude Code \"How is AuthService being called?\" Without any assistance, Claude glob-scans directories, runs several greps, and reads multiple files before it finally answers. The whole process can take 10–15 tool calls and consume hundreds of thousands of tokens.\n\nCodeGraph's insight is to **front-load this work**: before you start, it has already parsed your codebase with tree-sitter into a semantic graph stored in a local SQLite database, and it exposes 8 query tools to AI agents via MCP. When the agent needs to understand code, a single `codegraph_context` call returns entry points, related symbols, and code snippets, with **no file reading required**.\n\nThe project has 9.6k stars and 588 forks. Across benchmarks on 7 real open-source projects, it averages 35% lower cost, 70% fewer tool calls, and a 49% speed improvement. 
On VS Code's large TypeScript repository, one architecture Q&A dropped from 1.4M tokens to 393k, and its cost fell from $0.64 to $0.42.\n\n### What You Will Learn\n\n- CodeGraph's four-stage pipeline: Extract → Store → Resolve → Auto-Sync\n- The 8 MCP tools and when to use each\n- A detailed breakdown of benchmark results across 7 projects: why do larger codebases benefit more?\n- How 19-language support and 13-framework route recognition work\n- A complete setup walkthrough, from installation to Claude Code integration\n- `codegraph affected`: using dependency tracing for smart CI test selection\n\n### Prerequisites\n\n- Familiarity with Claude Code, Cursor, or similar AI coding tools\n- Basic understanding of MCP (Model Context Protocol)\n- Node.js experience\n\n## Project Background\n\n### Project Introduction\n\nCodeGraph is a **local semantic code knowledge graph** tool designed specifically to improve AI coding agent efficiency. Its core insight:\n\nAI agents spend a large share of their tokens and time in the \"discovery phase\" (scanning directories, searching for symbols, reading files) rather than on the actual reasoning and generation.\n\nCodeGraph's solution is to **outsource the discovery phase to a pre-built index**: by the time you start working, the index is already built, so AI agents can pull structured code knowledge directly instead of exploring the file system from scratch.\n\nThe technology choices are pragmatic: tree-sitter for AST parsing (mature, multi-language, high-performance), SQLite FTS5 for full-text search (zero external dependencies, fully local), and native OS file events for live sync (FSEvents/inotify/ReadDirectoryChangesW).\n\n### Author/Team\n\n- **Author**: Colby McHenry (GitHub: colbymchenry)\n- **Repository**: [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph)\n- **Distribution**: npm package `@colbymchenry/codegraph`\n\n### Project Stats\n\n- ⭐ GitHub Stars: **9,600+**\n- 🍴 Forks: **588**\n- 📦 npm package: `@colbymchenry/codegraph`\n- 
🔧 Runtime: Node.js 20–24\n- 💻 Platforms: Windows, macOS, Linux\n- 📄 License: MIT\n- 🌐 Repository: [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph)\n\n## Main Features\n\n### Core Utility\n\nCodeGraph inserts a pre-built index layer between AI agents and codebases:\n\n```\nCodebase (TypeScript / Python / Go / ...)\n        ↓ tree-sitter parsing\n  Semantic graph (symbols + relationships + call chains)\n        ↓ stored in SQLite FTS5\n  Local knowledge base\n        ↓ exposed via MCP\n  AI coding agents (Claude Code / Cursor / Codex CLI / OpenCode)\n```\n\n**Without CodeGraph**:\n\n```\nUser: \"How is AuthService being called?\"\n→ Agent: glob(\"src/**/*.ts\")         # Tool call 1\n→ Agent: grep(\"AuthService\")         # Tool call 2\n→ Agent: read(\"auth.service.ts\")     # Tool call 3\n→ Agent: grep(\"import.*Auth\")        # Tool call 4\n→ Agent: read(\"user.controller.ts\")  # Tool call 5\n→ Agent: read(\"app.module.ts\")       # Tool call 6\n... 10–15 total tool calls, heavy token consumption\n```\n\n**With CodeGraph**:\n\n```\nUser: \"How is AuthService being called?\"\n→ Agent: codegraph_callers(\"AuthService\")   # Tool call 1\n→ Returns: full caller list + call sites + code snippets\n→ Agent answers directly, no file reading needed\n```\n\n### Quick Start\n\n**One-command install (recommended)**:\n\n```\n# Run the interactive installer; it auto-detects installed AI agents and configures them\nnpx @colbymchenry/codegraph\n\n# Initialize in your project (-i for interactive)\ncd your-project\ncodegraph init -i\n```\n\n**Install variants**:\n\n```\n# Auto-detect all installed agents, global install\ncodegraph install --yes\n\n# Target specific agents\ncodegraph install --target=cursor,claude --yes\n\n# Project-local install\ncodegraph install --target=auto --location=local\n```\n\n**Manual setup**:\n\n```\nnpm install -g @colbymchenry/codegraph\n```\n\nThen add the following to `~/.claude.json` (or a project-level `.claude.json`):\n\n```\n{\n  \"mcpServers\": {\n    \"codegraph\": {\n      \"type\": \"stdio\",\n      \"command\": \"codegraph\",\n      \"args\": 
[\"serve\", \"--mcp\"]\n    }\n  }\n}\n```\n\n**Verify the setup**:\n\n```\ncodegraph status               # Check index status and stats\ncodegraph query \"UserService\"  # Test symbol search\n```\n\n### The 8 MCP Tools\n\nThe complete toolset CodeGraph exposes to AI agents:\n\n| Tool | Purpose | Typical Question |\n|---|---|---|\n| `codegraph_search` | Find symbols by name | \"Find all functions called authenticate\" |\n| `codegraph_context` | Build code context for a task | \"What code is relevant to the login flow?\" |\n| `codegraph_callers` | Find what calls a function | \"What calls AuthService?\" |\n| `codegraph_callees` | Find what a function calls | \"What does processPayment call internally?\" |\n| `codegraph_impact` | Analyze a change's impact radius | \"What breaks if I change this function?\" |\n| `codegraph_node` | Get details about a specific symbol | \"Show me UserController's full signature\" |\n| `codegraph_files` | Get the indexed file structure | \"What is the overall project structure?\" |\n| `codegraph_status` | Check index health and stats | \"How many symbols are indexed? 
Last sync?\" |\n\n**`codegraph_context` is the most important tool**: rather than returning raw search results, it assembles a context package for a given task, including entry points, related symbols, and code snippets:\n\n```\n# Command-line equivalent\ncodegraph context \"fix user login bug\"\n# → Finds login-related functions, call chains, and relevant files,\n#   packaged into context Claude can consume directly\n```\n\n### Project Advantages\n\n| Dimension | CodeGraph | Native AI Agent (no assist) | Other code indexers |\n|---|---|---|---|\n| Tool call count | ~70% fewer | High (re-scans each task) | Partial reduction |\n| Token usage | ~59% fewer | High | Partial reduction |\n| Data privacy | 100% local | Depends on agent | Most require uploads |\n| Real-time sync | Native OS file events | N/A | Usually polling or manual |\n| Language support | 19+ languages | Depends on agent | Usually 3–5 |\n| Framework route detection | 13 frameworks | None | Rare |\n| Installation complexity | One npx command | N/A | Usually requires a server |\n\n## Detailed Analysis\n\n### 1. The Four-Stage Pipeline\n\n**Stage 1: Extraction**\n\ntree-sitter parses source files into ASTs, extracting:\n\n- **Symbols**: functions, classes, methods, interfaces, variable definitions\n- **Relationships**: function calls, module imports, class inheritance, interface implementations\n\ntree-sitter's key advantage is that it is a **fault-tolerant parser**: it can extract partial structure even when code has syntax errors. 
This is critical for indexing files that are actively being edited.\n\n**Stage 2: Storage**\n\nAll data lands in a local SQLite database, with the FTS5 (Full-Text Search 5) extension powering symbol search:\n\n```\n-- Symbols table (simplified)\nCREATE VIRTUAL TABLE symbols USING fts5(\n  name,          -- Symbol name\n  kind,          -- function/class/method/...\n  file_path,     -- Source file\n  line_start,    -- Starting line\n  signature,     -- Function signature\n  docstring,     -- Documentation comment\n  code_snippet   -- Code excerpt\n);\n\n-- Relationships table\nCREATE TABLE edges (\n  from_id  INTEGER,  -- Source symbol ID (e.g. the caller)\n  to_id    INTEGER,  -- Target symbol ID (e.g. the callee)\n  kind     TEXT,     -- calls/imports/inherits/implements\n  file     TEXT,     -- File where the relationship occurs\n  line     INTEGER   -- Line where the relationship occurs\n);\n```\n\n**Stage 3: Resolution**\n\nReferences extracted in Stage 1 are resolved to the symbols they point to and recorded as graph edges:\n\n```\nSource code: import { AuthService } from './auth.service'\n             ...\n             this.authService.login(user)\n            ↓ resolution\nGraph edges: UserController.login → AuthService.login (calls)\n             UserController → AuthService (imports)\n```\n\n**Stage 4: Auto-Sync**\n\nCodeGraph uses native OS file events (not polling) to detect changes:\n\n- macOS: `FSEvents`\n- Linux: `inotify`\n- Windows: `ReadDirectoryChangesW`\n\nA **2-second debounce** prevents mass rebuilds when files change rapidly: it waits for changes to settle, then applies incremental updates.\n\n### 2. Benchmark Deep Dive\n\nTest conditions: Claude Code (headless, Opus 4.7) answering architecture questions. 
Each result is the median of 4 runs on the same question, across 7 real open-source repositories.\n\n```\nProject        Language       Size            Cost ↓  Token ↓  Speed ↑  Tool Calls ↓\n──────────────────────────────────────────────────────────────────────────────────────\nVS Code        TypeScript     ~10k files      35%     73%      41%      72%\nExcalidraw     TypeScript     ~600 files      47%     73%      60%      86%\nDjango         Python         ~2.7k files     34%     64%      59%      81%\nTokio          Rust           ~700 files      52%     81%      63%      89%\nOkHttp         Java           ~640 files      17%     41%      36%      64%\nGin            Go             ~150 files      22%     23%      34%      19%\nAlamofire      Swift          ~100 files      38%     59%      51%      77%\n──────────────────────────────────────────────────────────────────────────────────────\nAverage                                       35%     59%      49%      70%\n```\n\n**Patterns worth noting**:\n\n**Tokio (Rust, ~700 files) sees the biggest gains** (81% token reduction, 89% fewer tool calls). Rust's type system is complex, so without an index agents need extensive file exploration to understand trait implementations and generic relationships. CodeGraph's pre-built relationships make this dramatically cheaper.\n\n**Gin (Go, ~150 files) sees the smallest gains** (23% token reduction, 19% fewer tool calls). Small Go projects have simple file structures that agents can already navigate efficiently, so CodeGraph's marginal value is lower.\n\n**VS Code's absolute numbers are the most striking**: the same question costs $0.64 (1.4M tokens) without CodeGraph and $0.42 (393k tokens) with it, saving $0.22 on a single task.\n\n**Takeaway**: **larger codebases, more complex dependencies, and richer type systems all tend to increase CodeGraph's benefit**. Size alone does not decide it (Tokio, at ~700 files, outgains the ~10k-file VS Code repository in percentage terms), but large repositories yield the biggest absolute savings. For developers using Claude Code heavily on large projects, the ROI is clear.\n\n### 3. 
19 Languages + 13 Framework Route Detection\n\n**Language support** (via tree-sitter grammars):\n\nTypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Vue, Liquid, Pascal/Delphi, Scala\n\n**Framework route detection** is a differentiating feature: CodeGraph recognizes not just symbols but also the mapping between URL routes and their handler functions:\n\n```\n# Django (assumes a views module defining UserDetailView)\nfrom django.urls import path\nfrom .views import UserDetailView\n\nurlpatterns = [\n    path('users/<int:pk>/', UserDetailView.as_view()),\n]\n# → CodeGraph knows /users/<int:pk>/ maps to UserDetailView\n\n# FastAPI\nfrom fastapi import FastAPI\n\napp = FastAPI()\n\n@app.get(\"/items/{item_id}\")\nasync def read_item(item_id: int):\n    ...\n# → CodeGraph knows GET /items/{item_id} maps to read_item()\n```\n\nThe 13 supported frameworks: Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin/chi/gorilla/mux, Axum/actix/Rocket, ASP.NET, Vapor, React Router/SvelteKit.\n\nThis means AI agents can ask \"Where is the handler for `/api/users/:id`?\" and get a precise answer without scanning routing config files.\n\n### 4. 
`codegraph affected`: Smart CI Test Selection\n\nAn underappreciated feature: by tracing import dependencies, `codegraph affected` identifies which test files are actually affected by the changed source files.\n\n```\n# CI scenario: only run tests affected by this change", "url": "https://wpnews.pro/news/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35", "canonical_source": "https://dev.to/wonderlab/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-agents-save-35-50f3", "published_at": "2026-05-21 01:51:49+00:00", "updated_at": "2026-05-21 02:03:22.315973+00:00", "lang": "en", "topics": ["open-source", "developer-tools", "artificial-intelligence", "large-language-models"], "entities": ["CodeGraph", "Claude Code", "MCP", "VS Code", "AuthService", "tree-sitter", "SQLite"], "alternates": {"html": "https://wpnews.pro/news/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35", "markdown": "https://wpnews.pro/news/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35.md", "text": "https://wpnews.pro/news/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35.txt", "jsonld": "https://wpnews.pro/news/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-35.jsonld"}}