# One Open Source Project a Day (No. 71): CodeGraph — Pre-Index Your Codebase for AI Agents, Save 35% Cost and 70% Tool Calls

> Source: <https://dev.to/wonderlab/one-open-source-project-a-day-no-71-codegraph-pre-index-your-codebase-for-ai-agents-save-35-50f3>
> Published: 2026-05-21 01:51:49+00:00

## Introduction

"~35% cheaper · ~70% fewer tool calls · 100% local"

This is article No. 71 in the "One Open Source Project a Day" series. Today we are exploring **CodeGraph**.

Start with a scenario: you ask Claude Code "How is AuthService being called?" Without any assistance, Claude's approach is: glob-scan directories, run multiple greps, read several files — then finally answer. The whole process might trigger 10–15 tool calls and consume hundreds of thousands of tokens.

CodeGraph's insight is to **front-load this work**: before you start, it has already parsed your codebase with tree-sitter into a semantic graph stored in a local SQLite database, and it exposes 8 query tools to AI agents via MCP. When the agent needs to understand code, a single `codegraph_context` call returns entry points, related symbols, and code snippets, with **no file reading required**.

The project has 9.6k stars and 588 forks. Benchmarks across 7 real open-source projects show an average of 35% cost savings, 70% fewer tool calls, and a 49% speed improvement. On VS Code's large TypeScript repository, a single architecture question dropped from 1.4M tokens to 393k, and its cost from $0.64 to $0.42.

### What You Will Learn

- CodeGraph's four-stage pipeline: Extract → Store → Resolve → Auto-Sync
- The 8 MCP tools and when to use each
- A detailed breakdown of benchmark results across 7 projects: why do larger codebases benefit more?
- How 19-language support and 13-framework route recognition work
- Complete setup walkthrough from installation to Claude Code integration
- `codegraph affected`: using dependency tracing for smart CI test selection

### Prerequisites

- Familiarity with Claude Code, Cursor, or similar AI coding tools
- Basic understanding of MCP (Model Context Protocol)
- Node.js experience

## Project Background

### Project Introduction

CodeGraph is a **local semantic code knowledge graph** tool designed specifically to improve AI coding agent efficiency. Its core insight:

> AI agents spend a massive amount of tokens and time in the "discovery phase" (scanning directories, searching for symbols, reading files) rather than on the actual reasoning and generation.

CodeGraph's solution is to **outsource the discovery phase to a pre-built index**: the index is ready before you start working, so AI agents can pull structured code knowledge directly instead of exploring the file system from scratch.

The technology choices are pragmatic: tree-sitter for AST parsing (mature, multi-language, high-performance), SQLite FTS5 for full-text search (zero external dependencies, fully local), and native OS file events for live sync (FSEvents/inotify/ReadDirectoryChangesW).

### Author/Team

- **Author**: Colby McHenry (GitHub: colbymchenry)
- **Repository**: [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph)
- **Distribution**: npm package `@colbymchenry/codegraph`

### Project Stats

- ⭐ GitHub Stars: **9,600+**
- 🍴 Forks: **588**
- 📦 npm package: `@colbymchenry/codegraph`
- 🔧 Runtime: Node.js 20–24
- 💻 Platforms: Windows, macOS, Linux
- 📄 License: MIT
- 🌐 Repository: [colbymchenry/codegraph](https://github.com/colbymchenry/codegraph)

## Main Features

### Core Utility

CodeGraph inserts a pre-built index layer between AI agents and codebases:

```
Codebase (TypeScript / Python / Go / ...)
        ↓ tree-sitter parsing
  Semantic graph (symbols + relationships + call chains)
        ↓ stored in SQLite FTS5
  Local knowledge base
        ↓ exposed via MCP
  AI coding agents (Claude Code / Cursor / Codex CLI / OpenCode)
```

**Without CodeGraph**:

```
User: "How is AuthService being called?"
→ Agent: glob("src/**/*.ts")         # Tool call 1
→ Agent: grep("AuthService")         # Tool call 2
→ Agent: read("auth.service.ts")     # Tool call 3
→ Agent: grep("import.*Auth")        # Tool call 4
→ Agent: read("user.controller.ts")  # Tool call 5
→ Agent: read("app.module.ts")       # Tool call 6
... 10–15 total tool calls, massive token consumption
```

**With CodeGraph**:

```
User: "How is AuthService being called?"
→ Agent: codegraph_callers("AuthService")   # Tool call 1
→ Returns: full caller list + call sites + code snippets
→ Agent answers directly, no file reading needed
```
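As a rough illustration of why one call can replace the grep-and-read loop, here is a minimal Python sketch of a callers lookup served from a pre-built edges table. It assumes a simplified schema (plain tables rather than FTS5, loosely modeled on the simplified schema in the Storage stage below) and hypothetical symbols such as `AdminController.impersonate`. It is not CodeGraph's actual query code.

```python
import sqlite3
from typing import List, Tuple

# Hypothetical, simplified schema: plain tables (no FTS5) so the sketch
# runs on any SQLite build.
SCHEMA = """
CREATE TABLE symbols (
    id        INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    kind      TEXT NOT NULL,
    file_path TEXT NOT NULL
);
CREATE TABLE edges (
    from_id INTEGER REFERENCES symbols(id),  -- caller
    to_id   INTEGER REFERENCES symbols(id),  -- callee
    kind    TEXT NOT NULL,                   -- calls/imports/...
    file    TEXT NOT NULL,
    line    INTEGER NOT NULL
);
"""


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory index with a few made-up symbols and edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO symbols (id, name, kind, file_path) VALUES (?, ?, ?, ?)",
        [
            (1, "AuthService.login", "method", "src/auth.service.ts"),
            (2, "UserController.login", "method", "src/user.controller.ts"),
            (3, "AdminController.impersonate", "method", "src/admin.controller.ts"),
        ],
    )
    conn.executemany(
        "INSERT INTO edges (from_id, to_id, kind, file, line) VALUES (?, ?, ?, ?, ?)",
        [
            (2, 1, "calls", "src/user.controller.ts", 42),
            (3, 1, "calls", "src/admin.controller.ts", 17),
        ],
    )
    return conn


def find_callers(conn: sqlite3.Connection, callee: str) -> List[Tuple[str, str, int]]:
    """Return (caller name, file, line) for every 'calls' edge into `callee`."""
    rows = conn.execute(
        """
        SELECT caller.name, e.file, e.line
        FROM edges AS e
        JOIN symbols AS caller ON caller.id = e.from_id
        JOIN symbols AS target ON target.id = e.to_id
        WHERE target.name = ? AND e.kind = 'calls'
        ORDER BY e.file, e.line
        """,
        (callee,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_demo_index()
    for name, file, line in find_callers(conn, "AuthService.login"):
        print(f"{name}  ({file}:{line})")
```

The point of the design is that the expensive work (parsing and linking) happened at index time, so answering the question is a single indexed join rather than a chain of file reads.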

### Quick Start

**One-command install (recommended)**:

```bash
# Run the interactive installer: auto-detects installed AI agents and configures them
npx @colbymchenry/codegraph

# Initialize in your project (-i for interactive)
cd your-project
codegraph init -i

# Non-interactive: auto-detect all installed agents, global install
codegraph install --yes

# Target specific agents
codegraph install --target=cursor,claude --yes

# Project-local install
codegraph install --target=auto --location=local

# Manual setup: install the CLI globally, then configure your agent yourself
npm install -g @colbymchenry/codegraph
```

For manual setup, add the MCP server to `~/.claude.json` (or a project-level `.claude.json`):

```json
{
  "mcpServers": {
    "codegraph": {
      "type": "stdio",
      "command": "codegraph",
      "args": ["serve", "--mcp"]
    }
  }
}
```

Then verify the index:

```bash
codegraph status               # Check index status and stats
codegraph query "UserService"  # Test symbol search
```
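With the config above, Claude Code launches `codegraph serve --mcp` as a child process and talks to it over stdio. MCP messages are JSON-RPC 2.0 objects sent one per line. The Python sketch below only builds the `tools/call` request an agent would send for `codegraph_callers`. The argument key `symbol` is a hypothetical placeholder: each tool's real input schema is advertised by the server in its `tools/list` response, and a real client must complete the `initialize` handshake first.

```python
import json


def jsonrpc_request(request_id: int, method: str, params: dict) -> str:
    """Serialize one JSON-RPC 2.0 request as a single newline-terminated line,
    which is how MCP's stdio transport frames messages."""
    message = {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}
    return json.dumps(message) + "\n"  # json.dumps emits no embedded newlines by default


def tools_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request for the named tool."""
    return jsonrpc_request(request_id, "tools/call", {"name": tool, "arguments": arguments})


if __name__ == "__main__":
    # NOTE: "symbol" is a hypothetical argument name; check `tools/list` for the real schema.
    line = tools_call(2, "codegraph_callers", {"symbol": "AuthService"})
    print(line, end="")
```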

### The 8 MCP Tools

The complete toolset CodeGraph exposes to AI agents:

| Tool | Purpose | Typical Invocation |
|---|---|---|
| `codegraph_search` | Find symbols by name | "Find all functions called authenticate" |
| `codegraph_context` | Build code context for a task | "What code is relevant to the login flow?" |
| `codegraph_callers` | Find what calls a function | "What calls AuthService?" |
| `codegraph_callees` | Find what a function calls | "What does processPayment call internally?" |
| `codegraph_impact` | Analyze change impact radius | "What breaks if I change this function?" |
| `codegraph_node` | Get details about a specific symbol | "Show me UserController's full signature" |
| `codegraph_files` | Get indexed file structure | "What is the overall project structure?" |
| `codegraph_status` | Check index health and stats | "How many symbols are indexed? Last sync?" |

**`codegraph_context` is the most important tool**: it doesn't just return search results; it intelligently assembles a comprehensive context package for a given task, including entry points, related symbols, and code snippets:

```
# Command-line equivalent
codegraph context "fix user login bug"
# → Automatically finds login-related functions, call chains, and relevant files
#   packaged into context Claude can consume directly
```
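How CodeGraph actually ranks and assembles context is not described here. As a hedged mental model, one simple way to build such a package is to seed with symbols whose names match the task's keywords, then expand a few hops along call edges in both directions. The Python sketch below does exactly that over a hypothetical in-memory index. The symbol names, snippets, and the `build_context` helper are all illustrative assumptions.

```python
from collections import deque
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Symbol:
    name: str
    file: str
    snippet: str


# Hypothetical pre-built index: symbols plus directed "calls" edges.
SYMBOLS: Dict[str, Symbol] = {s.name: s for s in [
    Symbol("LoginController.post", "src/login.controller.ts", "post(req) { return this.auth.login(req.body); }"),
    Symbol("AuthService.login", "src/auth.service.ts", "login(creds) { ... }"),
    Symbol("PasswordHasher.verify", "src/hasher.ts", "verify(plain, hash) { ... }"),
    Symbol("Logger.info", "src/logger.ts", "info(msg) { ... }"),
    Symbol("InvoiceService.render", "src/invoice.service.ts", "render(invoice) { ... }"),
]}
CALLS: Dict[str, List[str]] = {
    "LoginController.post": ["AuthService.login"],
    "AuthService.login": ["PasswordHasher.verify", "Logger.info"],
}


def neighbors(name: str) -> List[str]:
    """Callees first, then callers: both directions are useful context."""
    callees = CALLS.get(name, [])
    callers = [src for src, dsts in CALLS.items() if name in dsts]
    return callees + callers


def build_context(task: str, max_hops: int = 1) -> List[Symbol]:
    """Seed with symbols whose names contain a task keyword, then expand
    breadth-first over call edges up to `max_hops` away."""
    keywords = [w.lower() for w in task.split() if len(w) >= 3]
    seeds = [n for n in SYMBOLS if any(k in n.lower() for k in keywords)]
    order, seen = list(seeds), set(seeds)
    queue = deque((s, 0) for s in seeds)
    while queue:
        name, depth = queue.popleft()
        if depth == max_hops:
            continue
        for nxt in neighbors(name):
            if nxt in SYMBOLS and nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append((nxt, depth + 1))
    return [SYMBOLS[n] for n in order]


if __name__ == "__main__":
    for sym in build_context("fix user login bug"):
        print(f"{sym.file}: {sym.snippet}")
```

For "fix user login bug", the keyword `login` seeds two symbols, and one hop pulls in their callees, while the unrelated `InvoiceService.render` never enters the package. A production tool would add relevance scoring and a token budget on top of this.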

### Project Advantages

| Dimension | CodeGraph | Native AI Agent (no assist) | Other code indexers |
|---|---|---|---|
| Tool call count | ~70% fewer | High (re-scans each task) | Partial reduction |
| Token usage | ~59% fewer | High | Partial reduction |
| Data privacy | 100% local | Depends on agent | Most require uploads |
| Real-time sync | Native OS file events | N/A | Usually polling or manual |
| Language support | 19+ languages | Depends on agent | Usually 3–5 |
| Framework route detection | 13 frameworks | None | Rare |
| Installation complexity | One npx command | N/A | Usually requires server |

## Detailed Analysis

### 1. The Four-Stage Pipeline

**Stage 1: Extraction**

tree-sitter parses source files into ASTs, extracting:

- **Symbols**: functions, classes, methods, interfaces, variable definitions
- **Relationships**: function calls, module imports, class inheritance, interface implementations
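To make the extraction step concrete without pulling in tree-sitter bindings, the sketch below swaps in Python's standard-library `ast` module and extracts symbols and call relationships from a small sample module. It illustrates the idea and is not CodeGraph's extractor. Two caveats apply. First, callee names are kept exactly as written (`controller.login`), because mapping them to definitions is the job of the Resolve stage. Second, `ast.parse` raises `SyntaxError` on broken code, which is precisely the limitation tree-sitter's fault tolerance avoids.

```python
import ast
from typing import List, Optional, Tuple

SAMPLE = """
class UserController:
    def login(self, user):
        return self.auth_service.login(user)

def main():
    controller = UserController()
    controller.login("alice")
"""


def dotted_name(expr: ast.expr) -> Optional[str]:
    """Render a call target such as `self.auth_service.login` as a string."""
    if isinstance(expr, ast.Name):
        return expr.id
    if isinstance(expr, ast.Attribute):
        base = dotted_name(expr.value)
        # base not nameable (e.g. foo().bar): keep just the attribute
        return f"{base}.{expr.attr}" if base else expr.attr
    return None  # other targets (subscripts, lambdas, ...) are not named


def calls_in(func: ast.AST) -> List[str]:
    """Callee names inside one function, in source order, skipping nested defs."""
    found = []
    stack = list(ast.iter_child_nodes(func))
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            continue  # nested scopes are extracted as their own symbols
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name:
                found.append((node.lineno, node.col_offset, name))
        stack.extend(ast.iter_child_nodes(node))
    return [name for _, _, name in sorted(found)]


def extract(source: str) -> Tuple[list, list]:
    """Return (symbols, edges) for one module.

    symbols: (qualified_name, kind, line); edges: (caller, callee_as_written, "calls").
    Unlike tree-sitter, ast.parse raises SyntaxError on broken code.
    """
    tree = ast.parse(source)
    symbols: list = []
    edges: list = []

    def visit(node: ast.AST, scope: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)):
                qual = f"{scope}.{child.name}" if scope else child.name
                if isinstance(child, ast.ClassDef):
                    kind = "class"
                else:
                    kind = "method" if isinstance(node, ast.ClassDef) else "function"
                    edges.extend((qual, callee, "calls") for callee in calls_in(child))
                symbols.append((qual, kind, child.lineno))
                visit(child, qual)
            else:
                visit(child, scope)

    visit(tree, "")
    return symbols, edges


if __name__ == "__main__":
    syms, calls = extract(SAMPLE)
    print(syms)
    print(calls)
```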

tree-sitter's key advantage is that it is a **fault-tolerant parser**, so it can extract partial structure even when code has syntax errors. This is critical for indexing files that are actively being edited.

**Stage 2: Storage**

All data lands in a local SQLite database using the FTS5 (Full-Text Search 5) extension:

```sql
-- Symbols table (simplified)
CREATE VIRTUAL TABLE symbols USING fts5(
  name,          -- Symbol name
  kind,          -- function/class/method/...
  file_path,     -- Source file
  line_start,    -- Starting line
  signature,     -- Function signature
  docstring,     -- Documentation comment
  code_snippet   -- Code excerpt
);

-- Relationships table
CREATE TABLE edges (
  from_id  INTEGER,  -- Caller symbol ID
  to_id    INTEGER,  -- Callee symbol ID
  kind     TEXT,     -- calls/imports/inherits/implements
  file     TEXT,
  line     INTEGER
);
```

**Stage 3: Resolution**

References are resolved across files: imports are linked to the symbols they name, and call expressions to the definitions they invoke, producing graph edges:

```
Source code: import { AuthService } from './auth.service'
             ...
             this.authService.login(user)
            ↓ resolution
Graph edges: UserController.login → AuthService.login (calls)
             UserController → AuthService (imports)
```

**Stage 4: Auto-Sync**

CodeGraph uses native OS file events (not polling!) to detect changes:

- macOS: `FSEvents`
- Linux: `inotify`
- Windows: `ReadDirectoryChangesW`

A **2-second debounce** prevents mass rebuilds when files change rapidly: it waits for changes to settle before running incremental updates.
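Debounce logic is easy to get subtly wrong, so here is a minimal sketch of the idea in Python. CodeGraph itself is a Node.js tool, and this is not its code. `Debouncer`, `on_change`, and the flush callback are hypothetical names, and the OS-level watcher that would call `on_change` is not shown. Every event restarts the timer, and a generation counter ensures that a timer cancelled too late cannot flush early.

```python
import threading
import time
from typing import Callable, Optional


class Debouncer:
    """Batch file-change events and flush them once the stream has been
    quiet for `delay` seconds (the article cites 2 s for CodeGraph)."""

    def __init__(self, delay: float, flush: Callable[[set], None]) -> None:
        self.delay = delay
        self.flush = flush            # e.g. "re-index these files incrementally"
        self._pending: set = set()
        self._generation = 0          # bumps on every event; stale timers check it
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def on_change(self, path: str) -> None:
        """Called by a file watcher (FSEvents/inotify/... not shown)."""
        with self._lock:
            self._pending.add(path)
            self._generation += 1
            if self._timer is not None:
                self._timer.cancel()  # a new event restarts the quiet period
            self._timer = threading.Timer(self.delay, self._fire, args=(self._generation,))
            self._timer.daemon = True
            self._timer.start()

    def _fire(self, generation: int) -> None:
        with self._lock:
            if generation != self._generation:
                return                # superseded by a newer event
            batch, self._pending = self._pending, set()
        if batch:
            self.flush(batch)


if __name__ == "__main__":
    # Demo with a short delay: three rapid events produce one flush.
    d = Debouncer(0.2, lambda files: print("re-index:", sorted(files)))
    for p in ["a.ts", "b.ts", "a.ts"]:
        d.on_change(p)
        time.sleep(0.05)
    time.sleep(0.5)
```

In the demo, three events 50 ms apart with a 200 ms delay collapse into a single re-index of `a.ts` and `b.ts`. With a 2-second delay, a burst of saves during an edit would likewise become one incremental update.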

### 2. Benchmark Deep Dive

Test conditions: Claude Code (headless, Opus 4.7) answering architecture questions. Each result is the median of 4 runs on the same question, across 7 real open-source repositories.

```
Project        Language       Size            Cost ↓  Token ↓  Speed ↑  Tool Calls ↓
──────────────────────────────────────────────────────────────────────────────────────
VS Code        TypeScript     ~10k files      35%     73%      41%      72%
Excalidraw     TypeScript     ~600 files      47%     73%      60%      86%
Django         Python         ~2.7k files     34%     64%      59%      81%
Tokio          Rust           ~700 files      52%     81%      63%      89%
OkHttp         Java           ~640 files      17%     41%      36%      64%
Gin            Go             ~150 files      22%     23%      34%      19%
Alamofire      Swift          ~100 files      38%     59%      51%      77%
──────────────────────────────────────────────────────────────────────────────────────
Average                                       35%     59%      49%      70%
```

**Patterns worth noting**:

- **Tokio (Rust, ~700 files) sees the biggest gains** (81% token reduction, 89% fewer tool calls). Rust's type system is complex, so agents originally needed extensive file exploration to understand trait implementations and generic relationships. CodeGraph's pre-built relationships make this dramatically cheaper.
- **Gin (Go, ~150 files) sees the smallest gains** (23% token reduction, 19% fewer tool calls). Small Go projects have simple file structures that agents can already navigate efficiently, so CodeGraph's marginal value is lower.
- **VS Code's absolute numbers are the most striking**: the same question costs $0.64 (1.4M tokens) without CodeGraph and $0.42 (393k tokens) with it, a saving of $0.22 on a single task.

**Takeaway**: the larger the codebase, the more complex the dependencies, and the richer the language's type system, the greater CodeGraph's benefit. Size alone is not decisive, though. In relative terms, Tokio (~700 files) outperforms VS Code (~10k files), and Alamofire (~100 files) saves more on cost (38%) than VS Code (35%). For developers using Claude Code heavily on large projects, the ROI is clear.
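The Average row is a plain unweighted mean across the seven projects, which is easy to check. This short Python script, a verification aid rather than part of CodeGraph, recomputes the row from the benchmark table and converts the VS Code absolute figures into percentages:

```python
# Per-project improvements (%) from the benchmark table:
# (cost, tokens, speed, tool calls)
BENCHMARK = {
    "VS Code":    (35, 73, 41, 72),
    "Excalidraw": (47, 73, 60, 86),
    "Django":     (34, 64, 59, 81),
    "Tokio":      (52, 81, 63, 89),
    "OkHttp":     (17, 41, 36, 64),
    "Gin":        (22, 23, 34, 19),
    "Alamofire":  (38, 59, 51, 77),
}
METRICS = ("cost", "tokens", "speed", "tool_calls")


def column_means(rows: dict) -> dict:
    """Unweighted mean of each metric across projects."""
    n = len(rows)
    return {m: sum(r[i] for r in rows.values()) / n for i, m in enumerate(METRICS)}


def pct_reduction(before: float, after: float) -> float:
    """Percentage saved going from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for metric, mean in column_means(BENCHMARK).items():
        print(f"{metric:>10}: {mean:.1f}%")
    # VS Code absolute figures quoted in the text
    print(f"VS Code cost saved:   {pct_reduction(0.64, 0.42):.1f}%")
    print(f"VS Code tokens saved: {pct_reduction(1_400_000, 393_000):.1f}%")
```

The means come out to about 35.0%, 59.1%, 49.1%, and 69.7%, so the headline "~70% fewer tool calls" is a rounded 69.7%. The VS Code absolute numbers work out to roughly 34.4% cost and 71.9% token savings. The small gap to the table's 35% and 73% plausibly reflects rounding (1.4M is itself a rounded figure) and the table's use of medians.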

### 3. 19 Languages + 13 Framework Route Detection

**Language support** (via tree-sitter grammars):

TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Vue, Liquid, Pascal/Delphi, Scala

**Framework route detection** is a differentiating feature: CodeGraph doesn't just recognize symbols, it understands the mapping between URL routes and their handler functions:

```
# Django
urlpatterns = [
    path('users/<int:pk>/', UserDetailView.as_view()),
]
# → CodeGraph knows GET /users/{id}/ maps to UserDetailView

# FastAPI
@app.get("/items/{item_id}")
async def read_item(item_id: int):
    ...
# → CodeGraph knows GET /items/{id} maps to read_item()
```

The 13 supported frameworks: Django, Flask, FastAPI, Express, NestJS, Laravel, Rails, Spring, Gin/chi/gorilla/mux, Axum/actix/Rocket, ASP.NET, Vapor, React Router/SvelteKit.

This means AI agents can ask "Where is the handler for `/api/users/:id`?" and get a precise answer without scanning routing config files.
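Route detection of this kind can be roughly approximated even without a parser. The Python sketch below uses a regular expression to pull `(method, path, handler)` triples out of FastAPI-style decorators. It is an assumption-laden stand-in: CodeGraph presumably works from its tree-sitter syntax trees and covers 13 frameworks, and a regex like this one misses routes registered dynamically or split across lines. `ROUTE_RE` and `fastapi_routes` are hypothetical names.

```python
import re
from typing import List, Tuple

# A FastAPI-style decorator such as @app.get("/items/{item_id}"),
# optionally followed by more decorators, then the handler definition.
ROUTE_RE = re.compile(
    r"@\w+\.(get|post|put|patch|delete)\(\s*[\"']([^\"']+)[\"']"  # method + path
    r"[^\n]*\n"                                                   # rest of decorator line
    r"(?:\s*@[^\n]*\n)*"                                          # any further decorators
    r"\s*(?:async\s+)?def\s+(\w+)"                                # handler name
)

SAMPLE = '''
@app.get("/items/{item_id}")
async def read_item(item_id: int):
    ...

@router.post("/users/")
@requires_auth
def create_user(user: User):
    ...
'''


def fastapi_routes(source: str) -> List[Tuple[str, str, str]]:
    """Return (HTTP method, path, handler) triples found in `source`."""
    return [(m.group(1).upper(), m.group(2), m.group(3)) for m in ROUTE_RE.finditer(source)]


if __name__ == "__main__":
    for method, path, handler in fastapi_routes(SAMPLE):
        print(f"{method} {path} -> {handler}()")
```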

### 4. `codegraph affected`: Smart CI Test Selection

An underappreciated feature: by tracing import dependencies, it identifies which test files are actually affected by changed source files.
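The dependency-tracing idea reduces to a reverse reachability search over the import graph: start from the changed files, walk "who imports this?" edges transitively, and keep the test files you reach. Here is a minimal Python sketch, assuming a hypothetical import map and a `test/` directory convention. It is not CodeGraph's actual implementation or output format.

```python
from collections import deque
from typing import Callable, Dict, Iterable, List

# Hypothetical import graph: file -> files it imports.
IMPORTS: Dict[str, List[str]] = {
    "src/auth.service.ts": ["src/db.ts"],
    "src/user.controller.ts": ["src/auth.service.ts"],
    "src/report.ts": ["src/db.ts"],
    "test/auth.service.test.ts": ["src/auth.service.ts"],
    "test/user.controller.test.ts": ["src/user.controller.ts"],
    "test/report.test.ts": ["src/report.ts"],
    "test/utils.test.ts": ["src/utils.ts"],
}


def reverse_graph(imports: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert 'A imports B' into 'B is imported by A'."""
    importers: Dict[str, List[str]] = {}
    for src, deps in imports.items():
        for dep in deps:
            importers.setdefault(dep, []).append(src)
    return importers


def affected_tests(
    changed: Iterable[str],
    imports: Dict[str, List[str]],
    is_test: Callable[[str], bool] = lambda p: p.startswith("test/"),
) -> List[str]:
    """Return test files that (transitively) import any changed file."""
    importers = reverse_graph(imports)
    seen = set(changed)
    queue = deque(seen)
    while queue:
        current = queue.popleft()
        for importer in importers.get(current, []):
            if importer not in seen:
                seen.add(importer)
                queue.append(importer)
    return sorted(p for p in seen if is_test(p))


if __name__ == "__main__":
    print(affected_tests(["src/db.ts"], IMPORTS))
```

Changing `src/db.ts` in this toy graph selects the three tests that depend on it through any chain of imports, while `test/utils.test.ts` is skipped. Static import tracing cannot see dynamic imports or non-code inputs such as fixtures, so a conservative CI setup would still run the full suite periodically.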

```
# CI scenario: only run tests affected by this change
