TokenZip v2 — PRD, HLD, LLD

TokenZip v2 is a token compression engine that reduces LLM input token costs by up to 95% for coding copilots like Claude Code and Codex by transforming an entire codebase into a multi-level, queryable knowledge graph stored locally in `.tokenzip/db`. It auto-detects module boundaries, supports nested monorepo structures, and stores symbols with relationships (CALLS, IMPLEMENTS, INHERITS, etc.) using SurrealDB with RocksDB storage, enabling incremental parsing and fast queries under 100ms for repos up to 100K files. The system is exposed as an MCP server for AI copilots, kept fresh via git hooks, and includes structured markdown parsing with Mermaid block conversion and cross-reference resolution.

---

## 📋 PRD: Product Requirements Document

### 1. Executive Summary

TokenZip v2 turns Karpathy's LLM wiki concept into a gzip-like token compression engine that runs over an entire codebase. It can reduce LLM input token cost by up to 95% when used with coding copilots such as Claude Code and Codex.

Instead of generating a flat text summary, it builds a multi-level, queryable, chainable knowledge graph (repo → modules → files → symbols) that is stored locally in `.tokenzip/db`, exposed as an MCP server for any AI copilot, and kept fresh via git hooks.

### 2. Problem Statement

| Problem | Impact |
|---|---|
| AI copilots lack structural awareness of large codebases | They hallucinate imports, miss dependencies, suggest changes in wrong modules |
| Text-based token references are flat and non-queryable | Cannot ask "which functions depend on this interface?" or "what modules does this feature span?" |
| No persistent code intelligence layer | Every session re-parses from scratch, wasting tokens and time |
| Documentation (PRD/HLD/LLD/README) is unstructured | AI can't extract workflows, sequence diagrams, or release plans from markdown |
| Cross-language dependency tracking is manual | A SQL schema change affecting 3 TS files is invisible until runtime |
| Cross-repository dependency tracking is manual | The current repository has no awareness of dependent or upstream repositories, including shared interfaces, API contracts, endpoint usage, schema dependencies, or cross-repo integrations, which makes impact analysis and coordinated changes error-prone |
| Version-aware dependency conflicts are difficult to detect | AI copilots and developers lack visibility into incompatible interface versions, breaking API/schema changes, SDK mismatches, or transitive dependency drift across repositories, causing silent integration failures and upgrade risks |

### POC Results

Under 30 seconds of indexing time for a codebase with ~1950 files:

<img width="1639" height="855" alt="image" src="https://gist.github.com/user-attachments/assets/f19d00a0-19c2-490f-86a6-f67452b6452f" />

Under 1 second per lookup:

<img width="909" height="637" alt="image" src="https://gist.github.com/user-attachments/assets/5d25d6b3-c34a-46f1-9cde-857c8e6a69ee" />

### 3. Target Users

**Primary**
- **AI Copilot Users** (Claude Code, Codex, OpenCode, Kilo Code): need structured context without token waste
- **Full-stack Developers** working in monorepos with 50+ modules

**Secondary**
- **Tech Leads** auditing codebase structure and dependency health
- **Onboarding Engineers** needing a rapid codebase mental model

### 4. Product Vision

> "Your codebase as a queryable graph, not a text dump. Ask structural questions, get precise answers, zero hallucination."

### 5. Feature Specification

#### 5.1 Multi-Level Code Graph

```
Repository
└── Module (auto-detected: package.json, pyproject.toml, go.mod, Cargo.toml, etc.)
    └── File
        └── Symbol (function, class, interface, variable, table, column, etc.)
```

**Acceptance Criteria:**
- Auto-detect module boundaries by presence of manifest files
- Support nested modules (monorepo: repo → apps/web → src/components)
- Each node has a stable UUID that survives renames (content-hash + path-hash hybrid)

#### 5.2 Tree-Sitter Metadata Extraction

| Language | Extracted Artifacts |
|---|---|
| `.js`, `.mjs` | Functions, classes, exports, imports, global vars, JSDoc |
| `.ts`, `.tsx` | Above + interfaces, type aliases, generics, enums, decorators, namespace exports |
| `.py` | Functions, classes, decorators, type hints, imports, async defs |
| `.sql` | Tables, views, columns, constraints, indexes, foreign keys, stored procedures |
| `.go` | Functions, structs, interfaces, methods, packages, imports |
| `.rs` | Functions, structs, traits, impls, enums, mods, use statements |
| `.java`, `.kt` | Classes, interfaces, methods, annotations, packages |
| `.md` (special) | Headings, lists, code blocks, mermaid diagrams, tables, frontmatter |

**Acceptance Criteria:**
- Each symbol stored as a node with: name, kind, signature, line range, hash,
docstring
- Relationships: `CALLS`, `IMPLEMENTS`, `INHERITS`, `IMPORTS`, `EXPORTS`, `MODIFIES`, `READS`
- Incremental parse: only re-parse files whose content hash changed
- Parse errors stored as node metadata (not silently dropped)

#### 5.3 Documentation Intelligence

For structured markdown files (`.prd.md`, `.hld.md`, `.lld.md`, `README.md`, `CHANGELOG.md`, `ADR/*.md`):

| Section Type | Extracted Structure |
|---|---|
| Workflow / Flow | Ordered step graph with actors and actions |
| Sequence Diagram | Parsed mermaid `sequenceDiagram` into actor→message→actor edges |
| Flowchart | Parsed mermaid `flowchart` into decision/action node graph |
| Release Plan | Timeline with milestones, versions, dates |
| API | Endpoint → method → params → response schema |
| Architecture / Components | Component hierarchy with responsibility and tech stack |
| Decision (ADR) | Context → Decision → Consequences as structured tuple |
| Standard lists | Typed list items (checkbox, numbered, bullet) with nesting |
| Tables | Columnar data as records |

**Acceptance Criteria:**
- Mermaid blocks parsed into graph nodes, not stored as raw text
- Section-level linking: a workflow step can reference a function symbol node
- Cross-reference resolution: "see ModuleX" in a PRD links to the Module node in the graph

#### 5.4 Chainable Query API

```typescript
// Level 1: Repository
const repo = tz.repo('.');

// Level 2: Modules (filterable, chainable)
const feModules = repo.modules()
  .filter(m => m.language === 'typescript');

// Level 3: Files within modules
const tsFiles = feModules.files()
  .filter(f => f.ext === '.tsx');

// Level 4: Symbols within files
const exportedComponents = tsFiles.symbols()
  .filter(s => s.kind === 'class' && s.isExported && s.extends('React.Component'));

// Cross-cutting queries
const dependants = tz.repo('.')
  .symbol('UserService.authenticate')
  .dependants()                 // who calls this?
  .withinModule('api-gateway')  // scope it
  .withKind('function');        // filter

const impact = tz.repo('.')
  .table('users')
  .columns()       // what columns
  .referencedBy()  // where are they referenced
  .files();        // which files

const workflow = tz.repo('.')
  .doc('prd.md')
  .section('Workflow: User Onboarding')
  .steps()           // ordered steps
  .linkedSymbols();  // what code implements each step
```

**Acceptance Criteria:**
- Every level returns a query builder, not raw data (lazy evaluation)
- `.toArray()`, `.toGraph()`, `.toMarkdown()`, `.toJSON()` terminal methods
- Queries translate to SurrealDB graph traversal queries
- Response < 100ms for repos up to 100K files

#### 5.5 Graph Database Storage

- **Engine**: SurrealDB (embedded via RocksDB storage)
- **Location**: `<project_root>/.tokenzip/db/`
- **Schema**: Schemaful (strict types per node kind)
- **Persistence**: WAL-enabled, crash-safe

**Acceptance Criteria:**
- `.tokenzip/` added to `.gitignore` automatically
- DB size < 10% of source code size for typical repos
- Cold start (first full parse) completes at 500 files/second
- Hot start (incremental) completes at 2000 files/second

#### 5.6 Git Hook Integration

```bash
# Installed via: tokenzip init
# Creates .git/hooks/pre-commit and .git/hooks/post-commit

# pre-commit:
# 1. Detect staged files (git diff --cached --name-only)
# 2. Parse changed files with tree-sitter
# 3. Diff new AST against stored graph
# 4. Validate: no broken exports, no orphan imports
# 5. Update graph with new symbol nodes/edges
# 6. If validation fails: warn (configurable: warn/block)

# post-commit:
# 1. Store commit metadata (hash, message, author, timestamp)
# 2. Create COMMIT → MODIFIED → FILE edges
# 3. Update file-level git history nodes
```

**Acceptance Criteria:**
- Hook installation is non-destructive (appends to existing hooks)
- Hook execution adds < 500ms to commit time for typical changes (< 10 files)
- `tokenzip init --no-hooks` flag for CI environments
- `tokenzip status` shows graph health (stale files, broken references)

#### 5.7 MCP Server

```jsonc
// Exposed to any MCP-compatible client
{
  "tools": [
    "query_repo_structure",
    "query_module",
    "query_file",
    "query_symbol",
    "get_dependencies",
    "get_dependants",
    "search_symbols",
    "get_git_history",
    "get_workflow",
    "get_impact_analysis",
    "execute_workflow_template"
  ],
  "resources": [
    "tokenzip://repo/structure",
    "tokenzip://module/{name}/overview",
    "tokenzip://file/{path}/symbols",
    "tokenzip://symbol/{id}/detail"
  ]
}
```

**Acceptance Criteria:**
- MCP server starts in < 200ms
- All tools return structured JSON (never raw text dumps)
- Token budget aware: responses include token count metadata
- Works with Claude Code, Codex, OpenCode, Kilo Code without config changes
- Concurrent tool calls supported (SurrealDB connection pooling)

#### 5.8 Workflow Templates

| Workflow | Input | Output | Graph Operations |
|---|---|---|---|
| Create Module | module name, type, dependencies | Scaffolded structure + graph nodes | CREATE module, CREATE files, CREATE IMPORTS edges |
| Update Module | module name, change description | Affected files + symbols list | READ dependants, READ dependents, DIFF graph |
| Implement Feature | feature description, target module | Files to create/modify, symbol gaps | SEARCH related symbols, PATH analysis, IMPACT query |
| Upgrade Feature | feature name, upgrade description | Migration plan + affected modules | SUBGRAPH extraction, DEPENDENCY chain analysis |
| Bug Fix | error message / stack trace | Root cause candidates + impact radius | TRACE call chain, FIND modified symbols in git blame range |

**Acceptance Criteria:**
- Each workflow is a deterministic graph query sequence, not LLM-generated
- Workflows return structured data that an LLM
can act on (not final answers)
- Workflow results are cached and timestamped in the graph

### 6. Non-Functional Requirements

| Category | Requirement |
|---|---|
| Performance | Full index of 100K file repo < 3 minutes; incremental update < 2 seconds |
| Memory | MCP server idle < 50MB; parsing peak < 500MB |
| Reliability | Never corrupt the graph on crash; WAL recovery on restart |
| Compatibility | Node.js 20+, macOS 12+, Ubuntu 22.04+, Windows (WSL2) |
| Security | No network calls; all data local; no code execution from graph |
| Extensibility | New language support via plugin (tree-sitter grammar + extractor config) |

### 7. Success Metrics

| Metric | Target |
|---|---|
| Copilot context accuracy (relevant vs irrelevant tokens) | 85% (vs ~40% with text dump) |
| Time to first useful query after `tokenzip init` | < 5 minutes for 50K file repo |
| Hook overhead per commit | < 500ms |
| MCP tool call latency (p95) | < 200ms |
| Graph size efficiency | < 10% of source size |

### 8. Out of Scope (v2)

- Remote graph synchronization (multi-developer shared graph)
- LLM-powered code generation (this is a context layer, not a code writer)
- Runtime analysis (only static analysis via tree-sitter)
- Binary file parsing (images, compiled artifacts)
- IDE plugin (VS Code extension is v3)

### 9. Release Phases

| Phase | Scope | Timeline |
|---|---|---|
| Alpha | Core graph + JS/TS parsing + MCP server + basic queries | Week 1-3 |
| Beta | All languages + git hooks + documentation intelligence | Week 4-6 |
| RC | Workflow templates + chainable API polish + perf tuning | Week 7-8 |
| GA | Stability hardening + plugin system + docs | Week 9-10 |

---

## 🏗️ HLD: High-Level Design

1. 
Architecture Overview

TokenZip v2 is a local-first, static-analysis graph engine with four layers:

```
LAYER 4: INTEGRATION
    Claude Code    Codex    OpenCode    Kilo Code
        │
        │  MCP Protocol (stdio/SSE)
        ▼
LAYER 3: API & QUERY
    MCP Server
    ├── Tool Registry
    ├── Resource Registry
    └── Chainable Query Builder (CQB)    ← both registries route through it
        │
        ▼
LAYER 2: ENGINE
    ├── Tree-Sitter Extractor (per lang)
    ├── Markdown Parser (struct)
    ├── Workflow Engine (tpl)
    ├── Graph Query Planner
    └── Graph Mutation Engine (diff, merge, validate)    ← fed by the four components above
        │
        ▼
LAYER 1: STORAGE
    Storage Abstraction (IStore interface)
    ├── SurrealDB (primary)
    ├── SQLite (fallback)
    └── In-Memory (tests)

SIDE CHANNELS
    ├── Git Hooks: pre-commit, post-commit
    ├── File Watcher (optional): chokidar
    └── CLI (tokenzip): init, parse, query, status, serve
```

### 2. Component Design

#### 2.1 Tree-Sitter Extractor

```
File Input Stream
    │
    ▼
Language Detector (extension + shebang + .editorconfig)
    │
    ├── Code Extractor (JS/TS/Py/Go/Rs/Java/Kt)
    ├── SQL Extractor (Tables, Columns, FKs, SPs)
    └── MD Extractor (Sections, Mermaid, Lists, Tables)
    │
    ▼
Symbol Graph (nodes + edges)
```

**Key Design Decision:** Extractors produce an intermediate representation (IR), a flat list of `SymbolNode` and `SymbolEdge` objects, regardless of source language. This decouples parsing from storage.
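To make the decoupling concrete, here is a minimal sketch of what a language-agnostic IR could look like. The field names on `SymbolNode` and `SymbolEdge`, the two stand-in extractor functions, and the `mergeIR` and `danglingEdges` helpers are all assumptions made for illustration; they are not TokenZip's actual types or API.

```typescript
// Hypothetical, trimmed-down IR types; field names are illustrative assumptions.
interface SymbolNode {
  id: string;
  name: string;
  kind: string; // e.g. "function", "table"
  filePath: string;
  startLine: number;
  endLine: number;
}

interface SymbolEdge {
  type: string; // e.g. "CALLS", "READS"
  from: string; // SymbolNode.id
  to: string;   // SymbolNode.id
}

interface ExtractionIR {
  nodes: SymbolNode[];
  edges: SymbolEdge[];
}

// Stand-in "extractors" for two languages. Both emit the same flat IR shape.
function extractFromTs(filePath: string): ExtractionIR {
  const fn: SymbolNode = {
    id: `sym:${filePath}:getUser`, name: "getUser", kind: "function",
    filePath, startLine: 1, endLine: 5,
  };
  // The TS function reads a SQL table, so the edge crosses languages.
  return { nodes: [fn], edges: [{ type: "READS", from: fn.id, to: "sym:schema.sql:users" }] };
}

function extractFromSql(filePath: string): ExtractionIR {
  const table: SymbolNode = {
    id: `sym:${filePath}:users`, name: "users", kind: "table",
    filePath, startLine: 1, endLine: 8,
  };
  return { nodes: [table], edges: [] };
}

// Storage-side code only ever sees the flat IR, never a language-specific AST.
function mergeIR(parts: ExtractionIR[]): ExtractionIR {
  const merged: ExtractionIR = { nodes: [], edges: [] };
  for (const part of parts) {
    merged.nodes.push(...part.nodes);
    merged.edges.push(...part.edges);
  }
  return merged;
}

// Report edges whose endpoints are missing from the merged node list.
function danglingEdges(ir: ExtractionIR): SymbolEdge[] {
  const hasNode = (id: string) => ir.nodes.some(n => n.id === id);
  return ir.edges.filter(e => !hasNode(e.from) || !hasNode(e.to));
}

const ir = mergeIR([extractFromTs("user.ts"), extractFromSql("schema.sql")]);
console.log(ir.nodes.length, ir.edges.length, danglingEdges(ir).length); // prints: 2 1 0
```

Because every extractor returns the same shape, a check such as `danglingEdges` can be written once and applied to the merged output of all languages, which is the practical benefit of keeping the IR flat.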
#### 2.2 Chainable Query Builder (CQB)

```
QueryBuilder
├── .repo(path) → RepoScope
│   ├── .modules() → ModuleScope
│   │   ├── .files() → FileScope
│   │   │   ├── .symbols() → SymbolScope
│   │   │   ├── .tables() → TableScope
│   │   │   └── .sections() → SectionScope
│   │   ├── .dependencies() → ModuleScope (external deps)
│   │   └── .dependants() → ModuleScope
│   ├── .files() → FileScope (all files, no module filter)
│   ├── .symbols() → SymbolScope (global search)
│   ├── .tables() → TableScope
│   └── .docs() → DocScope
├── .symbol(name) → SymbolScope (direct lookup)
├── .table(name) → TableScope
├── .commit(hash) → CommitScope
└── .workflow(name) → WorkflowScope

Every Scope has:
├── .filter(predicate) → same Scope (adds WHERE clause)
├── .sort(field, dir) → same Scope
├── .limit(n) → same Scope
├── .offset(n) → same Scope
└── Terminal methods:
    ├── .toArray() → SymbolNode[]
    ├── .toGraph() → { nodes: [], edges: [] }
    ├── .toMarkdown() → string
    ├── .toJSON() → string
    ├── .count() → number
    └── .exists() → boolean
```

#### 2.3 MCP Server Architecture

```
MCP Server

Transport Layer: stdio (default) | SSE/HTTP (optional)
        │
        ▼
Protocol Handler
        │
        ▼
Tool Dispatcher
├── Structure Tools ─┐
├── Search Tools ────┼──► CQB (shared)
└── Impact Tools ────┘
        │
        ▼
Token Budget Manager
- Estimates response token count
- Truncates if over budget
- Prioritizes: symbols > files > mods
```

#### 2.4 Git Hook Pipeline

```
pre-commit trigger
          │
          ▼
┌───────────────────┐
│ git diff --cached │
│ --name-only       │
└─────────┬─────────┘
          │ staged file paths
          ▼
┌───────────────────┐
│ Content Hash      │ ← SHA256 of file content
│ Check             │ ← Compare with stored hash
└─────────┬─────────┘
          │ changed files only
          ▼
┌───────────────────┐
│ Tree-Sitter       │ ← Parallel parse (worker threads)
│ Batch Parse       │
└─────────┬─────────┘
          │ new symbol IR
          ▼
┌───────────────────┐
│ Graph Diff        │ ← Old symbols vs new symbols
│ & Merge           │ ← Update nodes, edges, hashes
└─────────┬─────────┘
          │
          ▼
┌───────────────────┐
│ Validation        │ ← Check: broken exports, orphan imports,
│ (optional)        │   missing type references
└─────────┬─────────┘
          │
     ┌────┴────┐
     │         │
     ▼         ▼
   PASS      FAIL
     │         │
     ▼         ▼
 Continue  Warn/Block
  Commit   (configurable)
```

### 3. Data Model (Graph Schema)

#### 3.1 Node Types

```
NODE: repository
  id: string (record ID)
  name: string
  root: string (absolute path)
  created_at: datetime
  updated_at: datetime
  stats: { files: number, modules: number, symbols: number }

NODE: module
  id: string
  name: string
  path: string (relative to repo root)
  manifest_type: string (package.json | pyproject.toml | ...)
  language: string (primary language)
  is_root: bool
  metadata: { name, version, description, ... }

NODE: file
  id: string
  path: string (relative to repo root)
  module_id: string (reference to module)
  language: string
  ext: string
  size_bytes: number
  content_hash: string (SHA256)
  line_count: number
  parse_status: string (parsed | partial | failed | skipped)
  parse_error: option<string>
  last_parsed: datetime
  git_last_modified: option<datetime>
  git_blame_summary: option<{ author, date, commit_count }>

NODE: symbol (polymorphic by kind)
  id: string
  file_id: string
  name: string
  kind: enum {
    function, method, constructor,
    class, interface, type_alias, enum,
    variable, constant, property,
    parameter, generic_param,
    decorator, annotation,
    table, view, column, index, constraint,
    foreign_key, stored_procedure,
    import, export, re_export,
    namespace, module_decl,
    section, subsection,
    workflow_step, diagram_node,
    list_item, table_row
  }
  signature: option<string> (full signature text)
  return_type: option<string>
  start_line: number
  end_line: number
  start_col: number
  end_col: number
  docstring: option<string>
  is_exported: bool
  is_async: option<bool>
  is_static: option<bool>
  visibility: option<enum { public, private, protected }>
  modifiers: array<string>
  parent_symbol_id: option<string> (for nested symbols)
  metadata: object (language-specific extras)
    // For tables: { schema, engine, columns: ... }
    // For classes: { implements: ... , extends: ... }
    // For functions: { params: ... , generics: ... }
    // For sections: { level, anchor_id }

NODE: commit
  id: string
  hash: string (full SHA)
  short_hash: string (7 char)
  message: string
  author: string
  email: string
  date: datetime
  branch: string
  tags: array<string>

NODE: dependency (external)
  id: string
  module_id: string (which module depends on it)
  name: string (npm package name, pip package, etc.)
  version: string (resolved version)
  dev: bool
  source: string (npm, pip, cargo, go modules, maven)
```

#### 3.2 Edge Types

```
EDGE: contains
  FROM: repository → TO: module
  FROM: module → TO: file
  FROM: file → TO: symbol
  FROM: symbol → TO: symbol (nested: class → method)

EDGE: imports
  FROM: file → TO: file (file-level import)
  FROM: module → TO: module (module-level dependency)
  FROM: symbol → TO: symbol (symbol-level import)
  METADATA: { is_type_only: bool, is_default: bool, alias: option<string> }

EDGE: exports
  FROM: file → TO: symbol
  FROM: symbol → TO: symbol (re-export chain)
  METADATA: { is_default: bool, is_reexport: bool, alias: option<string> }

EDGE: calls
  FROM: symbol (function/method) → TO: symbol (function/method)
  METADATA: { line: number, is_async: bool, call_type: enum { direct, indirect, dynamic } }

EDGE: implements
  FROM: symbol (class) → TO: symbol (interface)
  METADATA: { is_partial: bool }

EDGE: inherits
  FROM: symbol (class/interface) → TO: symbol (class/interface)
  METADATA: { is_interface_inheritance: bool }

EDGE: modifies
  FROM: symbol (function) → TO: symbol (variable/table/column)

EDGE: reads
  FROM: symbol (function) → TO: symbol (variable/table/column)

EDGE: references
  FROM: symbol → TO: symbol (generic "uses" relationship)
  METADATA: { context: string }

EDGE: depends_on
  FROM: module → TO: module (transitive closure of imports)
  FROM: file → TO: file
  METADATA: { is_transitive: bool, depth: number }

EDGE: depended_by
  (computed reverse of depends_on)

EDGE: modified_in
  FROM: file → TO: commit
  METADATA: { change_type: enum { added, modified, deleted, renamed } }

EDGE: authored_by
  FROM: file/symbol → TO: commit (latest commit touching this artifact)

EDGE: belongs_to_workflow
  FROM: symbol → TO: symbol (workflow_step)

EDGE: workflow_transition
  FROM: symbol (workflow_step) → TO: symbol (workflow_step)
  METADATA: { condition: option<string>, action: option<string> }

EDGE: diagram_edge
  FROM: symbol (diagram_node) → TO: symbol (diagram_node)
  METADATA: { label: string, style: string, type: enum { solid, dashed, dotted, bold } }

EDGE: foreign_key
  FROM: symbol (column) → TO: symbol (table)
  METADATA: { constraint_name: string, on_delete: string, on_update: string }

EDGE: column_of
  FROM: symbol (column/index/constraint) → TO: symbol (table)
```

#### 3.3 Indexes

```
DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE
DEFINE INDEX idx_file_hash ON file FIELDS content_hash
DEFINE INDEX idx_file_module ON file FIELDS module_id
DEFINE INDEX idx_symbol_name ON symbol FIELDS name
DEFINE INDEX idx_symbol_kind ON symbol FIELDS kind
DEFINE INDEX idx_symbol_file ON symbol FIELDS file_id
DEFINE INDEX idx_symbol_export ON symbol FIELDS is_exported
DEFINE INDEX idx_module_path ON module FIELDS path UNIQUE
DEFINE INDEX idx_commit_hash ON commit FIELDS hash UNIQUE
DEFINE INDEX idx_dep_name ON dependency FIELDS name, module_id
```

4. 
Technology Stack

| Component | Technology | Rationale |
|---|---|---|
| Runtime | Node.js 20+ (ESM) | Universal, tree-sitter bindings available, MCP SDK native |
| Tree-Sitter | tree-sitter + language grammars | Industry standard, incremental parsing, multi-language |
| Graph DB | SurrealDB v2 (embedded/RocksDB) | Native graph queries, schemaful, embedded mode, no server |
| Fallback DB | better-sqlite3 | Zero-config fallback if SurrealDB unavailable |
| MCP | @modelcontextprotocol/sdk | Official SDK, stdio + SSE transport |
| CLI | commander | Battle-tested CLI framework |
| Git | simple-git | Promise-based git operations |
| File Watch | chokidar | Cross-platform, efficient |
| Logging | pino | Structured, fast |
| Testing | vitest + memfs | Fast, in-memory FS for unit tests |
| Bundling | tsup | ESM + CJS dual output, tree-shaking |
| Markdown | unified + remark + rehype | Pluggable markdown AST pipeline |
| Mermaid | mermaid (headless) | Parse mermaid diagrams to structured data |

### 5. Integration Architecture

#### 5.1 MCP Integration Points

```
Claude Code / Codex / OpenCode
        │
        │ MCP Protocol (JSON-RPC 2.0 over stdio)
        ▼
MCP Server
        │
        ▼
Tool Calls
   1. query_repo_structure
      → Returns module tree + stats
   2. query_symbol { name, scope }
      → Symbol node + edges
   3. get_impact_analysis { symbol_id }
      → Dependents + transitive closure
   4. search_symbols { query, filters }
      → Fuzzy match on name/signature
   5. get_workflow { doc, section }
      → Structured workflow + links
   6. get_git_history { path, limit }
      → Commit chain for file/symbol
   7. execute_workflow_template { type, params }
      → Structured analysis result
   8. get_dependencies { module_id }
      → Internal + external deps
   9. get_dependants { symbol_id }
      → Reverse dependency chain
  10. get_context_for_files { paths, max_tokens }
      → Token-budget-aware context
```

#### 5.2 Claude Code MCP Config (auto-generated)

```json
{
  "mcpServers": {
    "tokenzip": {
      "command": "npx",
      "args": ["tokenzip", "serve", "--cwd", "/path/to/project"],
      "env": {}
    }
  }
}
```

### 6. Security Considerations

- **No network**: All data stays local. SurrealDB binds to 127.0.0.1 only (if HTTP transport is used).
- **No code execution**: Graph stores metadata only. No eval, no require from stored data.
- **Path traversal protection**: All file paths are resolved and canonicalized before storage.
- **Git hook safety**: Hooks are read-only from git's perspective (never force-push, never amend).
- **`.tokenzip/` in `.gitignore`**: Automatically appended, never committed.
- **Token budget**: MCP responses are capped at a configurable token limit to prevent context overflow.

### 7. Deployment Model

```
Local Developer Machine
│
├── ~/.tokenzip/
│   ├── config.json              # Global config
│   ├── surrealdb/               # Shared SurrealDB binary (if not system-installed)
│   └── cache/                   # Cross-project cache
│
└── <project-root>/
    ├── .tokenzip/
    │   ├── db/                  # SurrealDB data directory
    │   │   ├── data.db          # RocksDB storage
    │   │   └── lock             # Process lock
    │   ├── config.json          # Project-specific config
    │   │   ├── languages: ...
    │   │   ├── excluded: ...
    │   │   ├── hooks: { preCommit: "warn" | "block" | "off" }
    │   │   └── mcp: { maxTokens: 8000, transport: "stdio" }
    │   └── state.json           # Parse state, last commit, version
    │
    ├── .git/
    │   └── hooks/
    │       ├── pre-commit       # Appended tokenzip hook
    │       └── post-commit      # Appended tokenzip hook
    │
    └── .gitignore               # Contains .tokenzip/
```

---

## 🔧 LLD: Low-Level Design

1. 
Module Structure tokenzip/ ├── src/ │ ├── index.ts Public API entry point │ │ │ ├── cli/ CLI layer │ │ ├── index.ts Commander setup │ │ ├── commands/ │ │ │ ├── init.ts tokenzip init │ │ │ ├── parse.ts tokenzip parse --full | --incremental │ │ │ ├── query.ts tokenzip query <cqb-expression │ │ │ ├── status.ts tokenzip status │ │ │ ├── serve.ts tokenzip serve --transport stdio|sse --port 3000 │ │ │ ├── hooks.ts tokenzip hooks install|uninstall │ │ │ └── clean.ts tokenzip clean │ │ └── utils/ │ │ └── spinner.ts │ │ │ ├── mcp/ MCP server layer │ │ ├── server.ts MCP server creation & setup │ │ ├── transport/ │ │ │ ├── stdio.ts │ │ │ └── sse.ts │ │ ├── tools/ │ │ │ ├── registry.ts Tool registration │ │ │ ├── structure.ts query repo structure, query module │ │ │ ├── symbol.ts query symbol, search symbols │ │ │ ├── dependency.ts get dependencies, get dependants │ │ │ ├── impact.ts get impact analysis │ │ │ ├── git.ts get git history │ │ │ ├── workflow.ts get workflow, execute workflow template │ │ │ └── context.ts get context for files │ │ ├── resources/ │ │ │ ├── registry.ts │ │ │ ├── repo.ts │ │ │ ├── module.ts │ │ │ ├── file.ts │ │ │ └── symbol.ts │ │ └── token-budget.ts Token estimation & truncation │ │ │ ├── query/ Chainable Query Builder │ │ ├── builder.ts Base QueryBuilder class │ │ ├── scopes/ │ │ │ ├── repo-scope.ts │ │ │ ├── module-scope.ts │ │ │ ├── file-scope.ts │ │ │ ├── symbol-scope.ts │ │ │ ├── table-scope.ts │ │ │ ├── commit-scope.ts │ │ │ ├── doc-scope.ts │ │ │ └── workflow-scope.ts │ │ ├── filters.ts Filter predicate parser │ │ ├── translators/ │ │ │ ├── surrealql.ts CQB → SurrealQL translation │ │ │ └── sql.ts CQB → SQL translation SQLite fallback │ │ └── types.ts │ │ │ ├── engine/ Core engine layer │ │ ├── indexer.ts Full & incremental indexing orchestrator │ │ ├── differ.ts Graph diff: old symbols vs new symbols │ │ ├── merger.ts Merge diff into graph │ │ ├── validator.ts Reference integrity validation │ │ ├── module-detector.ts Detect module boundaries 
│ │ └── language-detector.ts Detect language from extension + content │ │ │ ├── extractor/ Tree-sitter extraction layer │ │ ├── base-extractor.ts Abstract extractor interface │ │ ├── registry.ts Language → extractor mapping │ │ ├── code/ │ │ │ ├── javascript.ts JS/JSX extractor │ │ │ ├── typescript.ts TS/TSX extractor │ │ │ ├── python.ts │ │ │ ├── go.ts │ │ │ ├── rust.ts │ │ │ ├── java.ts │ │ │ └── kotlin.ts │ │ ├── sql/ │ │ │ └── sql.ts SQL extractor tables, columns, FKs │ │ ├── markdown/ │ │ │ ├── markdown.ts Markdown structure extractor │ │ │ ├── mermaid.ts Mermaid diagram parser │ │ │ └── sections.ts Section type classifier │ │ └── types.ts SymbolIR, EdgeIR types │ │ │ ├── storage/ Storage abstraction layer │ │ ├── interface.ts IStore interface │ │ ├── surreal/ │ │ │ ├── connection.ts Connection pool & lifecycle │ │ │ ├── migrations.ts Schema migration │ │ │ ├── queries/ │ │ │ │ ├── nodes.ts │ │ │ │ ├── edges.ts │ │ │ │ ├── graph.ts │ │ │ │ └── search.ts │ │ │ └── store.ts SurrealStore implements IStore │ │ ├── sqlite/ │ │ │ ├── schema.ts Table creation │ │ │ ├── queries/ │ │ │ │ ├── nodes.ts │ │ │ │ ├── edges.ts │ │ │ │ └── graph.ts │ │ │ └── store.ts SQLiteStore implements IStore │ │ ├── memory/ │ │ │ └── store.ts MemoryStore for testing │ │ └── factory.ts StoreFactory: config → IStore │ │ │ ├── hooks/ Git hook layer │ │ ├── installer.ts Install hooks into .git/hooks/ │ │ ├── pre-commit.ts Pre-commit logic │ │ ├── post-commit.ts Post-commit logic │ │ └── detector.ts Detect staged files │ │ │ ├── workflows/ Workflow template engine │ │ ├── engine.ts Workflow executor │ │ ├── registry.ts Workflow template registry │ │ └── templates/ │ │ ├── create-module.ts │ │ ├── update-module.ts │ │ ├── implement-feature.ts │ │ ├── upgrade-feature.ts │ │ └── bug-fix.ts │ │ │ ├── utils/ │ │ ├── logger.ts │ │ ├── hash.ts Content hashing SHA256 │ │ ├── path.ts Path resolution & normalization │ │ ├── tokens.ts Token estimation chars/4 for code │ │ ├── workers.ts Worker thread 
pool for parsing │ │ └── version.ts │ │ │ └── types/ │ ├── graph.ts All node & edge types │ ├── extractor.ts Extractor IR types │ ├── query.ts Query builder types │ └── config.ts Configuration types │ ├── grammars/ Tree-sitter WASM grammars bundled │ ├── tree-sitter-javascript.wasm │ ├── tree-sitter-typescript.wasm │ ├── tree-sitter-python.wasm │ ├── tree-sitter-go.wasm │ ├── tree-sitter-rust.wasm │ ├── tree-sitter-java.wasm │ ├── tree-sitter-kotlin.wasm │ └── tree-sitter-sql.wasm │ ├── tests/ │ ├── unit/ │ │ ├── extractor/ │ │ │ ├── javascript.test.ts │ │ │ ├── typescript.test.ts │ │ │ ├── python.test.ts │ │ │ ├── sql.test.ts │ │ │ └── markdown.test.ts │ │ ├── query/ │ │ │ └── builder.test.ts │ │ ├── engine/ │ │ │ ├── differ.test.ts │ │ │ ├── merger.test.ts │ │ │ └── module-detector.test.ts │ │ ├── storage/ │ │ │ └── memory-store.test.ts │ │ └── hooks/ │ │ └── detector.test.ts │ ├── integration/ │ │ ├── full-parse.test.ts │ │ ├── incremental-parse.test.ts │ │ ├── mcp-server.test.ts │ │ └── git-hook.test.ts │ ├── fixtures/ │ │ ├── js-project/ │ │ ├── ts-monorepo/ │ │ ├── python-project/ │ │ ├── sql-project/ │ │ └── mixed-project/ │ └── e2e/ │ └── claude-code.test.ts │ ├── package.json ├── tsconfig.json ├── tsup.config.ts └── vitest.config.ts 2. Detailed Component Design 2.1 Storage Abstraction IStore typescript // src/storage/interface.ts import type { RepositoryNode, ModuleNode, FileNode, SymbolNode, CommitNode, DependencyNode, ContainsEdge, ImportsEdge, ExportsEdge, CallsEdge, ImplementsEdge, InheritsEdge, ModifiesEdge, ReadsEdge, ReferencesEdge, DependsOnEdge, ModifiedInEdge, ForeignKeyEdge, ColumnOfEdge, // ... 
  // all edge types
} from '../types/graph';

export interface GraphNode {
  id: string;
  type: 'repository' | 'module' | 'file' | 'symbol' | 'commit' | 'dependency';
  [key: string]: unknown;
}

export interface GraphEdge {
  id: string;
  type: string;
  from: string;
  to: string;
  [key: string]: unknown;
}

export interface GraphResult {
  nodes: GraphNode[];
  edges: GraphEdge[];
}

export interface StoreStats {
  nodeCount: Record<string, number>;
  edgeCount: Record<string, number>;
  dbSizeBytes: number;
}

export interface IStore {
  // Lifecycle
  initialize(): Promise<void>;
  close(): Promise<void>;
  migrate(): Promise<void>;
  clear(): Promise<void>;
  stats(): Promise<StoreStats>;

  // Node CRUD
  createNode<T extends GraphNode>(node: T): Promise<T>;
  createNodes<T extends GraphNode>(nodes: T[]): Promise<T[]>;
  getNode<T extends GraphNode>(id: string): Promise<T | null>;
  getNodes(ids: string[]): Promise<GraphNode[]>;
  updateNode<T extends GraphNode>(id: string, patch: Partial<T>): Promise<T>;
  deleteNode(id: string): Promise<void>;
  deleteNodes(ids: string[]): Promise<void>;

  // Edge CRUD
  createEdge<T extends GraphEdge>(edge: T): Promise<T>;
  createEdges<T extends GraphEdge>(edges: T[]): Promise<T[]>;
  getEdges(from: string, type?: string): Promise<GraphEdge[]>;
  getEdgesTo(to: string, type?: string): Promise<GraphEdge[]>;
  deleteEdges(from: string, type?: string): Promise<void>;

  // Graph Queries
  query(surrealQL: string, vars?: Record<string, unknown>): Promise<unknown[]>;
  graphTraversal(
    startId: string,
    edgeTypes: string[],
    direction: 'outbound' | 'inbound' | 'both',
    depth?: number,
    filter?: string
  ): Promise<GraphResult>;

  // Bulk Operations
  batchUpsert(nodes: GraphNode[], edges: GraphEdge[]): Promise<void>;

  // Search
  searchNodes(type: string, field: string, query: string, limit?: number): Promise<GraphNode[]>;

  // Transactions
  transaction<T>(fn: (store: IStore) => Promise<T>): Promise<T>;
}

2.2 Tree-Sitter Extractor Interface

typescript
// src/extractor/base-extractor.ts
import { createHash } from 'node:crypto'; // used by hashPath() below
import { Parser, Tree } from 'tree-sitter';
import { SymbolIR, EdgeIR } from
  './types';

export interface ExtractionResult {
  symbols: SymbolIR[];
  edges: EdgeIR[];
  parseErrors: ParseError[];
}

export interface ParseError {
  line: number;
  column: number;
  message: string;
}

export interface ExtractorContext {
  filePath: string;
  relativePath: string;
  content: string;
  contentHash: string;
  tree: Tree;
  language: string;
  moduleId: string;
}

export abstract class BaseExtractor {
  abstract readonly language: string;
  abstract readonly extensions: string[];

  /**
   * Extract symbols and edges from a parsed tree.
   * Called after tree-sitter has parsed the file.
   */
  abstract extract(ctx: ExtractorContext): ExtractionResult;

  /**
   * Post-process extraction results.
   * Resolve internal references, compute derived edges.
   * Default implementation does nothing; subclasses can override.
   */
  postProcess(
    symbols: SymbolIR[],
    edges: EdgeIR[],
    ctx: ExtractorContext
  ): { symbols: SymbolIR[]; edges: EdgeIR[] } {
    return { symbols, edges };
  }

  /**
   * Generate a stable ID for a symbol.
   * Must be deterministic for the same symbol in the same file.
   */
  generateSymbolId(
    filePath: string,
    symbolName: string,
    kind: string,
    startLine: number
  ): string {
    // Format: sym:<filepath-hash>:<name>:<kind>:<line>
    const pathHash = this.hashPath(filePath);
    return `sym:${pathHash}:${symbolName}:${kind}:${startLine}`;
  }

  private hashPath(filePath: string): string {
    // First 8 chars of SHA256 of relative path
    return createHash('sha256').update(filePath).digest('hex').slice(0, 8);
  }

  /**
   * Walk the tree-sitter AST with a visitor pattern.
   * Utility method for subclasses.
   */
  protected walk(
    node: Parser.SyntaxNode,
    visitors: Record<string, (node: Parser.SyntaxNode) => void>
  ): void {
    const visitor = visitors[node.type];
    if (visitor) {
      visitor(node);
    }
    for (let i = 0; i < node.childCount; i++) {
      const child = node.child(i);
      if (child) this.walk(child, visitors); // child(i) may be null
    }
  }

  /**
   * Extract docstring/JSDoc/comment attached to a node.
   */
  protected extractDocstring(node: Parser.SyntaxNode, content: string): string | null {
    // Look for preceding comment nodes
    const prev = node.previousNamedSibling;
    if (
      prev &&
      (prev.type === 'comment' ||
        prev.type === 'block_comment' ||
        prev.type === 'docstring' ||
        prev.type === 'jsdoc')
    ) {
      return content.slice(prev.startIndex, prev.endIndex).trim();
    }
    return null;
  }
}

2.3 TypeScript Extractor (Detailed Example)

typescript
// src/extractor/code/typescript.ts
import { BaseExtractor, ExtractorContext, ExtractionResult, ParseError } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';

export class TypeScriptExtractor extends BaseExtractor {
  language = 'typescript';
  extensions = ['.ts', '.tsx', '.mts', '.cts'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const parseErrors: ParseError[] = [];

    // Collect parse errors
    this.collectErrors(ctx.tree.rootNode, parseErrors, ctx.content);

    // Visit top-level and nested declarations
    this.walk(ctx.tree.rootNode, {
      // Functions
      'function_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'function', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'function',
          signature: this.getSignature(node, ctx.content),
          returnType: this.getReturnType(node),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isAsync: this.hasModifier(node, 'async'),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            params: this.extractParams(node, ctx.content),
            generics: this.extractGenerics(node, ctx.content),
            typeParams: this.extractTypeParams(node),
          },
        });
      },

      // Arrow functions assigned to variables
      'variable_declaration': (node) => {
        const declarator = node.childForFieldName('declarator');
        if
          (!declarator) return;
        const value = declarator.childForFieldName('value');
        // Only function-valued declarations are recorded as symbols
        if (!value || (value.type !== 'arrow_function' && value.type !== 'function_expression')) return;
        const name = this.getName(declarator);
        if (!name) return;
        // Arrow functions and function expressions share the same kind
        const funcKind = 'function';
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, funcKind, node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: funcKind,
          signature: this.getSignature(value, ctx.content),
          returnType: this.getReturnType(value),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isAsync: this.hasModifier(value, 'async'),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            isArrow: value.type === 'arrow_function',
            params: this.extractParams(value, ctx.content),
            generics: this.extractGenerics(value, ctx.content),
          },
        });
      },

      // Classes
      'class_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        const heritage = this.extractHeritage(node); // extends, implements
        const symbolId = this.generateSymbolId(ctx.relativePath, name, 'class', node.startPosition.row + 1);
        symbols.push({
          id: symbolId,
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'class',
          signature: this.getSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: this.getVisibility(node),
          modifiers: this.getModifiers(node),
          metadata: {
            extends: heritage.extends,
            implements: heritage.implements,
            generics: this.extractGenerics(node, ctx.content),
          },
        });

        // Create inheritance edges
        if (heritage.extends) {
          edges.push({
            type: 'inherits',
            from: symbolId,
            to: `sym:unknown:${heritage.extends}:class:0`, // resolved later
            metadata: { is_interface_inheritance: false },
            isResolved: false,
          });
        }
        for (const impl of heritage.implements) {
          edges.push({
            type: 'implements',
            from: symbolId,
            to: `sym:unknown:${impl}:interface:0`,
            metadata: { is_partial: false },
            isResolved: false,
          });
        }
      },

      // Interfaces
      'interface_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        const extendsList = this.extractInterfaceExtends(node);
        const symbolId = this.generateSymbolId(ctx.relativePath, name, 'interface', node.startPosition.row + 1);
        symbols.push({
          id: symbolId,
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'interface',
          signature: this.getSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: this.getModifiers(node),
          metadata: {
            extends: extendsList,
            generics: this.extractGenerics(node, ctx.content),
            members: this.extractInterfaceMembers(node, ctx.content, ctx.relativePath),
          },
        });
        for (const ext of extendsList) {
          edges.push({
            type: 'inherits',
            from: symbolId,
            to: `sym:unknown:${ext}:interface:0`,
            metadata: { is_interface_inheritance: true },
            isResolved: false,
          });
        }
      },

      // Type aliases
      'type_alias_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'type_alias', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'type_alias',
          signature: this.getTypeAliasBody(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: [],
          metadata: {
            generics: this.extractGenerics(node, ctx.content),
          },
        });
      },

      // Enums
      'enum_declaration': (node) => {
        const name = this.getName(node);
        if (!name) return;
        const members = this.extractEnumMembers(node, ctx.content);
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, name, 'enum', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name,
          kind: 'enum',
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractDocstring(node, ctx.content),
          isExported: this.isExported(node),
          isStatic: false,
          visibility: 'public',
          modifiers: this.getModifiers(node),
          metadata: { members },
        });
      },

      // Imports (file-level)
      'import_statement': (node) => {
        const importInfo = this.extractImport(node, ctx.content);
        if (!importInfo) return;

        // Store as symbol for tracking
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, importInfo.source, 'import', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name: importInfo.source,
          kind: 'import',
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          isExported: false,
          modifiers: [],
          metadata: {
            source: importInfo.source,
            specifiers: importInfo.specifiers,
            isTypeOnly: importInfo.isTypeOnly,
            isDefault: importInfo.isDefault,
          },
        });

        // Create import edge
        edges.push({
          type: 'imports',
          from: `file:${ctx.relativePath}`,
          to: `file:${this.resolveImportPath(ctx.relativePath, importInfo.source)}`,
          metadata: {
            is_type_only: importInfo.isTypeOnly,
            is_default: importInfo.isDefault,
            specifiers: importInfo.specifiers,
          },
          isResolved: false,
        });
      },

      // Export statements
      'export_statement': (node) => {
        // Handle: export { foo, bar } from './module'
        const exportInfo = this.extractReExport(node, ctx.content);
        if (exportInfo) {
          for (const spec of exportInfo.specifiers) {
            edges.push({
              type: 'exports',
              from: `file:${ctx.relativePath}`,
              to:
                `file:${this.resolveImportPath(ctx.relativePath, exportInfo.source)}`,
              metadata: {
                is_reexport: true,
                is_default: spec.isDefault,
                alias: spec.alias,
                name: spec.name,
              },
              isResolved: false,
            });
          }
        }
      },

      // Method definitions (inside classes)
      'method_definition': (node) => {
        // This is handled inside the class_declaration visitor.
        // We capture it there for parent_symbol_id linking.
      },

      // Property definitions (inside classes)
      'public_field_definition': (node) => {
        // Handled inside class_declaration
      },
    });

    // Post-process: resolve parent_symbol_id for nested symbols
    // Post-process: mark exported symbols
    const processed = this.postProcess(symbols, edges, ctx);
    return {
      symbols: processed.symbols,
      edges: processed.edges,
      parseErrors,
    };
  }

  // ... helper methods (getName, getSignature, extractParams, etc.)
  // Each is ~10-20 lines using tree-sitter child navigation
}

2.4 SQL Extractor

typescript
// src/extractor/sql/sql.ts
import { BaseExtractor, ExtractorContext, ExtractionResult, ParseError } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';

export class SQLExtractor extends BaseExtractor {
  language = 'sql';
  extensions = ['.sql'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const parseErrors: ParseError[] = [];

    this.walk(ctx.tree.rootNode, {
      'create_table': (node) => {
        const tableName = this.getTableName(node);
        if (!tableName) return;
        const tableId = this.generateSymbolId(ctx.relativePath, tableName, 'table', node.startPosition.row + 1);

        // Extract columns, constraints and indexes declared in the table body
        const columns = this.extractColumns(node, ctx.content, ctx.relativePath, tableId);
        const constraints = this.extractConstraints(node, ctx.content, ctx.relativePath, tableId);
        const indexes = this.extractIndexes(node, ctx.content, ctx.relativePath, tableId);

        symbols.push({
          id: tableId,
          fileId: `file:${ctx.relativePath}`,
          name: tableName,
          kind: 'table',
          signature: this.getTableSignature(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractTableComment(node, ctx.content),
          isExported:
            false,
          modifiers: [],
          metadata: {
            schema: this.getSchemaName(node),
            engine: this.getEngine(node),
            columns: columns.map((c) => c.name),
            columnCount: columns.length,
          },
        });
        symbols.push(...columns, ...constraints, ...indexes);

        // Create column_of edges
        for (const col of columns) {
          edges.push({ type: 'column_of', from: col.id, to: tableId });
        }
        for (const idx of indexes) {
          edges.push({ type: 'column_of', from: idx.id, to: tableId });
        }
        for (const con of constraints) {
          edges.push({ type: 'column_of', from: con.id, to: tableId });
        }

        // Extract foreign keys and create FK edges
        const fks = this.extractForeignKeys(node, ctx.content);
        for (const fk of fks) {
          // Line 0 is an approximation; the real column line is not known here
          const fromColId = this.generateSymbolId(ctx.relativePath, fk.column, 'column', 0);
          const toTableId = `sym:unknown:${fk.refTable}:table:0`;
          edges.push({
            type: 'foreign_key',
            from: fromColId,
            to: toTableId,
            metadata: {
              constraint_name: fk.name,
              on_delete: fk.onDelete,
              on_update: fk.onUpdate,
              ref_column: fk.refColumn,
            },
            isResolved: false,
          });
        }
      },

      'create_view': (node) => {
        const viewName = this.getViewName(node);
        if (!viewName) return;
        symbols.push({
          id: this.generateSymbolId(ctx.relativePath, viewName, 'view', node.startPosition.row + 1),
          fileId: `file:${ctx.relativePath}`,
          name: viewName,
          kind: 'view',
          signature: this.getViewQuery(node, ctx.content),
          startLine: node.startPosition.row + 1,
          endLine: node.endPosition.row + 1,
          startCol: node.startPosition.column,
          endCol: node.endPosition.column,
          docstring: this.extractViewComment(node, ctx.content),
          isExported: false,
          modifiers: [],
          metadata: { schema: this.getSchemaName(node) },
        });
      },

      'create_procedure': (node) => {
        // Stored procedures / functions
      },
    });

    return { symbols, edges, parseErrors };
  }
}

2.5 Markdown Extractor

typescript
// src/extractor/markdown/markdown.ts
import { unified } from 'unified';
import remarkParse from 'remark-parse';
import remarkGfm from 'remark-gfm';
import { visit } from 'unist-util-visit';
import { BaseExtractor, ExtractorContext, ExtractionResult } from '../base-extractor';
import { SymbolIR, EdgeIR } from '../types';
import { Root, Heading, Code, List, Table, ListItem } from
  'mdast';

export class MarkdownExtractor extends BaseExtractor {
  language = 'markdown';
  extensions = ['.md', '.mdx', '.markdown'];

  extract(ctx: ExtractorContext): ExtractionResult {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];

    const tree = unified()
      .use(remarkParse)
      .use(remarkGfm)
      .parse(ctx.content) as Root;

    let currentSection: string | null = null;
    let sectionCounter = 0;
    let workflowStepCounter = 0;
    let diagramNodeCounter = 0;

    visit(tree, (node) => {
      // Headings → sections
      if (node.type === 'heading') {
        const heading = node as Heading;
        const text = this.getTextContent(heading);
        const level = heading.depth;
        const sectionId = this.generateSymbolId(
          ctx.relativePath, text, 'section', heading.position?.start.line || 0
        );
        const sectionSymbol: SymbolIR = {
          id: sectionId,
          fileId: `file:${ctx.relativePath}`,
          name: text,
          kind: 'section',
          startLine: heading.position?.start.line || 0,
          endLine: heading.position?.end.line || 0,
          startCol: heading.position?.start.column || 0,
          endCol: heading.position?.end.column || 0,
          isExported: false,
          modifiers: [],
          metadata: {
            level,
            anchor_id: this.slugify(text),
            section_type: this.classifySection(text),
          },
        };
        symbols.push(sectionSymbol);

        // Link to parent section (the previous heading is used as the parent)
        if (currentSection && level > 1) {
          edges.push({
            type: 'contains',
            from: currentSection,
            to: sectionId,
          });
        }
        currentSection = sectionId;
        sectionCounter++;
      }

      // Code blocks → check for mermaid
      if (node.type === 'code') {
        const code = node as Code;
        if (code.lang === 'mermaid' && code.value) {
          const diagramResult = this.parseMermaid(code.value, ctx);
          symbols.push(...diagramResult.symbols);
          edges.push(...diagramResult.edges);

          // Link diagram to current section
          if (currentSection) {
            for (const sym of diagramResult.symbols) {
              edges.push({ type: 'contains', from: currentSection, to: sym.id });
            }
          }
        }
      }

      // Lists → structured list items
      if (node.type === 'list') {
        const list = node as List;
        this.extractListItems(list, symbols, edges, ctx, currentSection);
      }

      // Tables → structured rows
      if (node.type === 'table') {
        const table = node as Table;
        const tableResult = this.extractTable(table, ctx, currentSection);
        symbols.push(...tableResult.symbols);
        edges.push(...tableResult.edges);
      }
    });

    return { symbols, edges, parseErrors: [] };
  }

  private classifySection(heading: string): string {
    const lower = heading.toLowerCase();
    // First match wins, so more general patterns such as "flow" shadow later ones
    if (/workflow|flow|process|pipeline/.test(lower)) return 'workflow';
    if (/sequence\s*diagram/.test(lower)) return 'sequence_diagram';
    if (/flowchart/.test(lower)) return 'flowchart';
    if (/release\s*plan|roadmap|timeline/.test(lower)) return 'release_plan';
    if (/api|endpoint/.test(lower)) return 'api';
    if (/architecture|component|system\s*design/.test(lower)) return 'architecture';
    if (/decision|adr/.test(lower)) return 'decision';
    if (/requirement|user\s*story|acceptance/.test(lower)) return 'requirement';
    return 'general';
  }

  private parseMermaid(
    mermaidCode: string,
    ctx: ExtractorContext
  ): { symbols: SymbolIR[]; edges: EdgeIR[] } {
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];

    // Detect diagram type from the first matching declaration line
    const typeMatch = mermaidCode.match(
      /^\s*(sequenceDiagram|flowchart\s+\w+|stateDiagram|erDiagram|classDiagram|gantt)/m
    );
    const diagramType = typeMatch?.
      [1] || 'unknown';

    if (diagramType === 'sequenceDiagram') {
      return this.parseSequenceDiagram(mermaidCode, ctx);
    }
    if (diagramType.startsWith('flowchart')) {
      return this.parseFlowchart(mermaidCode, ctx);
    }
    if (diagramType === 'erDiagram') {
      return this.parseERDiagram(mermaidCode, ctx);
    }
    if (diagramType === 'classDiagram') {
      return this.parseClassDiagram(mermaidCode, ctx);
    }

    // Fallback: store as raw diagram node.
    // NOTE: Date.now() makes this ID differ between runs, unlike other symbol IDs.
    symbols.push({
      id: this.generateSymbolId(ctx.relativePath, `diagram-${Date.now()}`, 'section', 0),
      fileId: `file:${ctx.relativePath}`,
      name: `Mermaid ${diagramType}`,
      kind: 'section',
      startLine: 0,
      endLine: 0,
      startCol: 0,
      endCol: 0,
      isExported: false,
      modifiers: [],
      metadata: { diagram_type: diagramType, raw: mermaidCode },
    });
    return { symbols, edges };
  }

  private parseSequenceDiagram(
    code: string,
    ctx: ExtractorContext
  ): { symbols: SymbolIR[]; edges: EdgeIR[] } {
    // Parse:
    //   participant A as Actor A
    //   A->>B: Message
    //   B-->>A: Response
    //
    // Creates: diagram_node per participant
    // Creates: diagram_edge per message (with label, style)
    const symbols: SymbolIR[] = [];
    const edges: EdgeIR[] = [];
    const participants = new Map<string, string>(); // alias → full name
    const baseLine = 0; // Would need actual line from parent

    // Leading \s* tolerates the indentation Mermaid sources usually have
    const participantRe = /^\s*participant\s+(\w+)(?:\s+as\s+(.+))?$/gm;
    let match;
    while ((match = participantRe.exec(code)) !== null) {
      const alias = match[1];
      const fullName = match[2] || alias;
      participants.set(alias, fullName);
      const id = this.generateSymbolId(ctx.relativePath, alias, 'diagram_node', baseLine);
      symbols.push({
        id,
        fileId: `file:${ctx.relativePath}`,
        name: fullName,
        kind: 'diagram_node',
        startLine: baseLine,
        endLine: baseLine,
        startCol: 0,
        endCol: 0,
        isExported: false,
        modifiers: [],
        metadata: {
          diagram_type: 'sequence_diagram',
          role: 'participant',
          alias,
        },
      });
    }

    // Parse messages: A->>B: text or A-->>B: text
    const msgRe = /^\s*(\w+)(->>|-->>|->|-->)\s*(\w+):\s*(.+)$/gm;
    let msgMatch;
    let msgCounter = 0;
    while ((msgMatch = msgRe.exec(code)) !== null) {
      const fromAlias = msgMatch[1];
      const arrowStyle = msgMatch[2];
      const toAlias = msgMatch[3];
      const message = msgMatch[4];
      const fromId = this.generateSymbolId(ctx.relativePath, fromAlias, 'diagram_node', baseLine);
      const toId = this.generateSymbolId(ctx.relativePath, toAlias, 'diagram_node', baseLine);

      // Register participants if not explicitly declared
      if (!participants.has(fromAlias)) {
        participants.set(fromAlias, fromAlias);
        symbols.push({
          id: fromId,
          fileId: `file:${ctx.relativePath}`,
          name: fromAlias,
          kind: 'diagram_node',
          startLine: baseLine,
          endLine: baseLine,
          startCol: 0,
          endCol: 0,
          isExported: false,
          modifiers: [],
          metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: fromAlias },
        });
      }
      if (!participants.has(toAlias)) {
        participants.set(toAlias, toAlias);
        symbols.push({
          id: toId,
          fileId: `file:${ctx.relativePath}`,
          name: toAlias,
          kind: 'diagram_node',
          startLine: baseLine,
          endLine: baseLine,
          startCol: 0,
          endCol: 0,
          isExported: false,
          modifiers: [],
          metadata: { diagram_type: 'sequence_diagram', role: 'participant', alias: toAlias },
        });
      }

      edges.push({
        type: 'diagram_edge',
        from: fromId,
        to: toId,
        metadata: {
          label: message,
          style: arrowStyle === '->>' ? 'solid' : arrowStyle === '-->>' ? 'dashed' : 'dotted',
          type: 'solid',
          sequence: msgCounter++,
          is_response: arrowStyle.includes('--'),
        },
      });
    }

    return { symbols, edges };
  }

  // ...
  // parseFlowchart, parseERDiagram, parseClassDiagram, extractListItems, extractTable
}

2.6 Chainable Query Builder: Core

typescript
// src/query/builder.ts
import { IStore } from '../storage/interface';
import { GraphNode, GraphEdge, GraphResult } from '../types/graph';
import { RepoScope } from './scopes/repo-scope';

export type SortDirection = 'asc' | 'desc';
export type TerminalFormat = 'array' | 'graph' | 'markdown' | 'json';

export interface FilterPredicate {
  field: string;
  op: 'eq' | 'neq' | 'gt' | 'gte' | 'lt' | 'lte' | 'contains' | 'matches' | 'in' | 'exists';
  value: unknown;
}

export abstract class QueryScope<T extends QueryScope<T>> {
  protected filters: FilterPredicate[] = [];
  protected sortField: string | null = null;
  protected sortDir: SortDirection = 'asc';
  protected limitCount: number | null = null;
  protected offsetCount: number = 0;

  constructor(
    protected store: IStore,
    protected repoPath: string
  ) {}

  filter(predicate: FilterPredicate | ((item: GraphNode) => boolean)): T {
    const clone = this.clone();
    if (typeof predicate === 'function') {
      // Function filters are applied post-hoc for in-memory operations
      clone.filters.push({ field: '_func', op: 'eq', value: predicate } as any);
    } else {
      clone.filters.push(predicate);
    }
    return clone as T;
  }

  // Shorthand filters
  eq(field: string, value: unknown): T {
    return this.filter({ field, op: 'eq', value });
  }

  neq(field: string, value: unknown): T {
    return this.filter({ field, op: 'neq', value });
  }

  contains(field: string, value: string): T {
    return this.filter({ field, op: 'contains', value });
  }

  matches(field: string, pattern: string): T {
    return this.filter({ field, op: 'matches', value: pattern });
  }

  in(field: string, values: unknown[]): T {
    return this.filter({ field, op: 'in', value: values });
  }

  sort(field: string, dir: SortDirection = 'asc'): T {
    const clone = this.clone();
    clone.sortField = field;
    clone.sortDir = dir;
    return clone as T;
  }

  limit(n: number): T {
    const clone = this.clone();
    clone.limitCount = n;
    return clone as T;
  }

  offset(n: number): T {
    const clone = this.clone();
    clone.offsetCount = n;
    return clone as T;
  }

  // Terminal methods
  async toArray(): Promise<GraphNode[]> {
    const result = await this.execute();
    return this.applyPostFilters(result.nodes as GraphNode[]);
  }

  async toGraph(): Promise<GraphResult> {
    const result = await this.execute();
    return {
      nodes: this.applyPostFilters(result.nodes as GraphNode[]),
      edges: result.edges as GraphEdge[],
    };
  }

  async toMarkdown(): Promise<string> {
    const nodes = await this.toArray();
    return this.formatAsMarkdown(nodes);
  }

  async toJSON(): Promise<string> {
    const result = await this.toGraph();
    return JSON.stringify(result, null, 2);
  }

  async count(): Promise<number> {
    const nodes = await this.toArray();
    return nodes.length;
  }

  async exists(): Promise<boolean> {
    const count = await this.count();
    return count > 0;
  }

  // Abstract: each scope implements its own query translation
  protected abstract execute(): Promise<{ nodes: unknown[]; edges: unknown[] }>;
  protected abstract clone(): T;
  protected abstract formatAsMarkdown(nodes: GraphNode[]): string;

  protected applyPostFilters(nodes: GraphNode[]): GraphNode[] {
    return nodes.filter((node) => {
      for (const f of this.filters) {
        if (f.field === '_func') continue; // Skip function filters for DB
        const val = (node as any)[f.field];
        if (!this.evaluateFilter(val, f)) return false;
      }
      // Apply function filters
      for (const f of this.filters) {
        if (f.field === '_func') {
          if (!(f.value as Function)(node)) return false;
        }
      }
      return true;
    });
  }

  private evaluateFilter(val: unknown, f: FilterPredicate): boolean {
    switch (f.op) {
      case 'eq': return val === f.value;
      case 'neq': return val !== f.value;
      case 'contains': return typeof val === 'string' && val.includes(f.value as string);
      case 'matches': return typeof val === 'string' && new RegExp(f.value as string).test(val);
      case 'in': return Array.isArray(f.value) && f.value.includes(val);
      case 'exists': return val !== null && val !== undefined;
      case 'gt': return typeof val === 'number' && val > (f.value as number);
      case 'gte': return typeof
        val === 'number' && val >= (f.value as number);
      case 'lt': return typeof val === 'number' && val < (f.value as number);
      case 'lte': return typeof val === 'number' && val <= (f.value as number);
      default: return true;
    }
  }
}

// Public API entry point
export function createQuery(store: IStore, repoPath: string): RepoScope {
  return new RepoScope(store, repoPath);
}

2.7 RepoScope (Top-Level)

typescript
// src/query/scopes/repo-scope.ts
import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode } from '../../types/graph';
import { ModuleScope } from './module-scope';
import { FileScope } from './file-scope';
import { SymbolScope } from './symbol-scope';

export class RepoScope extends QueryScope<RepoScope> {
  protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
    const query = `SELECT * FROM repository WHERE root = $repoPath LIMIT 1`;
    const nodes = await this.store.query(query, { repoPath: this.repoPath });
    return { nodes, edges: [] };
  }

  protected clone(): RepoScope {
    // Copy accumulated query state so chained calls keep earlier filters
    const copy = new RepoScope(this.store, this.repoPath);
    copy.filters = [...this.filters];
    copy.sortField = this.sortField;
    copy.sortDir = this.sortDir;
    copy.limitCount = this.limitCount;
    copy.offsetCount = this.offsetCount;
    return copy;
  }

  protected formatAsMarkdown(nodes: GraphNode[]): string {
    if (nodes.length === 0) return 'Repository not indexed.';
    const repo = nodes[0];
    const stats = repo.stats as any;
    return [
      `Repository: ${repo.name}`,
      ``,
      `- Path: ${repo.root}`,
      `- Files: ${stats?.files ?? 'N/A'}`,
      `- Modules: ${stats?.modules ?? 'N/A'}`,
      `- Symbols: ${stats?.symbols ??
        'N/A'}`,
      `- Last Indexed: ${repo.updated_at}`,
    ].join('\n');
  }

  // Navigation to sub-scopes
  modules(): ModuleScope {
    return new ModuleScope(this.store, this.repoPath, null);
  }

  files(): FileScope {
    return new FileScope(this.store, this.repoPath, null);
  }

  symbols(): SymbolScope {
    return new SymbolScope(this.store, this.repoPath, null);
  }

  docs(): DocScope {
    return new DocScope(this.store, this.repoPath, null);
  }

  // Convenience: direct symbol lookup
  symbol(name: string): SymbolScope {
    return new SymbolScope(this.store, this.repoPath, null).eq('name', name);
  }

  table(name: string): TableScope {
    return new TableScope(this.store, this.repoPath, null).eq('name', name);
  }

  commit(hash: string): CommitScope {
    return new CommitScope(this.store, this.repoPath, null).eq('hash', hash);
  }
}

2.8 SymbolScope (With Graph Traversal)

typescript
// src/query/scopes/symbol-scope.ts
import { QueryScope } from '../builder';
import { IStore } from '../../storage/interface';
import { GraphNode, GraphEdge } from '../../types/graph';

export class SymbolScope extends QueryScope<SymbolScope> {
  constructor(
    store: IStore,
    repoPath: string,
    private moduleId: string | null
  ) {
    super(store, repoPath);
  }

  protected async execute(): Promise<{ nodes: unknown[]; edges: unknown[] }> {
    let query = 'SELECT * FROM symbol';
    const vars: Record<string, unknown> = {};
    const conditions: string[] = [];

    if (this.moduleId) {
      // Join through file to filter by module
      query = `SELECT symbol.*, file.path as file_path, file.module_id FROM symbol INNER JOIN file ON symbol.file_id = file.id`;
      conditions.push('file.module_id = $moduleId');
      vars.moduleId = this.moduleId;
    }

    // Apply filters
    for (const f of this.filters) {
      if (f.field === '_func') continue; // function filters run in memory
      const param = `f_${f.field}`;
      switch (f.op) {
        case 'eq': conditions.push(`symbol.${f.field} = $${param}`); break;
        case 'neq': conditions.push(`symbol.${f.field} != $${param}`); break;
        case 'contains': conditions.push(`string::contains(symbol.${f.field}, $${param})`); break;
        case 'matches': conditions.push(`string::matches(symbol.${f.field}, $${param})`); break;
        case 'in': conditions.push(`symbol.${f.field} IN $${param}`); break;
        case 'exists': conditions.push(`symbol.${f.field} != NONE`); break;
      }
      vars[param] = f.value;
    }

    if (conditions.length > 0) {
      query += ` WHERE ${conditions.join(' AND ')}`;
    }
    if (this.sortField) {
      query += ` ORDER BY symbol.${this.sortField} ${this.sortDir.toUpperCase()}`;
    }
    if (this.limitCount !== null) {
      query += ` LIMIT ${this.limitCount}`;
    }
    if (this.offsetCount > 0) {
      query += ` START ${this.offsetCount}`;
    }

    const nodes = await this.store.query(query, vars);
    return { nodes, edges: [] };
  }

  // Graph traversal methods
  async dependants(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    const ids = symbols.map((s) => s.id);
    const result = await this.store.graphTraversal(
      ids[0], // Start from first symbol
      ['calls', 'imports', 'references'],
      'inbound',
      10, // max depth
      undefined
    );
    // Return new scope with traversed nodes
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    // Store pre-computed result
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any).
      _precomputedEdges = result.edges;
    return newScope;
  }

  async dependencies(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    const result = await this.store.graphTraversal(
      symbols[0].id,
      ['calls', 'imports', 'references'],
      'outbound',
      10,
      undefined
    );
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  async callers(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    const result = await this.store.graphTraversal(symbols[0].id, ['calls'], 'inbound', 10, undefined);
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  async callees(): Promise<SymbolScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return this;
    const result = await this.store.graphTraversal(symbols[0].id, ['calls'], 'outbound', 10, undefined);
    const newScope = new SymbolScope(this.store, this.repoPath, this.moduleId);
    (newScope as any)._precomputedNodes = result.nodes;
    (newScope as any)._precomputedEdges = result.edges;
    return newScope;
  }

  // Navigate to containing file
  async file(): Promise<FileScope> {
    const symbols = await this.toArray();
    if (symbols.length === 0) return new FileScope(this.store, this.repoPath, null);
    const fileId = (symbols[0] as any).file_id;
    const fileScope = new FileScope(this.store, this.repoPath, null);
    (fileScope as any)._precomputedFileId = fileId;
    return fileScope;
  }

  protected clone(): SymbolScope {
    // Copy accumulated query state so chained calls keep earlier filters
    const copy = new SymbolScope(this.store, this.repoPath, this.moduleId);
    copy.filters = [...this.filters];
    copy.sortField = this.sortField;
    copy.sortDir = this.sortDir;
    copy.limitCount = this.limitCount;
    copy.offsetCount = this.offsetCount;
    return copy;
  }

  protected formatAsMarkdown(nodes: GraphNode[]): string {
    if (nodes.length === 0) return 'No symbols found.';
    return nodes.map((n) => {
      const s = n as any;
      const exportTag = s.is_exported ?
          'exported' : 'internal';
        const location = s.file_path ? `${s.file_path}:${s.start_line}` : `${s.start_line}`;
        // One bullet per symbol, followed by its signature and the first docstring line
        return `- **${s.name}** (${s.kind}) [${exportTag}] ${location}${
          s.signature ? `\n  \`${s.signature}\`` : ''
        }${s.docstring ? `\n  ${s.docstring.split('\n')[0]}` : ''}`;
      })
      .join('\n');
  }
}
```

### 2.9 MCP Tool Implementation Example

```typescript
// src/mcp/tools/impact.ts
import { Tool } from '@modelcontextprotocol/sdk/types.js';
import { IStore } from '../../storage/interface';
import { createQuery } from '../../query/builder';
import { TokenBudgetManager } from '../token-budget';

export function createImpactAnalysisTool(
  store: IStore,
  repoPath: string,
  budget: TokenBudgetManager
): Tool {
  return {
    name: 'get_impact_analysis',
    description: `Analyze the impact of changing a symbol. Returns all direct and transitive dependants: functions that call it, files that import it, modules that depend on it. Use this before making changes to understand blast radius.`,
    inputSchema: {
      type: 'object',
      properties: {
        symbol_name: {
          type: 'string',
          description: 'Name of the symbol to analyze',
        },
        symbol_kind: {
          type: 'string',
          enum: ['function', 'class', 'interface', 'type_alias', 'variable', 'table', 'column'],
          description: 'Kind of symbol (optional, narrows search)',
        },
        file_path: {
          type: 'string',
          description: 'File path to disambiguate (optional)',
        },
        max_depth: {
          type: 'number',
          description: 'Max traversal depth for transitive dependants (default: 5)',
          default: 5,
        },
        include_transitive: {
          type: 'boolean',
          description: 'Include transitive (indirect) dependants (default: true)',
          default: true,
        },
      },
      required: ['symbol_name'],
    },
    handler: async (params: any) => {
      const q = createQuery(store, repoPath).symbol(params.symbol_name);
      if (params.symbol_kind) q.eq('kind', params.symbol_kind);
      if (params.file_path) q.eq('file_path', params.file_path);

      const symbols = await q.toArray();
      if (symbols.length === 0) {
        return {
          content: [
            {
              type: 'text',
              text: JSON.stringify({ error: 'Symbol not found', symbol_name: params.symbol_name }),
            },
          ],
        };
      }

      const symbol = symbols[0];
      const depth = params.max_depth ?? 5;

      // Get dependants via graph traversal
      const result = await store.graphTraversal(
        symbol.id,
        ['calls', 'imports', 'references', 'implements'],
        'inbound',
        depth,
        undefined
      );

      // Organize by distance (direct vs transitive)
      const direct = result.edges
        // Direct edges are those where the target is our symbol
        .filter((e) => e.to === symbol.id)
        .map((e) => result.nodes.find((n) => n.id === e.from))
        .filter(Boolean);
      // Everything else reached by the traversal, excluding the target itself
      const transitive = result.nodes.filter(
        (n) => n.id !== symbol.id && !direct.find((d) => d.id === n.id)
      );

      // Group by file and module
      const byFile = new Map<string, GraphNode[]>();
      const byModule = new Map<string, GraphNode[]>();
      for (const node of result.nodes) {
        const n = node as any;
        if (n.file_path) {
          if (!byFile.has(n.file_path)) byFile.set(n.file_path, []);
          byFile.get(n.file_path)!.push(node);
        }
        if (n.module_id) {
          if (!byModule.has(n.module_id)) byModule.set(n.module_id, []);
          byModule.get(n.module_id)!.push(node);
        }
      }

      const response = {
        target: {
          id: symbol.id,
          name: (symbol as any).name,
          kind: (symbol as any).kind,
          file: (symbol as any).file_path,
          line: (symbol as any).start_line,
        },
        impact_summary: {
          total_dependants: result.nodes.length,
          direct_dependants: direct.length,
          transitive_dependants: transitive.length,
          files_affected: byFile.size,
          modules_affected: byModule.size,
        },
        direct_dependants: direct.map((n) => ({
          name: (n as any).name,
          kind: (n as any).kind,
          file: (n as any).file_path,
          line: (n as any).start_line,
          relationship: result.edges.find((e) => e.from === n.id && e.to === symbol.id)?.type,
        })),
        affected_files: Object.fromEntries(
          Array.from(byFile.entries()).map(([path, nodes]) => [
            path,
            nodes.map((n) => ({
              name: (n as any).name,
              kind: (n as any).kind,
              line: (n as any).start_line,
            })),
          ])
        ),
        affected_modules: Object.fromEntries(
          Array.from(byModule.entries()).map(([id, nodes]) => [
            id,
            {
              symbol_count: nodes.length,
              kinds: [...new Set(nodes.map((n) => (n as any).kind))],
            },
          ])
        ),
        token_estimate: budget.estimate(JSON.stringify(result)),
      };

      // Apply token budget truncation if
      // needed
      const truncated = budget.truncate(response, params.max_tokens);
      return {
        content: [{ type: 'text', text: JSON.stringify(truncated, null, 2) }],
      };
    },
  };
}
```

### 2.10 Git Hook Implementation

```typescript
// src/hooks/pre-commit.ts
import path from 'node:path';
import { promises as fs } from 'node:fs';
import { simpleGit, SimpleGit } from 'simple-git';
import { IStore } from '../storage/interface';
import { ExtractorRegistry } from '../extractor/registry';
import { GraphDiffer } from '../engine/differ';
import { GraphMerger } from '../engine/merger';
import { Validator } from '../engine/validator';
import { contentHash } from '../utils/hash';
import { Logger } from '../utils/logger';

interface PreCommitResult {
  status: 'pass' | 'warn' | 'fail';
  parsed: number;
  updated: number;
  added: number;
  removed: number;
  errors: string[];
  warnings: string[];
}

export async function runPreCommit(
  repoPath: string,
  store: IStore,
  config: { mode: 'warn' | 'block' | 'off' },
  logger: Logger
): Promise<PreCommitResult> {
  const result: PreCommitResult = {
    status: 'pass',
    parsed: 0,
    updated: 0,
    added: 0,
    removed: 0,
    errors: [],
    warnings: [],
  };

  const git: SimpleGit = simpleGit(repoPath);

  // 1. Get staged files (added, copied, modified, renamed)
  const stagedFiles = await git.diff(['--cached', '--name-only', '--diff-filter=ACMR']);
  const fileNames = stagedFiles.trim().split('\n').filter(Boolean);
  if (fileNames.length === 0) {
    return result;
  }
  logger.info(`Pre-commit: ${fileNames.length} staged files`);

  // 2. Filter to supported files
  const registry = new ExtractorRegistry();
  const supportedFiles = fileNames.filter((f) => registry.supportsFile(f));
  if (supportedFiles.length === 0) {
    return result;
  }
  logger.info(`Pre-commit: ${supportedFiles.length} supported files to parse`);

  // 3. Parse changed files
  for (const filePath of supportedFiles) {
    try {
      const absolutePath = path.resolve(repoPath, filePath);
      const content = await fs.readFile(absolutePath, 'utf-8');
      const hash = contentHash(content);

      // Check if content actually changed
      const existingFile = await store.query(
        'SELECT content_hash FROM file WHERE path = $path LIMIT 1',
        { path: filePath }
      );
      if (existingFile.length > 0 && existingFile[0].content_hash === hash) {
        continue; // No change
      }

      // Extract symbols
      const extractor = registry.getExtractor(filePath);
      const extraction = await extractor.extractFile(absolutePath, repoPath);

      // Diff against existing graph
      const oldSymbols = await store.query('SELECT * FROM symbol WHERE file_id = $fileId', {
        fileId: `file:${filePath}`,
      });
      const diff = GraphDiffer.diff(oldSymbols, extraction.symbols);

      // Merge into graph
      await store.transaction(async (tx) => {
        // Remove old symbols
        for (const removed of diff.removed) {
          await tx.deleteNode(removed.id);
          await tx.deleteEdges(removed.id);
          result.removed++;
        }
        // Update changed symbols
        for (const changed of diff.changed) {
          await tx.updateNode(changed.new.id, changed.new);
          result.updated++;
        }
        // Add new symbols
        for (const added of diff.added) {
          await tx.createNode(added);
          result.added++;
        }

        // Update edges
        await tx.deleteEdges(`file:${filePath}`); // Remove old edges from this file
        await tx.createEdges(
          extraction.edges.map((e) => ({
            ...e,
            // Resolve file-level edges
            from: e.from.startsWith('file:') ? `file:${filePath}` : e.from,
          }))
        );

        // Update file node
        const fileNode = {
          id: `file:${filePath}`,
          type: 'file',
          path: filePath,
          content_hash: hash,
          parse_status: extraction.parseErrors.length === 0 ? 'parsed' : 'partial',
          parse_error:
            extraction.parseErrors.length > 0
              ? extraction.parseErrors.map((e) => `L${e.line}: ${e.message}`).join('; ')
              : null,
          last_parsed: new Date().toISOString(),
          line_count: content.split('\n').length,
          size_bytes: Buffer.byteLength(content),
        };
        await tx.createNode(fileNode as any);
      });

      result.parsed++;
      if (extraction.parseErrors.length > 0) {
        result.warnings.push(`${filePath}: ${extraction.parseErrors.length} parse errors`);
      }
    } catch (err) {
      // `err` is `unknown` under strict TypeScript, so narrow before reading `.message`
      const message = err instanceof Error ? err.message : String(err);
      result.errors.push(`${filePath}: ${message}`);
      logger.error(`Pre-commit error for ${filePath}`, err);
    }
  }

  // 4. Validate if enabled
  if (config.mode !== 'off') {
    const validation = await Validator.validate(store, repoPath);
    result.warnings.push(...validation.warnings);
    result.errors.push(...validation.errors);
    if (result.errors.length > 0 && config.mode === 'block') {
      result.status = 'fail';
    } else if (result.warnings.length > 0 || result.errors.length > 0) {
      result.status = 'warn';
    }
  }

  // 5. Update repo stats
  await updateRepoStats(store, repoPath);

  return result;
}
```

### 2.11 Workflow Template: Bug Fix

```typescript
// src/workflows/templates/bug-fix.ts
import { IStore } from '../../storage/interface';
import { createQuery } from '../../query/builder';

export interface BugFixInput {
  error_message?: string;
  stack_trace?: string[];
  file_path?: string;
  line_number?: number;
  symbol_name?: string;
  error_type?: string; // TypeError, ReferenceError, etc.
}

export interface BugFixOutput {
  root_candidates: RootCandidate[];
  impact_radius: ImpactRadius;
  related_tests: RelatedTest[];
  recent_changes: RecentChange[];
  suggested_investigation_order: string[];
}

interface RootCandidate {
  symbol_id: string;
  symbol_name: string;
  kind: string;
  file_path: string;
  line: number;
  confidence: 'high' | 'medium' | 'low';
  reason: string;
}

interface ImpactRadius {
  direct_callers: number;
  transitive_callers: number;
  affected_files: string[];
  affected_modules: string[];
}

export async function executeBugFixWorkflow(
  store: IStore,
  repoPath: string,
  input: BugFixInput
): Promise<BugFixOutput> {
  const candidates: RootCandidate[] = [];

  // Strategy 1: If we have a file + line, look up the symbol at that location
  if (input.file_path && input.line_number) {
    const lineNumber = input.line_number;
    const symbols = await createQuery(store, repoPath)
      .symbol('') // We need a different query here
      .eq('file_path', input.file_path)
      .toArray();

    // Find symbol containing the line
    const containing = symbols.find((s) => {
      const sym = s as any;
      return sym.start_line <= lineNumber && sym.end_line >= lineNumber;
    });

    if (containing) {
      candidates.push({
        symbol_id: containing.id,
        symbol_name: (containing as any).name,
        kind: (containing as any).kind,
        file_path: (containing as any).file_path,
        line: (containing as any).start_line,
        confidence: 'high',
        reason: `Symbol at error location ${input.file_path}:${input.line_number}`,
      });
    }
  }

  // Strategy 2: If we have a symbol name from the error
  // (e.g., "Cannot read property 'foo' of undefined")
  if (input.symbol_name || input.error_message) {
    const nameToSearch = input.symbol_name || extractPropertyName(input.error_message!);
    if (nameToSearch) {
      const matches = await createQuery(store, repoPath).symbol(nameToSearch).toArray();
      for (const match of matches) {
        // Don't duplicate if already found
        if (candidates.find((c) => c.symbol_id === match.id)) continue;
        candidates.push({
          symbol_id: match.id,
          symbol_name: (match as any).name,
          kind: (match as any).kind,
          file_path: (match as any).file_path,
          line: (match as any).start_line,
          confidence: 'medium',
          reason: `Name matches error reference: "${nameToSearch}"`,
        });
      }
    }
  }

  // Strategy 3: If we have a stack trace, trace the call chain
  if (input.stack_trace && input.stack_trace.length > 0) {
    for (const frame of input.stack_trace) {
      const parsed = parseStackFrame(frame);
      if (!parsed) continue;

      const symbols = await createQuery(store, repoPath)
        .symbol(parsed.functionName)
        .eq('file_path', parsed.filePath)
        .toArray();

      for (const sym of symbols) {
        if (candidates.find((c) => c.symbol_id === sym.id)) continue;
        candidates.push({
          symbol_id: sym.id,
          symbol_name: (sym as any).name,
          kind: (sym as any).kind,
          file_path: (sym as any).file_path,
          line: (sym as any).start_line,
          confidence: parsed.filePath === input.file_path ? 'high' : 'medium',
          reason: `Appears in stack trace: ${frame.trim()}`,
        });
      }
    }
  }

  // Strategy 4: If error type suggests null/undefined, find recently changed symbols in the area
  if (input.error_type && ['TypeError', 'ReferenceError'].includes(input.error_type)) {
    // Find symbols in the same file touched by its most recent commits (up to 10 rows)
    if (input.file_path) {
      const recentSymbols = await store.query(
        `SELECT symbol.*, commit.hash, commit.date
         FROM symbol
         INNER JOIN modified_in ON symbol.file_id = modified_in.from
         INNER JOIN commit ON modified_in.to = commit.id
         WHERE symbol.file_path = $filePath
         ORDER BY commit.date DESC
         LIMIT 10`,
        { filePath: input.file_path }
      );

      for (const rs of recentSymbols) {
        if (candidates.find((c) => c.symbol_id === rs.id)) continue;
        candidates.push({
          symbol_id: rs.id,
          symbol_name: rs.name,
          kind: rs.kind,
          file_path: rs.file_path,
          line: rs.start_line,
          confidence: 'low',
          reason: `Recently modified symbol in error file (commit ${rs.hash})`,
        });
      }
    }
  }

  // Compute impact radius for top candidate
  let impactRadius: ImpactRadius = {
    direct_callers: 0,
    transitive_callers: 0,
    affected_files: [],
    affected_modules: [],
  };

  if (candidates.length > 0) {
    const topCandidate = candidates[0];
    const result = await store.graphTraversal(
      topCandidate.symbol_id,
      ['calls', 'imports'],
      'inbound',
      10,
      undefined
    );
    const directEdges = result.edges.filter((e) => e.to === topCandidate.symbol_id);
    impactRadius.direct_callers = directEdges.length;
    impactRadius.transitive_callers = result.nodes.length;
    impactRadius.affected_files = [
      ...new Set(result.nodes.map((n) => (n as any).file_path).filter(Boolean)),
    ];

    // Resolve modules
    for (const filePath of impactRadius.affected_files) {
      const fileNode = await store.query(
        'SELECT module_id FROM file WHERE path = $path LIMIT 1',
        { path: filePath }
      );
      if (fileNode.length > 0 && fileNode[0].module_id) {
        impactRadius.affected_modules.push(fileNode[0].module_id);
      }
    }
    impactRadius.affected_modules = [...new Set(impactRadius.affected_modules)];
  }

  // Find related tests
  const relatedTests: RelatedTest[] = [];
  if (candidates.length > 0) {
    for (const candidate of candidates.slice(0, 3)) {
      const testSymbols = await store.query(
        `SELECT * FROM symbol
         WHERE name CONTAINS $testName AND kind = 'function' AND name LIKE '%test%'
         LIMIT 5`,
        { testName: candidate.symbol_name }
      );
      for (const test of testSymbols) {
        relatedTests.push({
          test_name: test.name,
          file_path: test.file_path,
          line: test.start_line,
          linked_to: candidate.symbol_name,
        });
      }
    }
  }

  // Suggest investigation order: highest confidence first
  const suggestedOrder = candidates
    .sort((a, b) => {
      const confOrder = { high: 0, medium: 1, low: 2 };
      return confOrder[a.confidence] - confOrder[b.confidence];
    })
    .map((c) => `${c.file_path}:${c.line} (${c.symbol_name})`);

  return {
    root_candidates: candidates,
    impact_radius: impactRadius,
    related_tests: relatedTests,
    recent_changes: [], // Populated from git log
    suggested_investigation_order: suggestedOrder,
  };
}

function extractPropertyName(errorMessage: string): string | null {
  // "Cannot read properties of undefined (reading 'foo')"
  const readMatch = errorMessage.match(/reading '(\w+)'/);
  if (readMatch) return readMatch[1];

  // "foo is not a function"
  const notFnMatch = errorMessage.match(/(\w+) is not a function/);
  if (notFnMatch) return notFnMatch[1];

  // "foo is not defined"
  const notDefMatch = errorMessage.match(/(\w+) is not defined/);
  if (notDefMatch) return notDefMatch[1];

  return null;
}

function parseStackFrame(frame: string): { functionName: string; filePath: string } | null {
  // "at functionName (/path/to/file.ts:10:5)"
  const match = frame.match(/at\s+(\w+)\s+\((.+):(\d+):\d+\)/);
  if (!match) return null;
  return { functionName: match[1], filePath: match[2] };
}
```

### 2.12 Token Budget Manager

```typescript
// src/mcp/token-budget.ts
export class TokenBudgetManager {
  private maxTokens: number;

  // Approximate tokens per character for different content types
  private static RATES = {
    code: 0.25, // ~4 chars per token
    markdown: 0.3, // ~3.3 chars per token
    json: 0.22, // ~4.5 chars per token
    compact_text: 0.33, // ~3 chars per token
  };

  constructor(maxTokens: number = 8000) {
    this.maxTokens = maxTokens;
  }

  estimate(content: string, type: keyof typeof TokenBudgetManager.RATES = 'json'): number {
    return Math.ceil(content.length * TokenBudgetManager.RATES[type]);
  }

  truncate<T>(data: T, requestedMax?: number): T & { truncated: boolean; token_count: number } {
    const max = requestedMax ??
      this.maxTokens;
    const json = JSON.stringify(data);
    const tokens = this.estimate(json);

    if (tokens <= max) {
      return {
        ...data,
        truncated: false,
        token_count: tokens,
      } as T & { truncated: boolean; token_count: number };
    }

    // Truncation strategy: keep structure, reduce detail
    const truncated = this.smartTruncate(data, max);
    const truncatedJson = JSON.stringify(truncated);
    const truncatedTokens = this.estimate(truncatedJson);
    return {
      ...truncated,
      truncated: true,
      token_count: truncatedTokens,
    } as T & { truncated: boolean; token_count: number };
  }

  private smartTruncate<T>(data: T, budget: number): T {
    const obj = data as any;

    // Strategy 1: If it has an array of items, truncate the array
    for (const key of Object.keys(obj)) {
      if (Array.isArray(obj[key]) && obj[key].length > 0) {
        // Record the length before slicing so the note reports the real original size
        const originalLength = obj[key].length;
        // Keep reducing until we're under budget
        let len = originalLength;
        while (len > 1) {
          const testObj = { ...obj, [key]: obj[key].slice(0, len) };
          const testJson = JSON.stringify(testObj);
          if (this.estimate(testJson) <= budget * 0.9) {
            // 10% margin for metadata
            obj[key] = obj[key].slice(0, len);
            obj._truncation_note = `${key} truncated from ${originalLength} to ${len} items`;
            return obj as T;
          }
          len = Math.floor(len * 0.7); // Reduce by 30% each iteration
        }
        obj[key] = obj[key].slice(0, 1);
        return obj as T;
      }
    }

    // Strategy 2: Remove verbose fields
    const verboseFields = ['signature', 'docstring', 'metadata', 'raw'];
    for (const field of verboseFields) {
      if (obj[field]) {
        delete obj[field];
        const testJson = JSON.stringify(obj);
        if (this.estimate(testJson) <= budget * 0.9) {
          return obj as T;
        }
      }
    }

    // Strategy 3: Last resort - truncate string fields
    for (const key of Object.keys(obj)) {
      if (typeof obj[key] === 'string' && obj[key].length > 100) {
        obj[key] = obj[key].slice(0, 100) + '...';
      }
    }
    return obj as T;
  }
}
```

### 2.13 SurrealDB Schema Migration

```typescript
// src/storage/surreal/migrations.ts
export const SCHEMA_DEFINITION = `
// ============================================
// TOKENZIP GRAPH SCHEMA - SurrealDB v2
// ============================================

// --- NODE TYPES ---

DEFINE TABLE repository SCHEMAFULL;
DEFINE FIELD name ON repository TYPE string;
DEFINE FIELD root ON repository TYPE string;
DEFINE FIELD created_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD updated_at ON repository TYPE datetime DEFAULT time::now();
DEFINE FIELD stats ON repository TYPE object { files: number, modules: number, symbols: number };

DEFINE TABLE module SCHEMAFULL;
DEFINE FIELD name ON module TYPE string;
DEFINE FIELD path ON module TYPE string;
DEFINE FIELD manifest_type ON module TYPE string;
DEFINE FIELD language ON module TYPE string;
DEFINE FIELD is_root ON module TYPE bool DEFAULT false;
DEFINE FIELD metadata ON module TYPE object;
DEFINE FIELD repository_id ON module TYPE record<repository>;

DEFINE TABLE file SCHEMAFULL;
DEFINE FIELD path ON file TYPE string;
DEFINE FIELD module_id ON file TYPE record<module>;
DEFINE FIELD language ON file TYPE string;
DEFINE FIELD ext ON file TYPE string;
DEFINE FIELD size_bytes ON file TYPE int;
DEFINE FIELD content_hash ON file TYPE string;
DEFINE FIELD line_count ON file TYPE int;
DEFINE FIELD parse_status ON file TYPE string ASSERT $value IN ['parsed', 'partial', 'failed', 'skipped'];
DEFINE FIELD parse_error ON file TYPE option<string>;
DEFINE FIELD last_parsed ON file TYPE datetime;
DEFINE FIELD git_last_modified ON file TYPE option<datetime>;
DEFINE FIELD git_blame_summary ON file TYPE option<object>;

DEFINE TABLE symbol SCHEMAFULL;
DEFINE FIELD file_id ON symbol TYPE record<file>;
DEFINE FIELD name ON symbol TYPE string;
DEFINE FIELD kind ON symbol TYPE string ASSERT $value IN [
  'function', 'method', 'constructor', 'class', 'interface', 'type_alias', 'enum',
  'variable', 'constant', 'property', 'parameter', 'generic_param',
  'decorator', 'annotation',
  'table', 'view', 'column', 'index', 'constraint', 'foreign_key', 'stored_procedure',
  'import', 'export', 're_export', 'namespace', 'module_decl',
  'section', 'subsection', 'workflow_step', 'diagram_node', 'list_item', 'table_row'
];
DEFINE FIELD signature ON symbol TYPE option<string>;
DEFINE FIELD return_type ON symbol TYPE option<string>;
DEFINE FIELD start_line ON symbol TYPE int;
DEFINE FIELD end_line ON symbol TYPE int;
DEFINE FIELD start_col ON symbol TYPE int;
DEFINE FIELD end_col ON symbol TYPE int;
DEFINE FIELD docstring ON symbol TYPE option<string>;
DEFINE FIELD is_exported ON symbol TYPE bool DEFAULT false;
DEFINE FIELD is_async ON symbol TYPE option<bool>;
DEFINE FIELD is_static ON symbol TYPE option<bool>;
DEFINE FIELD visibility ON symbol TYPE option<string> ASSERT $value IN [null, 'public', 'private', 'protected'];
DEFINE FIELD modifiers ON symbol TYPE array;
DEFINE FIELD parent_symbol_id ON symbol TYPE option<string>;
DEFINE FIELD metadata ON symbol TYPE object;

DEFINE TABLE commit SCHEMAFULL;
DEFINE FIELD hash ON commit TYPE string;
DEFINE FIELD short_hash ON commit TYPE string;
DEFINE FIELD message ON commit TYPE string;
DEFINE FIELD author ON commit TYPE string;
DEFINE FIELD email ON commit TYPE string;
DEFINE FIELD date ON commit TYPE datetime;
DEFINE FIELD branch ON commit TYPE string;
DEFINE FIELD tags ON commit TYPE array;

DEFINE TABLE dependency SCHEMAFULL;
DEFINE FIELD module_id ON dependency TYPE record<module>;
DEFINE FIELD name ON dependency TYPE string;
DEFINE FIELD version ON dependency TYPE string;
DEFINE FIELD dev ON dependency TYPE bool DEFAULT false;
DEFINE FIELD source ON dependency TYPE string;

// --- EDGE TYPES ---

DEFINE TABLE contains SCHEMAFULL TYPE RELATION FROM repository, module, file, symbol TO module, file, symbol;

DEFINE TABLE imports SCHEMAFULL TYPE RELATION FROM file, symbol, module TO file, symbol, module;
DEFINE FIELD is_type_only ON imports TYPE option<bool>;
DEFINE FIELD is_default ON imports TYPE option<bool>;
DEFINE FIELD alias ON imports TYPE option<string>;
DEFINE FIELD specifiers ON imports TYPE option<array>;

DEFINE TABLE exports SCHEMAFULL TYPE RELATION FROM file, symbol TO symbol, file;
DEFINE FIELD is_default ON exports TYPE option<bool>;
DEFINE FIELD is_reexport ON exports TYPE option<bool>;
DEFINE FIELD alias ON exports TYPE option<string>;
DEFINE FIELD name ON exports TYPE option<string>;

DEFINE TABLE calls SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD line ON calls TYPE option<int>;
DEFINE FIELD is_async ON calls TYPE option<bool>;
DEFINE FIELD call_type ON calls TYPE option<string> ASSERT $value IN [null, 'direct', 'indirect', 'dynamic'];

DEFINE TABLE implements SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_partial ON implements TYPE option<bool>;

DEFINE TABLE inherits SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD is_interface_inheritance ON inherits TYPE option<bool>;

DEFINE TABLE modifies SCHEMAFULL TYPE RELATION FROM symbol TO symbol;

DEFINE TABLE reads SCHEMAFULL TYPE RELATION FROM symbol TO symbol;

DEFINE TABLE references SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD context ON references TYPE option<string>;

DEFINE TABLE depends_on SCHEMAFULL TYPE RELATION FROM module, file TO module, file;
DEFINE FIELD is_transitive ON depends_on TYPE option<bool>;
DEFINE FIELD depth ON depends_on TYPE option<int>;

DEFINE TABLE modified_in SCHEMAFULL TYPE RELATION FROM file TO commit;
DEFINE FIELD change_type ON modified_in TYPE string ASSERT $value IN ['added', 'modified', 'deleted', 'renamed'];

DEFINE TABLE foreign_key SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD constraint_name ON foreign_key TYPE option<string>;
DEFINE FIELD on_delete ON foreign_key TYPE option<string>;
DEFINE FIELD on_update ON foreign_key TYPE option<string>;
DEFINE FIELD ref_column ON foreign_key TYPE option<string>;

DEFINE TABLE column_of SCHEMAFULL TYPE RELATION FROM symbol TO symbol;

DEFINE TABLE diagram_edge SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD label ON diagram_edge TYPE option<string>;
DEFINE FIELD style ON diagram_edge TYPE option<string>;
DEFINE FIELD type ON diagram_edge TYPE option<string>;
DEFINE FIELD sequence ON diagram_edge TYPE option<int>;
DEFINE FIELD is_response ON diagram_edge TYPE option<bool>;

DEFINE TABLE workflow_transition SCHEMAFULL TYPE RELATION FROM symbol TO symbol;
DEFINE FIELD condition ON workflow_transition TYPE option<string>;
DEFINE FIELD action ON workflow_transition TYPE option<string>;

// --- INDEXES ---

DEFINE INDEX idx_file_path ON file FIELDS path UNIQUE;
DEFINE INDEX idx_file_hash ON file FIELDS content_hash;
DEFINE INDEX idx_file_module ON file FIELDS module_id;
DEFINE INDEX idx_symbol_name ON symbol FIELDS name;
DEFINE INDEX idx_symbol_kind ON symbol FIELDS kind;
DEFINE INDEX idx_symbol_file ON symbol FIELDS file_id;
DEFINE INDEX idx_symbol_export ON symbol FIELDS is_exported;
DEFINE INDEX idx_module_path ON module FIELDS path UNIQUE;
DEFINE INDEX idx_commit_hash ON commit FIELDS hash UNIQUE;
DEFINE INDEX idx_dep_name ON dependency FIELDS name, module_id;
`;
```

### 2.14 Error Handling Strategy

```typescript
// src/utils/errors.ts
export class TokenZipError extends Error {
  constructor(
    message: string,
    public readonly
      code: ErrorCode,
    public readonly details?: Record<string, unknown>
  ) {
    super(message);
    this.name = 'TokenZipError';
  }
}

export enum ErrorCode {
  // Storage errors (1xxx)
  DB_CONNECTION_FAILED = 'E1001',
  DB_QUERY_FAILED = 'E1002',
  DB_MIGRATION_FAILED = 'E1003',
  DB_CORRUPTED = 'E1004',

  // Parser errors (2xxx)
  PARSE_FAILED = 'E2001',
  GRAMMAR_NOT_FOUND = 'E2002',
  PARTIAL_PARSE = 'E2003',

  // Git errors (3xxx)
  GIT_NOT_REPOSITORY = 'E3001',
  GIT_HOOK_INSTALL_FAILED = 'E3002',
  GIT_DIFF_FAILED = 'E3003',

  // MCP errors (4xxx)
  MCP_TRANSPORT_FAILED = 'E4001',
  MCP_TOOL_NOT_FOUND = 'E4002',
  MCP_INVALID_PARAMS = 'E4003',
  MCP_TOKEN_BUDGET_EXCEEDED = 'E4004',

  // Config errors (5xxx)
  CONFIG_NOT_FOUND = 'E5001',
  CONFIG_INVALID = 'E5002',

  // Indexer errors (6xxx)
  INDEX_INTERRUPTED = 'E6001',
  INDEX_FILE_TOO_LARGE = 'E6002',
  INDEX_BINARY_FILE = 'E6003',
}

// Global error handler for MCP tools
export function mcpErrorHandler(error: unknown): {
  content: Array<{ type: 'text'; text: string }>;
  isError: boolean;
} {
  if (error instanceof TokenZipError) {
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify({
            error: error.message,
            code: error.code,
            details: error.details,
          }),
        },
      ],
      isError: true,
    };
  }

  if (error instanceof Error) {
    return {
      content: [
        {
          type: 'text',
          text: JSON.stringify({
            error: error.message,
            code: 'E9999',
            // Only expose stack traces in development
            stack: process.env.NODE_ENV === 'development' ? error.stack : undefined,
          }),
        },
      ],
      isError: true,
    };
  }

  return {
    content: [{ type: 'text', text: JSON.stringify({ error: 'Unknown error' }) }],
    isError: true,
  };
}
```

### 2.15 Testing Strategy

```typescript
// tests/unit/extractor/typescript.test.ts
import { describe, it, expect, beforeEach } from 'vitest';
import { TypeScriptExtractor } from '../../../src/extractor/code/typescript';
import { createMockContext } from '../../helpers';

describe('TypeScriptExtractor', () => {
  let extractor: TypeScriptExtractor;

  beforeEach(() => {
    extractor = new TypeScriptExtractor();
  });

  describe('function extraction', () => {
    it('extracts a simple exported function', () => {
      const code = `
export function addUser(name: string, age: number): User {
  return { name, age, id: crypto.randomUUID() };
}
`;
      const ctx = createMockContext('src/user.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'addUser',
        kind: 'function',
        isExported: true,
        isAsync: false,
        startLine: 2,
        endLine: 4,
      });
      expect(result.symbols[0].metadata.params).toEqual([
        { name: 'name', type: 'string' },
        { name: 'age', type: 'number' },
      ]);
      expect(result.symbols[0].returnType).toBe('User');
    });

    it('extracts async arrow function assigned to const', () => {
      const code = `
export const fetchUser = async (id: string): Promise<User> => {
  const res = await fetch(\`/api/users/\${id}\`);
  return res.json();
};
`;
      const ctx = createMockContext('src/api.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'fetchUser',
        kind: 'function',
        isExported: true,
        isAsync: true,
      });
      expect(result.symbols[0].metadata.isArrow).toBe(true);
    });

    it('extracts class with methods, inheritance, and implementation', () => {
      const code = `
export class UserRepository implements IRepository<User> {
  private cache: Map<string, User> = new Map();

  async findById(id: string): Promise<User | null> {
    return this.cache.get(id) ??
      null;
  }

  async save(user: User): Promise<void> {
    this.cache.set(user.id, user);
  }
}
`;
      const ctx = createMockContext('src/repo.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      // 1 class + 1 property + 2 methods
      expect(result.symbols).toHaveLength(4);

      const classSym = result.symbols.find((s) => s.kind === 'class');
      expect(classSym?.name).toBe('UserRepository');
      expect(classSym?.isExported).toBe(true);
      expect(classSym?.metadata.implements).toEqual(['IRepository<User>']);

      const methods = result.symbols.filter((s) => s.kind === 'method');
      expect(methods).toHaveLength(2);
      expect(methods.map((m) => m.name)).toEqual(['findById', 'save']);

      // Check implements edge
      const implEdge = result.edges.find((e) => e.type === 'implements');
      expect(implEdge).toBeDefined();
    });

    it('extracts interface with generics and members', () => {
      const code = `
export interface IRepository<T extends { id: string }> {
  findById(id: string): Promise<T | null>;
  save(entity: T): Promise<void>;
  delete(id: string): Promise<boolean>;
}
`;
      const ctx = createMockContext('src/types.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.symbols).toHaveLength(1);
      expect(result.symbols[0]).toMatchObject({
        name: 'IRepository',
        kind: 'interface',
        isExported: true,
      });
      expect(result.symbols[0].metadata.generics).toEqual(['T extends { id: string }']);
      expect(result.symbols[0].metadata.members).toHaveLength(3);
    });

    it('extracts imports with type-only and default', () => {
      const code = `
import type { User } from './types';
import React, { useState, useEffect } from 'react';
import { formatDate } from './utils';
`;
      const ctx = createMockContext('src/component.tsx', code, 'module-1');
      const result = extractor.extract(ctx);

      const imports = result.symbols.filter((s) => s.kind === 'import');
      expect(imports).toHaveLength(3);
      expect(imports[0].metadata.isTypeOnly).toBe(true);
      expect(imports[0].metadata.source).toBe('./types');
      expect(imports[1].metadata.isDefault).toBe(true);
      expect(imports[1].metadata.source).toBe('react');
      expect(imports[1].metadata.specifiers).toContain('useState');
    });

    it('handles parse errors gracefully', () => {
      const code = `
export function broken(
  // Missing closing paren and body
`;
      const ctx = createMockContext('src/broken.ts', code, 'module-1');
      const result = extractor.extract(ctx);

      expect(result.parseErrors.length).toBeGreaterThan(0);
      // Should still return partial results if any
      expect(result.symbols).toBeDefined();
    });
  });
});
```

```typescript
// tests/integration/full-parse.test.ts
import { describe, it, expect, beforeAll, afterAll } from 'vitest';
import { MemoryStore } from '../../src/storage/memory/store';
import { Indexer } from '../../src/engine/indexer';
import { createQuery } from '../../src/query/builder';
import path from 'path';

describe('Full Parse Integration', () => {
  let store: MemoryStore;
  let indexer: Indexer;
  const fixturePath = path.join(__dirname, '../fixtures/ts-monorepo');

  beforeAll(async () => {
    store = new MemoryStore();
    await store.initialize();
    await store.migrate();
    indexer = new Indexer(store, fixturePath);
    await indexer.fullIndex();
  });

  afterAll(async () => {
    await store.close();
  });

  it('indexes all modules in the monorepo', async () => {
    const modules = await createQuery(store, fixturePath).modules().toArray();
    expect(modules.length).toBeGreaterThanOrEqual(3); // apps/web, apps/api, packages/shared
  });

  it('extracts all TypeScript symbols', async () => {
    const symbols = await createQuery(store, fixturePath)
      .symbols()
      .eq('kind', 'function')
      .toArray();
    expect(symbols.length).toBeGreaterThan(10);
  });

  it('resolves cross-module imports', async () => {
    // Find a symbol in packages/shared that's imported by apps/web
    const sharedExports = await createQuery(store, fixturePath)
      .modules()
      .eq('path', 'packages/shared')
      .files()
      .symbols()
      .eq('is_exported', true)
      .toArray();
    expect(sharedExports.length).toBeGreaterThan(0);

    // Check that at least one has an imports edge from apps/web
    const importEdges = await store.getEdgesTo(sharedExports[0].id, 'imports');
    // At least the file-level import should exist
    expect(importEdges.length).toBeGreaterThan(0);
  });

  it('chainable query: modules → files → symbols → filters', async () => {
    const result = await createQuery(store, fixturePath)
      .modules()
      .eq('language', 'typescript')
      .files()
      .eq('ext', '.ts')
      .symbols()
      .eq('kind', 'class')
      .eq('is_exported', true)
      .toArray();

    expect(result.length).toBeGreaterThan(0);
    for (const sym of result) {
      expect((sym as any).kind).toBe('class');
      expect((sym as any).is_exported).toBe(true);
    }
  });

  it('graph traversal: find all callers of an exported function', async () => {
    const targetFunc = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .eq('kind', 'function')
      .toArray();
    if (targetFunc.length === 0) return; // Skip if fixture doesn't have this

    const callers = await createQuery(store, fixturePath)
      .symbol('formatDate')
      .callers()
      .toArray();
    // Should find at least one caller
    expect(callers.length).toBeGreaterThan(0);
  });

  it('formats query result as markdown', async () => {
    const md = await createQuery(store, fixturePath).modules().limit(3).toMarkdown();
    // The expected marker was lost in extraction; '## ' assumes modules render as level-2 headings
    expect(md).toContain('## ');
    expect(md).toContain('packages/shared'); // Based on fixture
  });
});
```

### 2.16 Configuration Schema

```typescript
// src/types/config.ts
export interface TokenZipConfig {
  // Project-level config (.tokenzip/config.json)
  version: string;

  storage: {
    engine: 'surrealdb' | 'sqlite' | 'auto';
    path: string; // relative to project root, default: .tokenzip/db
    surrealdb?: {
      binary_path?: string; // custom surrealdb binary
      memory?: boolean; // use memory backend instead of RocksDB
    };
  };

  languages: {
    enabled: string[]; // ['typescript', 'javascript', 'python', 'sql', 'markdown']
    disabled: string[];
    custom: Record<
      string,
      {
        extensions: string[];
        grammar_path?: string; // path to custom tree-sitter WASM
        extractor_path?: string; // path to custom extractor JS
      }
    >;
  };

  exclude: {
    paths: string[]; // glob patterns: ['**/node_modules/**', '**/dist/**', '**/.git/**']
    files: string[]; // exact filenames: ['package-lock.json', 'yarn.lock']
    max_file_size_kb: number; // default: 500
  };

  hooks: {
    pre_commit: 'warn' | 'block' |
'off'; post commit: 'on' | 'off'; validate on commit: boolean; // run reference integrity checks }; mcp: { max tokens: number; // default: 8000 transport: 'stdio' | 'sse'; port: number; // for SSE, default: 3777 include source: boolean; // include source code in responses source max lines: number; // max lines of source per symbol, default: 50 }; indexing: { worker threads: number; // default: os.cpus .length - 1, min 1 batch size: number; // files per batch, default: 100 git history depth: number; // commits to index, default: 100 }; workflows: { enabled: string ; // 'create-module', 'update-module', 'implement-feature', 'upgrade-feature', 'bug-fix' }; } export const DEFAULT CONFIG: TokenZipConfig = { version: '2.0.0', storage: { engine: 'auto', path: '.tokenzip/db', }, languages: { enabled: 'typescript', 'javascript', 'python', 'sql', 'go', 'rust', 'java', 'kotlin', 'markdown' , disabled: , custom: {}, }, exclude: { paths: ' /node modules/ ', '