I Made TS Compiler Graph MCP: 10x Fewer Tokens in Claude Code

A developer built @ttsc/graph, an MCP tool that provides coding agents with a TypeScript compiler-built code graph instead of raw source files. In benchmarks, it reduces token usage by roughly 10x on structural questions by returning only names, edges, signatures, and spans. The tool forces chain-of-thought reasoning and keeps answers verifiable with exact file:line references.

TL;DR

codegraph, codebase-memory-mcp, and serena all got there first, handing a coding agent code intelligence over MCP so it stops grepping. On my own open-ended questions the token bill didn't budge: the agent kept sliding back to grep, and no amount of forceful prompting could stop it.

So I built @ttsc/graph. It gives the agent an index the TypeScript compiler already resolved, never the source bodies, through a single tool with a forced chain-of-thought. On "how does this work?" questions that works out to roughly 10× fewer tokens, and the answers are no worse. That figure is a median, and a conservative one.

- Repository: https://github.com/samchon/ttsc
- Benchmark: https://ttsc.dev/docs/benchmark/graph

Image: on the left, the agent is lost in a maze of files, chasing dashed arrows dozens deep. On the right, it's reading a single compiler-built graph of nodes and edges, with the file:line anchors it can open and check.

You're new to a TypeScript repo, so you ask the agent for a tour: what's the main runtime flow, from the public API down to the code that does the work, and what should you read first? You know how it goes. It opens a file, follows an import into another, then another, and a few dozen files later it gives you an answer.

@ttsc/graph cuts that crawl short. Over MCP, it hands your agent a graph of your TypeScript codebase that the compiler itself drew: what calls what, what depends on what, and where each piece lives.
The agent answers structural questions straight from the graph instead of spelunking through files, and every claim it makes points at an exact file:line the compiler resolved. Nothing invented, just a location you can open and check for yourself.

It's the same question and the same agent in every case, and only @ttsc/graph stays flat across the repos no matter how big they get. The other three, codegraph, codebase-memory, and serena, swing all over the place, and a few even spend more than the baseline does. If you want to dig into the details, every model, the per-repo prompts alongside this shared one, and the full method are on the interactive benchmark page: https://ttsc.dev/docs/benchmark/graph.

Image: a city map keeps every street and building. A subway map throws most of that away and keeps the connections you need.

So what is a "code graph"?

Land in a city you've never visited and you don't read every street sign in order; you glance at a subway map. It throws away the buildings and the streets and the distances, and keeps the one thing you need: what connects to what. You can read the whole thing in five seconds.

Code has the same shape. The nodes are functions, classes, and files; the edges are the calls, imports, and inheritance between them. Draw all of them and you have a code graph, an index of what calls what. The agent can query that index once instead of walking every street itself.

This is the first real fork in the road. @ttsc/graph doesn't return source bodies at all. What it returns is names, edges, signatures, and spans, and nothing more than that.

The span is the part that matters. Every range it cites is a coordinate the compiler vouched for, so if you doubt an answer, you open that exact spot and check it. What reassures me isn't that it read no files; it's that a place to verify always comes attached.

Tokens stay flat because the size of the response doesn't depend on the size of the repo.
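To make the "index only, bounded size" idea concrete, here is a minimal sketch. Every name in it (`CodeGraph`, `GraphNode`, `calleesOf`, `cite`, the `limit` parameter) is hypothetical and invented for illustration; it is not the real @ttsc/graph data model or API, only one way such a response could be shaped.

```typescript
// Hypothetical, simplified shapes: NOT the real @ttsc/graph types.
interface Span {
  file: string;
  startLine: number;
  endLine: number;
}
interface GraphNode {
  name: string;
  kind: "function" | "class" | "file";
  signature: string;
  span: Span; // a location to verify, never the source body itself
}
interface GraphEdge {
  from: string;
  to: string;
  kind: "calls" | "imports" | "extends";
}
interface CodeGraph {
  nodes: Map<string, GraphNode>;
  edges: GraphEdge[];
}
interface CalleeAnswer {
  symbol: GraphNode;
  callees: GraphNode[];
  truncated: boolean;
}

// Answer "what does this symbol call?" with at most `limit` callees,
// so the response size is capped no matter how large the graph is.
function calleesOf(
  graph: CodeGraph,
  name: string,
  limit = 5,
): CalleeAnswer | undefined {
  const symbol = graph.nodes.get(name);
  if (symbol === undefined) return undefined;
  const all = graph.edges
    .filter((edge) => edge.kind === "calls" && edge.from === name)
    .map((edge) => graph.nodes.get(edge.to))
    .filter((node): node is GraphNode => node !== undefined);
  return {
    symbol,
    callees: all.slice(0, limit),
    truncated: all.length > limit,
  };
}

// Render a node as a citation the reader can open and check.
const cite = (node: GraphNode): string =>
  `${node.name} @ ${node.span.file}:${node.span.startLine}`;

// A two-node example graph with one call edge.
const graph: CodeGraph = {
  nodes: new Map<string, GraphNode>([
    ["parse", {
      name: "parse",
      kind: "function",
      signature: "(input: string) => Ast",
      span: { file: "src/parse.ts", startLine: 10, endLine: 42 },
    }],
    ["tokenize", {
      name: "tokenize",
      kind: "function",
      signature: "(input: string) => Token[]",
      span: { file: "src/lexer.ts", startLine: 5, endLine: 30 },
    }],
  ]),
  edges: [{ from: "parse", to: "tokenize", kind: "calls" }],
};

const answer = calleesOf(graph, "parse");
console.log(answer?.callees.map(cite));
```

The design point the sketch tries to capture is that the answer carries a signature and a span for each callee, and the cap on `callees` depends on `limit`, not on how many nodes the graph holds.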
Whether the project is 100k lines or 10k, one question comes back with a similarly sized chunk of index. Why the other tools spill source instead is the autopsy in §2.

The short version: it reads an index out of the program the TypeScript compiler has already resolved, and it feeds a single MCP tool a forced chain-of-thought so the agent doesn't wander off. That's all there is to it.

The rest of the post earns those claims. §2 is the autopsy of why the tools before it don't move the token bill; §3 is how this one does; §4 is how to run it yourself. First, why I built it.

Honestly, none of this was my idea. codegraph (https://github.com/colbymchenry/codegraph) put a code graph in front of an agent over MCP first, and codebase-memory-mcp and serena got there before me too. The benchmark in this post runs all three against @ttsc/graph on two prompt families: a shared onboarding question, and codegraph's own per-repo questions. codebase-memory-mcp doesn't ship any reproducible questions to port, and while serena does publish a reproducible evaluation, it's one you run on your own code rather than a fixed question set, so neither one gives us a comparable third family.

And the core claim behind all of them is legitimate. The enemy is the agent's grep-find-Read crawl loop. Every "where is this?" becomes a grep, then a file open, then an import chased into another grep: dozens of round-trips that burn tokens and time. codegraph's pitch is clean: replace that loop with a single graph query, for "58% fewer tool calls" and "file reads to ~zero." codebase-memory-mcp goes further and headlines "120× fewer tokens," a 99.2% cut. serena asks the agent to route both its reading and its editing through symbol-aware tools it calls "much more token-efficient than your own."

It all sounded great, so I installed all three.

One thing up front. Cutting an agent's tool-call count is the easy part; cutting the tokens it spends is harder.
Cutting the tokens without making the answer worse is the real problem, and it's easy to mistake the first of those for the last.

What actually happened was a lot less exciting than the pitch.

Chart: zod on GPT-5.5. All three (codegraph, codebase-memory, and serena) made the agent spend more than it would have with no MCP at all, by +22% to +27%. @ttsc/graph came in at 6% of the baseline.

This is the starkest repo I found, but the pattern holds across the matrix. codegraph spent up to 47% more, codebase-memory-mcp up to 66% more, and serena up to 93% more.

codebase-memory-mcp's sharpest tools want a Cypher query or an exact qualified name, codegraph wants you to name the symbols in a flow, and serena wants you to activate the project and then hand it an exact name path you don't have yet. The hand-shaping alone cost more time than the tools gave back. So for my kind of question, all three left me worse off than before I installed them.

Two things are true at once: they're the strongest tools of their kind, built by people who saw the problem before I did; and on an open question, every one made the agent spend more than nothing would have.

That's when I went digging.

The first cause is what each tool does with what it already has.

codegraph returns whole source bodies. In its own words, the output is "byte-for-byte identical to what the Read tool returns," and it tells the agent to "treat each block as a Read you have already performed." It's the Read, done for you, which is fine when you're editing. But for a broad "how does this work?" question, the body is the token bomb. Once you're past a handful of files it hits its cap, and the overflow collapses into an "additional relevant files not shown" list that tells you not to Read them yourself, so you call codegraph explore again and get more bodies back.

codebase-memory-mcp is the more interesting case, because underneath it has the right idea.
It builds a real relation graph, a lot like the one I ended up with, that tracks not just where things are but how they call and depend on each other. The trouble is the surface in front of it. That capability is spread across fourteen MCP tools, and the ones that actually reach the relation data want precise input, either a Cypher query or an exact qualified name. There's a plain-language search too, but it doesn't touch the graph the way the query tools do. Faced with fourteen tools and a query language, the agent mostly never reached the relation data at all. In the runs I measured it called the MCP zero times and fell back to the shell every time. It gave up and grepped. The graph was right there; the surface buried it.

serena manages to do both at once. Because it's backed by a language server, its symbols come out properly resolved, but it still serves source bodies on demand, and it puts them behind around fifty tools that the agent only reaches after activating the project and waiting out a language-server cold-start. Given all that, it grepped too, with a median of zero to one MCP calls before falling back to the shell.

So one hands back too much, another keeps the right thing behind a door the agent can't find, and the third manages to do both. None of them moves the token bill.

The second cause is the instructions, and here the three tools scatter in every direction.

codegraph forces its tool, and it forces it hard. The MCP instructions tell the agent to "use it instead of reading files," to "call it before you Read," to "don't grep or Read first," and to reach for it on "almost any question." So it fires even when the graph isn't the answer, on a config file, a small edit, or a question it can't answer at all, and those wasted calls get in the way of the real work. The README is candid about why: the tool only helps when you query it directly and is pure overhead otherwise, so the instructions lean hard on the agent to keep it from becoming overhead.
codebase-memory-mcp does the opposite. Its MCP initialize sends no instructions at all. What guidance it does have lives in an install-time skill file that the agent never sees if you just wire up the server, and its auto-indexing is off by default. So you get fourteen tools over the wire with almost nothing telling the agent which one to reach for or when, and, as we just saw, it mostly didn't and fell back to the shell.

serena over-directs too, and more literally than the others. In the Claude Code setup its instructions forbid the agent from using its own Read and Edit on code files, swap in a wholesale replacement system prompt, and warn that the built-in editor "will deny such edits." It gives the agent its own read, grep, and edit tools to use in their place, swapping out the agent's hands for its own. And even after all that, the agent still reached for its own grep.

@ttsc/graph puts a hammer in your hand, and you're still holding the pen and the wrench and everything else. serena takes the hand off and bolts a hammer where it used to be: great for driving nails, but now that's all the arm can do.

So one under-directs, the other two over-direct, and none of them lands on what you actually want, which is for the agent to use the graph when it helps and stop when it doesn't. Forcing a tool isn't the same as getting it adopted, and a big surface area isn't the same as capability. A tool the agent won't reach for is worse than no tool at all, because you're still paying for its description on every single turn.

You can read the effort in the prompts themselves. serena's Claude Code configuration doesn't only tell the agent to prefer its tools; it enumerates specific rationalizations and rules each one out, down to "I already know the path" and "one Read call is faster than three Serena calls." Every line in that list is a rationalization the model actually produced, caught and forbidden one at a time.
It's real craft, and there's a lot of it: serena swaps in about a hundred and fifty lines of replacement system prompt before the first question, and codegraph ships around a hundred lines of instructions plus a skill file injected at install time. Almost all of it exists to secure one thing, that the agent actually calls the tool. On my questions, none of it worked. You can't write your way to adoption.

To their credit, none of the three hide their limits. codegraph's README says its token savings depend on scale and are small on a normal codebase. codebase-memory-mcp reports its biggest numbers on a handful of structural queries rather than open-ended ones. serena's own docs admit its symbolic tools win on large edits but lose to a plain text edit on small ones. All three tell you plainly that the wins were measured on targeted scenarios.

And that's the point. Those limits only get bigger the more general your usage is, and general usage is where I spend my days.

But none of it means the approach was wrong. The idea is a lovely one; materializing it took a lot of trial and error, and the three of them did that work in the open, wall by wall. I came to the same problem holding two things they didn't: a compiler toolchain that hands me a resolved graph for free, and a way to make the agent comply with a typed contract. So I took what they proved was worth wanting and built it for the case they weren't aimed at: the open-ended question, with the tokens down and the answer no worse.

@ttsc/graph is built to fix those problems one at a time. There are four of them, and the table below lines each one up with its fix.

| Pain (§2) | Antidote (§3) |
|---|---|
| 2.4: over-forced (up to a hard ban), or no guidance at all | 3.1: guides without forcing (escape + stop) |
| 2.3: a real graph buried under 14 or ~50 tools | 3.2: one tool, asked in plain language |
| 2.3: source bodies blow up tokens | 3.3: returns index only |
| 2.3: a setup tax before the tools even work | 3.4: a free byproduct, no setup step |
| only works if trustworthy | 3.4: the real compiler |

It never forces anything, to start with. Not one line of the instructions says "use this instead of Read." Instead they state a condition: use it when the answer depends on TypeScript symbols, calls, or types. And escape is a first-class option for when the evidence lives outside the graph entirely. There's only one strong rule: once you've used it, treat the result as compiler-issued truth and don't go re-checking it by reading files.

Once enough evidence is in, the result itself tells the agent to answer now and stop. It points at a place to stop rather than demanding more calls. So the forced calls that used to get in the way never happen. The agent reaches for the graph when it should and skips it when it shouldn't.

Then there's tool choice. Claude Code won't always pick the right tool at the right moment. The more options you give it, the more misfires you get, and that's exactly what buried codebase-memory-mcp's relation graph behind its fourteen MCP tools, and serena's behind its roughly fifty, with the agent never finding its way to the right one. codegraph sidesteps the tool-count trap with a single default tool. It just pushes that one tool too hard, which is the §3.1 problem instead.

@ttsc/graph has exactly one tool. What to do inside it is split out by a union type that the chain-of-thought fills in, and the safeguards stack up: a review step in the middle of the CoT can switch the request type, and if it's still wrong after that, escape bails out. It errs on the safe side.
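A minimal sketch of what "split out by a union type" with an escape branch can look like in TypeScript. The request shapes and the `route` function below are hypothetical and heavily simplified for illustration; they are assumptions of mine, not the real @ttsc/graph request types.

```typescript
// Hypothetical, simplified request shapes (assumptions, not the real API).
type GraphRequest =
  | { type: "lookup"; name: string }
  | { type: "trace"; from: string; to: string }
  | { type: "escape"; reason: string };

// One entry point; the discriminant `type` decides the operation.
function route(request: GraphRequest): string {
  switch (request.type) {
    case "lookup":
      return `look up symbol ${request.name}`;
    case "trace":
      return `trace calls from ${request.from} to ${request.to}`;
    case "escape":
      // The off-graph case is an ordinary branch, not an error path.
      return `not a graph question: ${request.reason}`;
    default: {
      // Exhaustiveness check: adding a new request type without a
      // matching case makes this assignment a compile-time error.
      const unreachable: never = request;
      return unreachable;
    }
  }
}

console.log(route({ type: "lookup", name: "parse" }));
console.log(route({ type: "escape", reason: "the answer lives in a config file" }));
```

The `never` assignment in the default branch is the part doing the work: the set of things the caller can ask for is closed, and the compiler checks that each one is handled.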
This is one of the two differences that actually decide it; the other is the compiler, in §3.4.

Every tool above tries to change what the agent does by talking to it, with more instructions and stronger words. @ttsc/graph changes what the agent does by changing the shape of the tool. A field the schema marks as required can't be skipped the way a line of prompt can be ignored, and a union of request types routes the agent by construction instead of by persuasion.

Here's the entire surface. It's one type, with the comments trimmed down to one line each:

```typescript
// TOOL DESCRIPTION: inspect the compiler-built TypeScript code graph over MCP.
export interface ITtscGraphApplication {
  inspect_typescript_graph(
    props: ITtscGraphApplication.IProps,
  ): ITtscGraphApplication.IResult;
}
export namespace ITtscGraphApplication {
  // The forced chain-of-thought, then exactly one graph request.
  export interface IProps {
    question: string; // restate the code question being asked
    draft: string; // intended request type + why it is the smallest
    review: string; // self-correct a wrong/broad draft; pick escape if off-graph
    request: // the final operation, chosen after review
      | ITtscGraphEntrypoints.IRequest // orientation: where to start reading
      | ITtscGraphLookup.IRequest // find a symbol by name
      | ITtscGraphTrace.IRequest // trace call / data flow
      | ITtscGraphDetails.IRequest // a symbol's signature, members, neighbors
      | ITtscGraphOverview.IRequest // repo-level overview
      | ITtscGraphTour.IRequest // broad code tour, answered in one call
      | ITtscGraphEscape.IRequest; // not a graph question - bail out
  }
  // The selected request's result; result.type mirrors request.type.
  export interface IResult {
    result:
      | ITtscGraphEntrypoints
      | ITtscGraphLookup
      | ITtscGraphTrace
      | ITtscGraphDetails
      | ITtscGraphOverview
      | ITtscGraphTour
      | ITtscGraphEscape;
  }
}
```

As you can see, `question`, `draft`, and `review` are forced function arguments.
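What "forced" means at the call boundary can be sketched with a hand-written check. This is only an illustration under my own assumptions: `Props`, `Validation`, and `validateProps` are hypothetical names, the `request` field is trimmed to a bare `type`, and the real tool uses a generated validator rather than anything like this code.

```typescript
// Hypothetical, trimmed-down props; the real IProps carries a full request union.
interface Props {
  question: string;
  draft: string;
  review: string;
  request: { type: string };
}

type Validation =
  | { success: true; data: Props }
  | { success: false; errors: string[] };

// Hand-written stand-in for a schema validator: a call that skips a
// chain-of-thought field is rejected before any graph work happens.
function validateProps(input: unknown): Validation {
  if (typeof input !== "object" || input === null) {
    return { success: false, errors: ["props must be an object"] };
  }
  const record = input as Record<string, unknown>;
  const errors: string[] = [];
  for (const key of ["question", "draft", "review"] as const) {
    if (typeof record[key] !== "string") errors.push(`${key} is required`);
  }
  const request = record.request;
  if (
    typeof request !== "object" ||
    request === null ||
    typeof (request as { type?: unknown }).type !== "string"
  ) {
    errors.push("request.type is required");
  }
  return errors.length === 0
    ? { success: true, data: input as Props }
    : { success: false, errors };
}

// The agent jumped straight to the request and skipped the review step.
console.log(
  validateProps({
    question: "Where does parsing start?",
    draft: "lookup, because one symbol is enough",
    request: { type: "lookup" },
  }),
);
```

A prompt line asking for a review step can be read and skipped; here the same omission comes back as a validation failure the agent has to fix before the call goes through.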
typia compiles this type into the tool's JSON schema and validator, so a chain-of-thought that hasn't been filled in gets rejected right at the call boundary. As typia puts it, "free prose can hide a skipped step; a typed submission cannot." Why that matters more than any instruction is the thread I pick back up in §5.

Those one-line comments are condensed. In the source, the fuller JSDoc on each member is what typia compiles into the MCP instruction and the schema descriptions the model actually reads. If you want the details, they're in ITtscGraphApplication.ts (https://github.com/samchon/ttsc/blob/master/packages/graph/src/structures/ITtscGraphApplication.ts).

This is also the answer to the hand-shaping problem. You just ask in plain English, as vague as "just figure it out," and translating that into the precise request type and correcting itself along the way is the CoT's job. There are no symbol names to supply, no Cypher, and no query syntax to memorize.

Each request branch is a single graph operation with its own request (`.IRequest`) and result type. Here's what each one does, going by the one-line comments in the interface above:

- ITtscGraphTour: a broad code tour, answered in one call
- ITtscGraphEntrypoints: orientation, where to start reading
- ITtscGraphLookup: find a symbol by name
- ITtscGraphTrace: trace call / data flow
- ITtscGraphDetails: a symbol's signature, members, and neighbors
- ITtscGraphOverview: a repo-level overview
- ITtscGraphEscape: not a graph question, so bail out

It returns the index and nothing else. Names, edges, signatures, and spans: the relationships themselves, not just where each thing sits. The edges and signatures carry what calls what and how, so the agent can assemble the answer right there without opening a file. Source bodies are never inlined, and a span is a citation you can verify, not an instruction to go read.

That has two effects. There's no token explosion from spilled bodies, and there's no compounding mess where that explosion combines with a misfired MCP call to cloud the agent's judgment. The response stays bounded no matter the repo size, so the tokens stay flat. That one decision is what defuses both the source-body blowup and the accuracy drop from the autopsy.
And it's where the promise I opened with finally pays off: killing the grep crawl loop is table stakes, and plenty of tools do it. Doing it without the tokens blowing up, and without the answer getting any worse, is the part that was missing.

All three of those only hold up if the results are trustworthy, and that trust can only come from a real compiler. A heuristic parser like tree-sitter only sees text, so there are things it can't resolve: tsconfig paths aliases like @app/ pointing at the real file, cross-package references inside a pnpm monorepo, symlinks, and re-export chains. Only a compiler that has actually finished module resolution can wire all of those up correctly. @ttsc/graph reads the program that ttsc has already type-checked and resolved, so aliases and monorepos line up on their own.

And because it rides along with the toolchain, the graph comes out as a nearly free byproduct of the type-check that's already running. There's no separate index step the way there is with codegraph init, codebase-memory-mcp's index repository, or serena's project activation and language-server cold-start. No file watcher, no stale index.

The compiler throws in a bonus, too. The graph doesn't only hold structure; it also carries every tsc compile error and every @ttsc/lint and plugin (typia, nestia) lint violation, each one fused onto the symbol that owns it. So "what's broken here?" and "what breaks if I change this?" come out of the same index. You get the shape of the code and what's currently wrong with it, in one graph.

It's exact, so you can trust it, and because you can trust it, you can stop. That chain, from exact to trusted to done, is what the whole thing rests on.

Only bragging would be a scam, so let me start with the limits.

@ttsc/graph is TypeScript-only. It trades breadth for depth, which is the opposite bet from the 158-language tools. That isn't a footgun. On a non-TypeScript project there's no graph to hand over, so the agent never leans on it.
Even inside a TS project, the things outside the typed graph, like configs, docs, and exact-text search, escape out cleanly. Step outside its scope and it doesn't break; it steps aside. It's strong on exactly the questions a graph can answer, and it doesn't pretend to go further.

There's one prerequisite that matters more than the rest. @ttsc/graph rides on ttsc, and ttsc runs on the TypeScript-Go (TypeScript v7) runtime. v7 isn't released yet; it's still at RC, which is why the install pins `typescript@rc`. It runs fine on the RC, but to be clear, this isn't something you can drop onto your current stable TypeScript v6.x. Once v7 ships, that caveat goes away on its own.

I built this because I couldn't stand using codegraph, so I put real effort into making it hold up in general use, with none of that "I installed it and Codex got dumber." On my own machine, at least, it's been solid. The world is wide, and there are failure modes I haven't hit yet. It's open source, so if something feels awkward or you catch something I got wrong, file an issue: https://github.com/samchon/ttsc/issues. That's the fastest way to make it solid.

Setup is four lines.

```bash
npm install -D ttsc @ttsc/graph typescript@rc
```

```json
{
  "mcpServers": {
    "ttsc-graph": {
      "command": "npx",
      "args": ["-y", "@ttsc/graph"]
    }
  }
}
```

The agent queries the graph on its own, so you never call it by hand. The title says Claude Code, but any MCP-capable agent works just as well, including Codex, Cursor, and the rest.

One more thing: run it in your own project and you can spin your code graph around in 3D in the browser.

```bash
npx @ttsc/graph view
```

Image: the TypeORM code graph, colored by kind.

None of this is unique to TypeScript. A code graph you can trust takes each language's own compiler, because heuristics can't resolve aliases, monorepos, or types. I picked TypeScript; someone working in Go, Rust, Python, Java, or C should ride their language's compiler or LSP and do the same for theirs.
One language handled at compiler depth is worth more to an agent than 158 skimmed off the top, and I'd love to see it.

One idea from §3.2 is worth stating plainly on its own. I'll mark it as opinion: I hold it strongly, but I won't pretend it's the final word. Here is the case for it.

You can't reliably steer an agent's behavior with prose: an instruction can be read and then ignored, but a required field on a typed schema cannot be left blank. So instead of asking the agent to reason before it acts, you make the reasoning a function argument it has to fill in, and you route its choices through a union of types instead of a paragraph of pleading. The reasoning stops being a request and becomes part of the contract.

That's what I mean by CoT compliance, and it's why I think it beats writing ever-longer instructions: a schema is the one part of the exchange the model physically cannot skip. The nice part is the one from §2.4: you never have to take the agent's arm off and bolt a hammer where its hand was. You hand it the hammer, and the shape of the tool tells it how to swing.

I've written this up properly elsewhere, in two posts that cover the theory, the numbers, and where it breaks down.

I want to put this gently, as a suggestion rather than a critique. From the outside, the trouble a tool like codegraph runs into doesn't look like a tree-sitter problem, or a sign it was built wrong. It looks like the tool just wasn't getting called the way its authors meant it to be.

Once that happens, the natural response is to make the prompt louder: more instructions, stronger wording, until you're forbidding the agent's own Read and Edit and bolting on a prosthetic where its hand used to be. I spent a long time writing prompts myself before I accepted they weren't the lever.

That's exactly the problem I think this approach solves.
The adoption problem, the agent not reaching for your tool at the right moment, is what a typed contract takes on directly, and it doesn't care whether the data underneath came from tree-sitter or a compiler. So if you've felt that pain, my honest belief is that union types plus CoT compliance would lift a lot of it.

Image: the same tools, made whole. CoT compliance is the arrow: it turns the forced prosthesis back into a hand that grips the tool on its own.

The interface is small and MIT-licensed. If the codegraph, codebase-memory-mcp, or serena authors ever want to try it, I'd be glad to help wire it up.

And if there's no appetite for it, I'd rather show it than argue about it. My hunch is the same interface would carry a tree-sitter tool a long way too, though not as far as a compiler-backed one, since the agent can't fully trust text-level results and will re-read a file or two to be sure. If I build it myself I'd expect a token reduction of about 3x, next to the 10x here. I haven't run that experiment yet, but if no one beats me to it, I'll build that version and report back, warts and all.

Either way, the thing I keep coming back to is the same: don't make your agent read the whole codebase. Hand it the index the compiler already drew.

The pioneers this builds on: codegraph (https://github.com/colbymchenry/codegraph), codebase-memory-mcp, and serena.