# Rigor Compresses: Why AI Agents Need Graphs, Not More Context

> Source: <https://dev.to/gyu07/rigor-compresses-why-ai-agents-need-graphs-not-more-context-5404>
> Published: 2026-06-18 01:53:21+00:00

A coding agent can edit one function beautifully.

But ask it a global question — "what breaks if I change this?", "can this write run without the tenant check?", "is this migration destructive?" — and it often starts guessing.

For a while, I thought the answer was better token efficiency. Now I think the deeper question is: what facts do you hand the agent in the first place?

I recently [wrote on Medium](https://medium.com/@takafumi.endo/token-allocation-the-new-capital-discipline-of-ai-native-companies-15a9183fe083) that token spend only becomes an *asset* when you redesign the work around it — verification, context, workflow — rather than just buying more of it.

This post is the engineering side of that argument. Since then I've been hunting for token-efficient tooling, and writing my own when nothing fits. What follows is the piece I find most leverageable: making each token buy a verified fact instead of a guess. (Just where my own tinkering has led — not advice.)

"Saving tokens" sounds like a story about compression and caching — or, lately, about giving the agent *more* context: bigger windows, more retrieval. But in my own work, what moved the needle most wasn't more context; it was the quality of the few facts I hand the agent.

Coding agents are strong locally. Within one function, one diff, they write beautifully. But the global facts — what calls this, what breaks if I change it, whether this write can run without the tenant check, whether this migration is destructive — they tend to guess at, fairly confidently, and sometimes get wrong.

So what helps, I think, isn't a long analysis log or a full-repo grep — it's handing over those global facts short, accurate, and with evidence. The heart of token efficiency is not *saving* but *maximum signal per token*.

What's interesting is that most of the global facts agents struggle with look like graph questions.

`depends-on`

, `calls`

, `flows-to`

, `must-precede`

are all relations, and a set of relations is a graph. Facts an agent has trouble deriving on its own often come out cleanly once you turn them into a graph.

Put differently: every one of these facts can be written in a single language — a dependency graph. Impact, ordering, guard dominance, flow. It's the **common language for handing facts to an agent**.

And the toolkit you need is smaller than you'd think: traversal, topological sort + cycle detection, strongly connected components (SCC), reachability, dominators, occasionally shortest path or max-flow. Past that, an eye for spotting when a problem *is* a graph covers most of it, I'd say.

A fact I want to hand an agent looks less like a paragraph and more like this:

```
DB.GUARD_BYPASS  repo.rs:142  severity=high  quality=verified
  claim:   delete reachable without tenant guard
  missing: assert_tenant() does not dominate this sink
  path:    handler → repo.raw_delete
  next:    use repo.delete() (scoped) | read(repo.rs:120)
```

The hard part isn't formatting the fact. It's producing it cheaply, with enough evidence to trust it.

Graph theory, DAGs first among them, has long been put to practical use in software. Build systems, dependency resolution, scheduling, compilers, `plan`

/`apply`

-style orchestration — open any of them up and it's a graph. None of this is new.

But it does feel like it's accelerating now, and I think the reason is AI coding. Implementing these algorithms used to sit behind a wall of cost and specialization. That wall has come down, and algorithms feel like they're being democratized in a practical sense.

The framing I personally find useful is that an algorithm is two layers: a base theory, and on top of it, optimization for a specific domain. The theory is shared; the optimization is domain-specific. What AI dropped most is the latter — the cost of implementing it *for your own domain*.

This shift quietly changed how I pick tools. Generic analysis OSS gives you the commodity — algorithms, parsers, a query language — and does it well. But the valuable layer, your own domain, is something you end up filling in yourself. A generic taint engine doesn't carry the assumption that `assert_tenant()`

is a required guard, or which calls are dangerous sinks, or the invariants of your declarative schema layer. The algorithms are interchangeable; the domain model on top is harder to swap.

Parsing is a commodity (tree-sitter), the algorithms are a small known core, and an agent can help you write the domain rules quickly. So the bar for "worth building" drops, and a long tail of small, domain-specific detectors — never worth it before — starts to pencil out. That long tail, I suspect, is where a good chunk of the practical value lives. And the agent both builds them and consumes them: a generate ⇄ verify flywheel starts turning.

The substrate these homegrown detectors build their facts on is two layers: tree-sitter and LSP. tree-sitter is an incremental, error-tolerant parser — it can pull structure cheaply even out of a half-written, broken buffer — but it doesn't resolve names, so I treat facts from it as `estimated`

. LSP actually resolves types and references, so those can be `verified`

, at the cost of being stateful, with cold starts and lag. What I find interesting is that the layer you pick maps straight onto the quality label — and that freshness tends to invert: synchronous, local tree-sitter is closest to the dirty buffer, while precise LSP can lag behind as its index catches up. So I've been doing an "until LSP catches up, fall back to tree-sitter and label it estimated" kind of switch. Right now I'm implementing and testing exactly this — driving the LSP client headless (keeping the buffer in sync via `didChange`

, juggling servers across languages) — and it's the kind of thing that gets more interesting the more you actually build it.

The division of labor, roughly:

| tree-sitter | LSP (language server) | |
|---|---|---|
| Gives you | syntax / structure (CST) | resolved types & references |
| Resolves names? | no | yes |
| Quality label | `estimated` |
`verified` |
| Cost / state | fast, local, synchronous | stateful, cold starts, lag |
| Half-written buffer | works (error-tolerant) | often bails |
| Freshness vs the edit | closest to the dirty buffer | can lag while re-indexing |

There's one discipline I try to keep: where to draw the trust boundary. Dominator computations and dataflow fixpoints, written with AI help, tend to come out "plausible but subtly unsound" — and in hazard detection, unsoundness means a miss that reads as safety, which is the failure I most want to avoid. So, for myself, here's what I do:

`verified`

(graph-derived) or `estimated`

(pattern-derived)Where you put the "verification of the verifier" feels like what decides whether the flywheel turns toward value or toward error.

Take the `DB.GUARD_BYPASS`

fact from earlier — here's how it gets computed. In a declarative schema-management tool, a hazard like "a dangerous write is reachable without passing the guard" turns into a fairly clean graph question: the required guard should dominate the dangerous sink, and if it doesn't, the sink is reachable without the guard. I compute this over read-models I already have (a symbol index, dependency edges), cheaply, and it's close enough to sound that I can emit it as `verified`

.

By `verified`

I don't mean globally true in the formal-methods sense — I mean something narrower: the claim follows from the graph model I built, with evidence, rather than from the agent's guess. The model itself can still be incomplete, so the label is about the *source* of a fact, not absolute truth.

What's doing the work here is that *rigor produces compression*. A pattern match can only hedge — "possibly an unscoped delete, please verify" — and hedging costs tokens. A dominator result can state one fact flatly: this guard does not dominate this sink. The more rigorous the analysis, the shorter the fact. "Doing it properly" and "handing it over in few tokens" might, it turns out, be the same investment.

The same dependency question keeps reappearing at larger scales — first inside a function, then across a repository, then across storage systems: what depends on what, what must happen before what, and what changes when this node moves?

The two-layer structure shows up most sharply, I find, around databases. Even just "storing data," the design changes with PostgreSQL vs SQLite. It changes again depending on whether you ship schema via migrations or express it declaratively. And OLTP vs OLAP are, infrastructurally, almost different worlds.

But — and this is the part I find interesting — look at them through the lens of "how do you define the dependencies," and things that are scattered at the infrastructure level start to connect into a single graph. Transactions and analytics, writes and aggregations, mutable state and immutable snapshots. Physically separate worlds, yet contiguous on the abstraction of how dependencies are defined. Even an architecture where an event log links a control plane (Postgres) to a data plane (object storage + columnar) ends up looking, to me, like one dependency graph.

Could you bind even heterogeneous infrastructure together in the *same* dependency-graph language you use to hand an agent facts? I feel like there's quite a bit of potential in that view.

I came in through the practical doorway of token efficiency and ended up at "deterministic facts to hand an agent," the graphs underneath them, the democratization of algorithms that makes them buildable, and the abstraction of dependency. The theory is mature, the implementation barrier has dropped, and per-domain craftsmanship is becoming the center of value, it feels like — and the area still feels wide open.

I plan to dig into it for a while. As a start, I'll jot down the themes I'm interested in, a bit at a time.

One last thing — a hope. These days, working with Claude Code or Codex, I keep tacking defensive lines onto my prompts: "watch for false positives," "don't miss false negatives," "don't guess, verify." It isn't that I distrust the agent. It's that the facts I'm handing it are the vague part, so reminders are the only way I can paper over the worry.

My hope is that, as I keep practicing this graph stuff — as I get better at handing over grounded facts that carry their own `verified`

/ `estimated`

labels — I'll need to write those "please be careful" lines a little less often.

If the fact itself is sound, you don't have to ask the agent to be careful about it.

This is the practical reason I care about the shape of facts, not just the size of context. Instead of telling the agent to be cautious, you bake the caution into the facts.

That, maybe, is the view I most want to reach at the end of this detour.