Causal graph memory for LLMs. Flat token cost, no matter how long the session runs.

Every LLM API call re-sends the whole conversation. Cost grows every turn, and eventually you hit the context limit. Rudi replaces the growing transcript with a dependency graph of decisions and injects only the slice relevant to the current task. Turn 10,000 costs about the same as turn 10.

In a 43-turn software-architecture session (building a Notes API turn by turn), the standard "re-send the full transcript" approach was sending ~38,000 input tokens by the final turn. Rudi sent 6,782 for the same task, with the same model and the same answer quality.

| Turn | Rudi input (tokens) | Full-transcript input (tokens) | Savings |
|---|---|---|---|
| 1 | 382 | 340 | n/a |
| 10 | 1,467 | 6,999 | 4.8× |
| 20 | 3,581 | 17,385 | 4.9× |
| 30 | 4,128 | 26,821 | 6.5× |
| 43 | 6,782 | 38,320 | 5.7× |

Totals across all 43 turns: 152,222 input tokens for Rudi vs. 828,369 for the full transcript, or 5.4× fewer tokens. The absolute gap keeps widening as the session runs, because Rudi's curve is bounded while the transcript's grows linearly.

These numbers are from a run with fold disabled (graph slicing alone). See below for the measured fold result.

Cost of the entire 43-turn run on Claude Haiku 4.5: $0.34.
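To make "inject only the slice relevant to the current task" concrete, here is a minimal Python sketch of reachability slicing. It assumes each decision node stores the IDs of the decisions it depends on (backward edges) and that the caller already knows which node or nodes the current task touches; how Rudi actually picks those starting nodes is not described here. `Node`, `slice_for_task`, `render_context`, and the example graph are hypothetical illustrations, not Rudi's API.

```python
# Hypothetical sketch of reachability slicing; not Rudi's actual implementation.
from dataclasses import dataclass, field


@dataclass
class Node:
    id: str
    text: str
    depends_on: list[str] = field(default_factory=list)  # backward edges
    hard_rules: list[str] = field(default_factory=list)  # binding constraints
    turn: int = 0


def slice_for_task(graph: dict[str, Node], seeds: list[str]) -> list[Node]:
    """Collect every node reachable from the seed IDs via depends_on edges."""
    seen: set[str] = set()
    stack = list(seeds)
    while stack:
        node_id = stack.pop()
        if node_id in seen or node_id not in graph:
            continue  # already visited (handles cycles), or unknown ID
        seen.add(node_id)
        stack.extend(graph[node_id].depends_on)
    # Oldest decisions first, so foundations precede what builds on them.
    return sorted((graph[i] for i in seen), key=lambda n: (n.turn, n.id))


def render_context(nodes: list[Node]) -> str:
    """Render the slice as prompt text, copying hard rules verbatim."""
    lines = []
    for n in nodes:
        lines.append(f"[{n.id}] (turn {n.turn}) {n.text}")
        lines.extend(f"  HARD RULE: {rule}" for rule in n.hard_rules)
    return "\n".join(lines)


# Hypothetical example graph (placeholder decisions, not benchmark data).
graph = {
    "d1": Node("d1", "auth mechanism decision",
               hard_rules=["no tokens in localStorage"], turn=1),
    "d2": Node("d2", "database decision", turn=2),
    "d3": Node("d3", "notes are scoped per user",
               depends_on=["d1", "d2"], turn=5),
    "d4": Node("d4", "unrelated formatting decision", turn=7),
}

print(render_context(slice_for_task(graph, ["d3"])))
```

Starting from `d3`, the walk pulls in the turn-1 and turn-2 foundations it rests on and leaves the unrelated `d4` out of the prompt. Because hard rules travel with their node, a constraint is rendered whenever its node is in the slice, no matter how many turns ago it was made.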
At turn 29 of a separate run, fold fired for the first time:

```
turn 28: input=5,075 tokens   active nodes=24
  fold d1–d8    8 nodes, 20 hard rules → stub d25
  fold d9–d16   8 nodes, 20 hard rules → stub d26
  fold d17–d21  5 nodes, 16 hard rules → stub d27
turn 29: active nodes=6   dropped 24 → 6
turn 30: input=2,865 tokens   ← down 44% from turn 28
```

21 live nodes were compressed into 3 stubs, and all 56 hard rules were preserved verbatim. Input tokens nearly halved mid-session, automatically. That's the sawtooth: the graph gets smaller as the conversation gets longer.

Cheap context is worthless if the model forgets the rules. So the same benchmark plants 6 callback traps late in the session and checks whether decisions made dozens of turns earlier are still honored.

| # | Turn | Trap | Result |
|---|---|---|---|
| 1 | 38 | Add logout: must use the exact auth mechanism chosen on turn 1 | ✅ |
| 2 | 39 | Profile endpoint: must scope via turn-1 auth and turn-2 DB | ✅ |
| 3 | 40 | Admin CSV export: a rule that was folded away banned cross-user data | ✅ surfaced |
| 4 | 41 | Email full notes: a folded rule banned note contents in email | ✅ surfaced |
| 5 | 42 | "Store the token in localStorage": conflicts with turn-1 hard rule | ✅ blocked |
| 6 | 43 | "Permanently delete a note": turn 11 chose soft-delete | ✅ flagged |

6 / 6 on the first benchmark run (fold disabled, slicing only).

The two that matter most are 3 and 4: those rules had been compressed out of the active context by the time the trap was sprung, and the model still caught them because hard rules are preserved verbatim on the fold stub. That's the whole thesis: forget the prose, keep the constraints.

Every model response is parsed into decision nodes, each linked backward to the decisions it depends on:

```
node = {
  id, text,
  depends_on: ...,        # backward edges: what this decision rests on
  hard_rules: ...,        # binding constraints; the worker must halt if violated
  revises, exception_to,  # full replacement vs. narrow carve-out
  status, turn, pinned
}
```

- **Slice, don't dump.** Before each turn, Rudi injects only the nodes reachable from the current task, not the transcript.
- **Fold.** When a branch of decisions becomes reachability-dead (no longer reachable from the current task), a background pass compresses it into a one-line stub. Hard rules survive the fold verbatim, so a constraint can never be silently lost (see traps 3 and 4).
- **Pin foundations.** Decisions that are reinforced repeatedly, made in the first two turns, or carry exceptions are pinned and never folded.
- **Hard rules are binding.** If a new task would violate one, the worker stops and asks instead of silently complying (traps 5 and 6).

git clone https://github.com/