Causal graph memory for LLMs. Flat token cost, no matter how long the session runs

Rudi, a new system for LLM memory management, uses a causal graph of decisions to replace the growing transcript, achieving flat token costs regardless of session length. In a 43-turn software architecture session, Rudi used 5.4× fewer tokens than the standard full-transcript approach while maintaining answer quality and passing all six callback traps that tested long-term constraint adherence.

5 min read · published Jun 19, 2026

Every LLM API call re-sends the whole conversation. Cost grows every turn; eventually you hit the context limit. Rudi replaces the growing transcript with a dependency graph of decisions and injects only the slice relevant to the current task. Turn 10,000 costs about the same as turn 10.

In a 43-turn software-architecture session (building a Notes API turn by turn), the standard "re-send the full transcript" approach was sending ~38,000 input tokens by the final turn. Rudi sent 6,782 for the same task, same model, and same answer quality.

| Turn | Rudi input | Full-transcript input | Savings |
| --- | --- | --- | --- |
| 1 | 382 | 340 | – |
| 10 | 1,467 | 6,999 | 4.8× |
| 20 | 3,581 | 17,385 | 4.9× |
| 30 | 4,128 | 26,821 | 6.5× |
| 43 | 6,782 | 38,320 | 5.7× |

Totals across all 43 turns: 152,222 input tokens (Rudi) vs 828,369 (full transcript), or 5.4× fewer. The absolute gap widens as the session goes on because Rudi's per-turn input is bounded while the transcript's grows linearly with every turn.
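The savings column and the overall ratio follow directly from the token counts quoted above. A quick check, using only the figures from the table and the totals (the variable and function names are illustrative):

```python
# Input-token counts copied from the table: turn -> (Rudi, full transcript).
samples = {
    10: (1_467, 6_999),
    20: (3_581, 17_385),
    30: (4_128, 26_821),
    43: (6_782, 38_320),
}


def savings(rudi_tokens: int, transcript_tokens: int) -> float:
    """How many times fewer input tokens Rudi sent, rounded to one decimal."""
    return round(transcript_tokens / rudi_tokens, 1)


per_turn = {turn: savings(r, t) for turn, (r, t) in samples.items()}
overall = savings(152_222, 828_369)  # totals across all 43 turns

for turn, ratio in per_turn.items():
    print(f"turn {turn}: {ratio}x")
print(f"overall: {overall}x")
```

Turn 1 is left out because Rudi's input (382) is slightly larger than the transcript's (340) there, so there is no saving to report.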

These numbers are from a run with fold disabled (graph slicing alone). See below for the measured fold result.

Cost of the entire 43-turn run on Claude Haiku 4.5: $0.34.

At turn 29 of a separate run, fold fired for the first time:

turn 28: input=5,075 tokens   active nodes=24
[fold] d1–d8   (8 nodes, 20 hard rules) → stub d25
[fold] d9–d16  (8 nodes, 20 hard rules) → stub d26
[fold] d17–d21 (5 nodes, 16 hard rules) → stub d27
turn 29: active nodes=6   (dropped 24 → 6)
turn 30: input=2,865 tokens   ← down 44% from turn 28

21 live nodes compressed into 3 stubs. 56 hard rules preserved verbatim. Input tokens dropped 44% mid-session, automatically. That's the sawtooth: the active graph shrinks each time a fold fires, even as the conversation gets longer.
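The summary figures can be re-derived from the log lines. The sketch below uses only numbers from the log; the one assumption is that each stub counts as an active node after the fold, which is what makes 24 become 6.

```python
# Figures copied from the fold log.
folds = [(8, 20), (8, 20), (5, 16)]       # (nodes folded, hard rules kept) per stub
active_before = 24                        # active nodes at turn 28
input_before, input_after = 5_075, 2_865  # input tokens at turn 28 and turn 30

nodes_folded = sum(nodes for nodes, _ in folds)
rules_kept = sum(rules for _, rules in folds)

# Assumption: folded nodes leave the active set and each stub joins it.
active_after = active_before - nodes_folded + len(folds)

drop_pct = round(100 * (input_before - input_after) / input_before)

print(nodes_folded, rules_kept, active_after, drop_pct)
```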

Cheap context is worthless if the model forgets the rules. So the same benchmark plants 6 callback traps late in the session and checks whether decisions made dozens of turns earlier are still honored.

| # | Turn | Trap | Result |
| --- | --- | --- | --- |
| 1 | 38 | Add logout: must use the exact auth mechanism chosen on turn 1 | ✅ |
| 2 | 39 | Profile endpoint: must scope via turn-1 auth and turn-2 DB | ✅ |
| 3 | 40 | Admin CSV export: a rule that was folded away banned cross-user data | ✅ surfaced |
| 4 | 41 | Email full notes: a folded rule banned note contents in email | ✅ surfaced |
| 5 | 42 | "Store the token in localStorage": conflicts with turn-1 hard rule | ✅ blocked |
| 6 | 43 | "Permanently delete a note": turn-11 chose soft-delete | ✅ flagged |

6 / 6. (First benchmark run: fold disabled, slicing only.) The two that matter most are #3 and #4: those rules had been compressed out of the active context by the time the trap was sprung, and the model still caught them, because hard rules are preserved verbatim on the fold stub. That's the whole thesis: forget the prose, keep the constraints.

Every model response is parsed into decision nodes, each linked backward to the decisions it depends on:

node = {
  id, text,
  depends_on: [...],     # backward edges: what this decision rests on
  hard_rules: [...],     # binding constraints; the worker must halt if violated
  revises, exception_to, # full replacement vs. narrow carve-out
  status, turn, pinned
}
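One way to picture the schema and the slice it enables is below. This is a minimal sketch, not Rudi's code: the `Node` dataclass mirrors only a subset of the fields listed above, `slice_for` and its seed-id argument are hypothetical names, and the sketch assumes a slice is the set of nodes reachable by following `depends_on` edges backward. The example node ids and texts are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Illustrative subset of the decision-node fields."""
    id: str
    text: str
    depends_on: list[str] = field(default_factory=list)
    hard_rules: list[str] = field(default_factory=list)
    pinned: bool = False


def slice_for(seed_ids: list[str], graph: dict[str, Node]) -> list[str]:
    """Return the ids of every node reachable from the seeds via depends_on."""
    seen: set[str] = set()
    stack = list(seed_ids)
    while stack:
        node_id = stack.pop()
        if node_id in seen or node_id not in graph:
            continue  # already visited, or a dangling reference
        seen.add(node_id)
        stack.extend(graph[node_id].depends_on)
    return sorted(seen)


# Invented example graph; only d1 and d5 matter for a logout task.
graph = {
    "d1": Node("d1", "auth mechanism chosen on turn 1",
               hard_rules=["example hard rule"], pinned=True),
    "d2": Node("d2", "database chosen on turn 2"),
    "d3": Node("d3", "profile endpoint", depends_on=["d1", "d2"]),
    "d4": Node("d4", "unrelated decision"),
    "d5": Node("d5", "logout endpoint", depends_on=["d1"]),
}

logout_slice = slice_for(["d5"], graph)
print(logout_slice)
```

The point of the backward edges is that the slice grows with the depth of the dependency chain, not with the number of turns: `d2`, `d3`, and `d4` never enter the context for a task that only touches logout.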

**Slice, don't dump.** Before each turn, Rudi injects only the nodes reachable from the current task, not the transcript.

**Fold.** When a branch of decisions goes reachability-dead, a background pass compresses it into a one-line stub. Hard rules survive the fold verbatim, so a constraint can never be silently lost (see traps #3/#4).

**Pin foundations.** Decisions that are reinforced repeatedly, made in the first two turns, or carry exceptions are pinned and never folded.

**Hard rules are binding.** If a new task would violate one, the worker stops and asks instead of silently complying (traps #5/#6).
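The fold and pin rules can be sketched in a few lines. Everything here is an assumption made for illustration: the `fold` function, the dict-based node shape, and the stub text format are hypothetical, and "reachability-dead" is approximated as "not in the set of ids that are still live".

```python
def fold(nodes: dict[str, dict], live_ids: set[str], stub_id: str) -> dict[str, dict]:
    """Compress dead, unpinned nodes into one stub that keeps their hard rules."""
    dead = [
        node_id for node_id, node in nodes.items()
        if node_id not in live_ids and not node.get("pinned", False)
    ]
    if not dead:
        return dict(nodes)  # nothing to fold

    kept_rules: list[str] = []
    for node_id in dead:
        # Hard rules are copied unchanged; only the prose is dropped.
        kept_rules.extend(nodes[node_id]["hard_rules"])

    stub = {
        "text": f"[folded {len(dead)} decisions: {', '.join(dead)}]",
        "hard_rules": kept_rules,
        "pinned": False,
    }
    survivors = {nid: n for nid, n in nodes.items() if nid not in dead}
    survivors[stub_id] = stub
    return survivors


# Invented example: d1 is pinned, d2 and d3 are a dead branch, d4 is live.
nodes = {
    "d1": {"text": "foundational choice", "hard_rules": ["rule A"], "pinned": True},
    "d2": {"text": "old branch, step 1", "hard_rules": ["rule B"], "pinned": False},
    "d3": {"text": "old branch, step 2", "hard_rules": [], "pinned": False},
    "d4": {"text": "current work", "hard_rules": ["rule C"], "pinned": False},
}
folded = fold(nodes, live_ids={"d4"}, stub_id="d5")
print(sorted(folded), folded["d5"]["hard_rules"])
```

In this sketch the pinned node survives even though it is not live, and "rule B" is still present on the stub after its original node is gone, which is the property traps #3 and #4 rely on.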

git clone https://github.com/<you>/rudi
cd rudi
pip install anthropic flask flask-cors

export ANTHROPIC_API_KEY="sk-ant-..."

python benchmark_long_haiku.py

You'll watch the input-token curve stay flat while a naive transcript would balloon, and see all 6 callback traps resolve.

Two calls per turn. You keep your own model key; Rudi only manages the graph.

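import rudi

# Call 1, before your model call: fetch the slice of the graph relevant to this task.
s = rudi.get_slice(task)

# Call 2, after your model call: store the decisions from the model's response.
rudi.store_decisions(decisions, inject_ids=s["inject_ids"])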

Or let Rudi drive the whole turn (LLM call + store + fold) in one shot:

result = rudi.run_turn(task)   # → {"display", "tokens_in", "tokens_out", ...}
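For the two-call path, the sketch below shows where your own model call sits between the two memory calls. It is written against a stand-in so it runs without Rudi installed: `FakeMemory`, `fake_model`, and `run_one_turn` are hypothetical, and the shapes of the slice (beyond its `inject_ids` key) and of `decisions` are assumptions, since the source does not specify them.

```python
from typing import Any, Callable


def run_one_turn(
    memory: Any,
    call_model: Callable[[str, dict], tuple[str, list]],
    task: str,
) -> str:
    """Wrap one model call of your own with the two memory calls."""
    s = memory.get_slice(task)              # 1. fetch only the relevant slice
    reply, decisions = call_model(task, s)  # 2. your model, your key
    memory.store_decisions(decisions, inject_ids=s["inject_ids"])  # 3. record
    return reply


class FakeMemory:
    """In-memory stand-in exposing the two calls named in the text."""

    def __init__(self) -> None:
        self.stored: list[tuple[list, list]] = []

    def get_slice(self, task: str) -> dict:
        return {"inject_ids": ["d1"]}  # assumed shape: only this key is documented

    def store_decisions(self, decisions: list, inject_ids: list) -> None:
        self.stored.append((decisions, inject_ids))


def fake_model(task: str, s: dict) -> tuple[str, list]:
    """Stand-in for a real LLM call; returns a reply and its decisions."""
    return f"done: {task}", [{"text": task}]


memory = FakeMemory()
reply = run_one_turn(memory, fake_model, "add logout")
print(reply, memory.stored)
```

With the real package, `memory` would be the `rudi` module itself and `call_model` would wrap your provider's client.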

Storage is local SQLite (store.py), one row per decision node. No server, no cloud, no setup.
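To make "one row per decision node" concrete, here is a sketch using Python's standard `sqlite3` module. The table name, column set, and JSON encoding of the list fields are assumptions for illustration; the actual schema in store.py is not shown in the source.

```python
import json
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")  # a real store would use a file path
conn.execute(
    """
    CREATE TABLE nodes (
        id         TEXT PRIMARY KEY,
        text       TEXT NOT NULL,
        depends_on TEXT NOT NULL,  -- JSON-encoded list of node ids
        hard_rules TEXT NOT NULL,  -- JSON-encoded list of rule strings
        turn       INTEGER NOT NULL,
        pinned     INTEGER NOT NULL DEFAULT 0
    )
    """
)


def save_node(node: dict) -> None:
    """Insert or overwrite the single row that represents this node."""
    conn.execute(
        "INSERT OR REPLACE INTO nodes VALUES (?, ?, ?, ?, ?, ?)",
        (
            node["id"],
            node["text"],
            json.dumps(node["depends_on"]),
            json.dumps(node["hard_rules"]),
            node["turn"],
            int(node["pinned"]),
        ),
    )
    conn.commit()


def load_node(node_id: str) -> Optional[dict]:
    """Read one node back, decoding the JSON list columns."""
    row = conn.execute(
        "SELECT id, text, depends_on, hard_rules, turn, pinned "
        "FROM nodes WHERE id = ?",
        (node_id,),
    ).fetchone()
    if row is None:
        return None
    return {
        "id": row[0],
        "text": row[1],
        "depends_on": json.loads(row[2]),
        "hard_rules": json.loads(row[3]),
        "turn": row[4],
        "pinned": bool(row[5]),
    }


node = {"id": "d1", "text": "example decision", "depends_on": [],
        "hard_rules": ["example rule"], "turn": 1, "pinned": True}
save_node(node)
save_node(node)  # same id again: the row is replaced, not duplicated
```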

| Claim | Status |
| --- | --- |
| Graph slicing bounds the token curve | ✅ measured, table above |
| Decisions recalled 40+ turns later | ✅ 6/6 callbacks |
| Hard rules survive fold verbatim | ✅ traps #3/#4 |
| Conflicts blocked, not silently obeyed | ✅ traps #5/#6 |
| Fold GC compresses dead branches mid-session | ✅ measured: 24 nodes → 6, input −44% at turn 30 |
| Retrieval fallback above ~80 active nodes | ⏳ built, not yet benchmarked at scale |

No vapor. The table is what the logs say; the in-progress rows are labeled as such.

Business Source License 1.1. Free for personal use, research, development, and self-hosting. Commercial SaaS or managed hosting use requires a paid license from the maintainer.

Want a commercial license? Open an issue or email **raphaelwkago@gmail.com**.
