Token-warden is a thrifty office manager for your AI assistants Token-warden, a Claude Code plugin, treats AI agent memory as an engineering problem by requiring every rule to prove it saves more tokens than it costs on a fixed benchmark or face eviction. The plugin uses a four-stage feed-forward loop to collect session data, distill candidate rules, benchmark them, and select only those with a measured positive return, making coding agents measurably cheaper over time. A Claude Code plugin that makes coding agents measurably cheaper over time. Most "agent memory" accumulates advice nobody ever verifies. token-warden treats agent memory as an engineering problem: every rule that wants space in an agent's context must prove, on a fixed benchmark, that it saves more tokens than it costs — or it gets evicted. The result is a per-agent memory file containing only rules with measured, positive return. Measured, not vibes — every rule carries a token delta from real benchmark runs Self-funding — rules must save ≥ 2× their own context rent to stay Self-auditing — active rules are re-benchmarked round-robin and evicted when they stop earning Zero session overhead — collection runs in a Stop hook that never blocks or fails your work How it works how-it-works Getting started getting-started Commands commands The benchmark system the-benchmark-system Architecture architecture The agents the-agents Inter-agent approval gate inter-agent-approval-gate-experimental Design invariants design-invariants A recorded demonstration a-recorded-demonstration Testing testing Data layout data-layout Security notes security-notes Roadmap roadmap The optimizer is a four-stage, feed-forward loop. Lessons are extracted from finished sessions and applied to future ones — past work is never re-done. agent session any project, any repo │ │ Stop hook · parses the transcript: │ tokens, tool calls, file re-reads, completion ▼ ┌─────────────────────────────────────┐ │ 1 · COLLECT │ │ one row per session in SQLite │ └─────────────────────────────────────┘ │ │ fires only when a run exceeds the │ agent's rolling p75 token cost ▼ ┌─────────────────────────────────────┐ │ 2 · DISTILL │ │ one haiku call over the waste │ │ stats → 0–2 candidate rules │ └─────────────────────────────────────┘ │ │ candidates wait in SQLite — │ never injected until measured ▼ ┌─────────────────────────────────────┐ │ 3 · BENCH │ │ golden suite on a frozen fixture, │ │ run with vs. without the candidate │ └─────────────────────────────────────┘ │ │ measured delta vs. context rent ▼ ┌─────────────────────────────────────┐ │ 4 · SELECT │ │ keep if savings ≥ 2× rent, else │ │ evict · re-audit the oldest rule │ └─────────────────────────────────────┘ │ ▼ ~/.claude/agent-memory/