{"slug": "60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom", "title": "60–95% fewer tokens in your agent loops, same answers. Meet Headroom.", "summary": "Headroom, an open-source context compression layer, reduces token usage in AI agent loops by 60–95% while preserving answer accuracy. The tool intercepts and compresses tool outputs, logs, and conversation history before they reach the LLM, achieving savings from 65,694 tokens to 5,118 tokens in an SRE debugging session. It is available as a drop-in proxy, library, or MCP server, and supports zero-code integration with agents like Claude Code, Codex, and Cursor.", "body_md": "AI coding agents are expensive, not because models cost too much per token, but because agents send too many of them. An SRE debugging session with a raw agent: 65,694 tokens in. With Headroom in the middle: 5,118. Same bug found.\n\n[Headroom](https://github.com/chopratejas/headroom) is a new open-source context compression layer that intercepts everything your agent reads (tool outputs, log dumps, RAG chunks, files, conversation history) and compresses it before the LLM ever sees it. It's local, reversible, and available as a drop-in proxy, a library, or an MCP server.\n\nSavings on real agent workloads:\n\nAccuracy on standard benchmarks (GSM8K, TruthfulQA, SQuAD v2, BFCL) is preserved; some scores actually improve slightly, likely because the model sees cleaner signal.\n\nUnder the hood, Headroom routes content through a stack of specialised compressors:\n\nIt also does **CCR (reversible compression)**: originals are cached locally and the LLM can retrieve them on demand if it needs them. Nothing is destroyed.\n\nThe most interesting deployment path: run `headroom proxy --port 8787`, then point your existing tool at localhost. Zero code changes. Works with any language.\n\nOr even simpler: `headroom wrap claude` wraps Claude Code and routes its traffic through Headroom automatically.
One command, and savings start immediately. The same works for Codex, Cursor, Aider, and Copilot CLI.\n\n\"Library — compress(messages) in Python or TypeScript, inline in any app. Proxy — headroom proxy --port 8787, zero code changes, any language.\"\n\nThere's also a **cross-agent memory** store (shared context across Claude, Codex, and Gemini sessions with auto-dedup) and a `headroom learn` feature that mines past failed sessions and writes corrections back to your CLAUDE.md / AGENTS.md.\n\n`pip install \"headroom-ai[all]\"`, then `headroom wrap claude`. See the savings in five minutes.\n\n`headroom proxy --port 8787`, and point your client at localhost. No code changes needed.\n\n`HEADROOM_OUTPUT_SHAPER=1`: it trims verbose model output too, and on 5× output pricing that adds up fast.\n\nSource: [github.com/chopratejas/headroom](https://github.com/chopratejas/headroom)\n\n*✏️ Drafted with KewBot (AI), edited and approved by Drew.*", "url": "https://wpnews.pro/news/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom", "canonical_source": "https://dev.to/thegatewayguy/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom-1999", "published_at": "2026-06-20 09:41:35+00:00", "updated_at": "2026-06-20 10:06:54.525712+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-agents", "ai-infrastructure", "mlops"], "entities": ["Headroom", "Claude Code", "Codex", "Cursor", "Aider", "Copilot CLI", "Gemini", "GSM8K"], "alternates": {"html": "https://wpnews.pro/news/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom", "markdown": "https://wpnews.pro/news/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom.md", "text": "https://wpnews.pro/news/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom.txt", "jsonld": "https://wpnews.pro/news/60-95-fewer-tokens-in-your-agent-loops-same-answers-meet-headroom.jsonld"}}