{"slug": "headroom", "title": "Headroom", "summary": "Headroom, a context compression layer for AI agents, reduces token usage by 60–95% by compressing tool outputs, logs, and conversation history before they reach the LLM. The open-source tool offers multiple integration modes including a library, proxy, agent wrapper, and MCP server, and supports reversible compression with local caching.", "body_md": "\n\n```\n  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗\n  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║\n  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║\n  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║\n  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║\n  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝\n                  The context compression layer for AI agents\n```\n\n**60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible**\n\n[Docs](https://headroom-docs.vercel.app/docs) ·\n[Install](#get-started-60-seconds) ·\n[Proof](#proof) ·\n[Agents](#agent-compatibility-matrix) ·\n[Discord](https://discord.gg/yRmaUNpsPJ) ·\n[llms.txt](/chopratejas/headroom/blob/main/llms.txt) ·\n[Enterprise](/chopratejas/headroom/blob/main/ENTERPRISE.md)\n\nAI agents / LLMs: read `/llms.txt` here, or fetch the live index / full docs blob.\n\nHeadroom compresses everything your AI agent reads (tool outputs, logs, RAG chunks, files, and conversation history) before it reaches the LLM. 
Same answers, fraction of the tokens.\n\n![Headroom demo](/chopratejas/headroom/blob/main/HeadroomDemo-Fast.gif)\n\nLive: 10,144 → 1,260 tokens, and the same FATAL error is still found.\n\n- **Library**: `compress(messages)` in Python or TypeScript, inline in any app\n- **Proxy**: `headroom proxy --port 8787`, zero code changes, any language\n- **Agent wrap**: `headroom wrap claude|codex|cursor|aider|copilot` in one command\n- **MCP server**: `headroom_compress`, `headroom_retrieve`, `headroom_stats` for any MCP client\n- **Cross-agent memory**: shared store across Claude, Codex, Gemini, with auto-dedup\n- **`headroom learn`**: mines failed sessions, writes corrections to `CLAUDE.md` / `AGENTS.md`\n- **Reversible (CCR)**: originals are cached for retrieval on demand\n\n```\n Your agent / app\n   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)\n        │   prompts · tool outputs · logs · RAG results · files\n        ▼\n    ┌────────────────────────────────────────────────────┐\n    │  Headroom   (runs locally — your data stays here)  │\n    │  ────────────────────────────────────────────────  │\n    │  CacheAligner  →  ContentRouter  →  CCR            │\n    │                    ├─ SmartCrusher   (JSON)        │\n    │                    ├─ CodeCompressor (AST)         │\n    │                    └─ Kompress-base  (text, HF)    │\n    │                                                    │\n    │  Cross-agent memory  ·  headroom learn  ·  MCP     │\n    └────────────────────────────────────────────────────┘\n        │   compressed prompt  +  retrieval tool\n        ▼\n LLM provider  (Anthropic · OpenAI · Bedrock · …)\n```\n\n- **ContentRouter**: detects the content type and selects the right compressor\n- **SmartCrusher / CodeCompressor / Kompress-base**: compress JSON, code (AST-aware), or prose\n- **CacheAligner**: stabilizes prefixes so provider KV caches actually hit\n- **CCR**: stores originals locally; the LLM calls `headroom_retrieve` if it needs them\n\n→ 
[Architecture](https://headroom-docs.vercel.app/docs/architecture) · [CCR reversible compression](https://headroom-docs.vercel.app/docs/ccr) · [Kompress-v2-base model card](https://huggingface.co/chopratejas/kompress-v2-base)\n\n```\n# 1 — Install\npip install \"headroom-ai[all]\"          # Python\nnpm install headroom-ai                 # Node / TypeScript\n\n# 2 — Pick your mode\nheadroom wrap claude                    # wrap a coding agent\nheadroom proxy --port 8787              # drop-in proxy, zero code changes\n# or: from headroom import compress      # inline library\n\n# 3 — See the savings\nheadroom perf\n```\n\nGranular extras: `[proxy]`, `[mcp]`, `[ml]`, `[code]`, `[memory]`, `[relevance]`, `[image]`, `[agno]`, `[langchain]`, `[evals]`, `[pytorch-mps]` (Apple-GPU memory-embedder offload; set `HEADROOM_EMBEDDER_RUNTIME=pytorch_mps`). Requires **Python 3.10+**.\n\n**Savings on real agent workloads:**\n\n| Workload | Tokens before | Tokens after | Savings |\n|---|---|---|---|\n| Code search (100 results) | 17,765 | 1,408 | 92% |\n| SRE incident debugging | 65,694 | 5,118 | 92% |\n| GitHub issue triage | 54,174 | 14,761 | 73% |\n| Codebase exploration | 78,502 | 41,254 | 47% |\n\n**Accuracy preserved on standard benchmarks:**\n\n| Benchmark | Category | N | Baseline | Headroom | Delta |\n|---|---|---|---|---|---|\n| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |\n| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |\n| SQuAD v2 | QA | 100 | — | 97% | 19% compression |\n| BFCL | Tools | 100 | — | 97% | 32% compression |\n\nReproduce: `python -m headroom.evals suite --tier 1` · [Full benchmarks & methodology](https://headroom-docs.vercel.app/docs/benchmarks)\n\n| Agent | `headroom wrap` | Notes |\n|---|---|---|\n| Claude Code | ✅ | `--memory` · `--code-graph` |\n| Codex | ✅ | shares memory with Claude |\n| Cursor | ✅ | prints config; paste once |\n| Aider | ✅ | starts proxy + launches |\n| Copilot CLI | ✅ | starts 
proxy + launches |\n| OpenClaw | ✅ | installs as ContextEngine plugin |\n\nAny OpenAI-compatible client works via `headroom proxy`. For MCP-native clients, run `headroom mcp install`.\n\nHeadroom can route GitHub Copilot CLI subscription traffic through the local proxy:\n\n```\nheadroom copilot-auth login\nheadroom wrap copilot --subscription -- --model gpt-4o\n```\n\nThis lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding them to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as `COPILOT_PROVIDER_API_URL=...` during launch.\n\n`headroom copilot-auth login` stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens, which can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.\n\nFor GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:\n\n```\nexport GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com\n```\n\nFor GitHub.com Enterprise Cloud URLs such as `github.com/enterprises/your-enterprise`, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.\n\nPlatform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / `secret-tool`, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but they still need validation on real operating systems before they can be considered fully vetted. 
For Docker and CI, prefer passing an explicit `GITHUB_COPILOT_TOKEN` or `GITHUB_COPILOT_GITHUB_TOKEN` rather than relying on host keychain access.\n\n**Great fit if you…**\n\n- run AI coding agents daily and want savings without changing your code\n- work across multiple agents and want shared memory\n- need reversible compression: originals are retrievable via CCR within the configured TTL\n\n**Skip it if you…**\n\n- only use a single provider's native compaction and don't need cross-agent memory\n- work in a sandboxed environment where local processes can't run\n\n**Integrations: drop Headroom into any stack**\n\n| Your setup | Hook in with |\n|---|---|\n| Any Python app | `compress(messages, model=…)` |\n| Any TypeScript app | `await compress(messages, { model })` |\n| Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())` |\n| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |\n| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |\n| LangChain | `HeadroomChatModel(your_llm)` |\n| Agno | `HeadroomAgnoModel(your_model)` |\n| Strands | |\n| ASGI apps | `app.add_middleware(CompressionMiddleware)` |\n| Multi-agent workflows | `SharedContext().put / .get` |\n| MCP clients | `headroom mcp install` |\n\n**What's inside**\n\n- **SmartCrusher**: universal JSON compression for arrays of dicts, nested objects, and mixed types.\n- **CodeCompressor**: AST-aware compression for Python, JS, Go, Rust, Java, C++.\n- **Kompress-base**: our HuggingFace model, trained on agentic traces.\n- **Image compression**: 40–90% reduction via a trained ML router.\n- **CacheAligner**: stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.\n- **IntelligentContext**: score-based context fitting with learned importance.\n- **CCR**: reversible compression; the LLM retrieves originals on demand.\n- **Cross-agent memory**: shared store, agent provenance, auto-dedup.\n- **SharedContext**: compressed context passing across multi-agent workflows.\n- **`headroom learn`**: plugin-based failure mining for Claude, Codex, Gemini.\n\n**Pipeline 
internals**\n\nHeadroom exposes one stable request lifecycle across `compress()`, the SDK, and the proxy:\n\n`Setup` → `Pre-Start` → `Post-Start` → `Input Received` → `Input Cached` → `Input Routed` → `Input Compressed` → `Input Remembered` → `Pre-Send` → `Post-Send` → `Response Received`\n\n- **Transforms** do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.\n- **Pipeline extensions** observe or customize lifecycle stages via `on_pipeline_event(...)`.\n- **Compression hooks** sit alongside the canonical lifecycle as an additional extension seam.\n- **Proxy extensions** remain the server/app integration seam for ASGI middleware, routes, and startup policy.\n\nProvider- and tool-specific behavior lives under `headroom/providers/`, so core orchestration stays focused on lifecycle, sequencing, and policy.\n\n- **CLI/tool slices**: `headroom/providers/claude`, `copilot`, `codex`, `openclaw`\n- **Provider runtime slices**: `headroom/providers/claude`, `gemini`, plus shared backend/runtime dispatch in `headroom/providers/registry.py`\n- **Core files stay orchestration-first**: `wrap.py`, `client.py`, `cli/proxy.py`, and `proxy/server.py` delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.\n\n```\npip install \"headroom-ai[all]\"          # Python, everything\nnpm install headroom-ai                 # TypeScript / Node\ndocker pull ghcr.io/chopratejas/headroom:latest\n```\n\nGranular extras: `[proxy]`, `[mcp]`, `[ml]` (Kompress-base), `[code]`, `[memory]`, `[relevance]`, `[image]`, `[agno]`, `[langchain]`, `[evals]`, `[pytorch-mps]` (Apple-GPU memory-embedder offload; set `HEADROOM_EMBEDDER_RUNTIME=pytorch_mps`). Requires **Python 3.10+**.\n\nUsing `pipx`? 
Choose a supported interpreter explicitly:\n\n```\npipx install --python python3.13 \"headroom-ai[all]\"\n```\n\n→ [Installation guide](https://headroom-docs.vercel.app/docs/installation): Docker tags, persistent service, PowerShell, devcontainers.\n\nIf `pip install \"headroom-ai[all]\"` fails with `CERTIFICATE_VERIFY_FAILED` (`unable to get local issuer certificate`), your network most likely uses **SSL inspection**: a MITM proxy presenting a company-issued CA. The build backend (`maturin`) downloads `rustup` over a connection your TLS stack doesn't trust. **Install Rust first** so the build doesn't need to fetch it:\n\n```\n# macOS / Linux (installs rustup and selects the stable toolchain)\ncurl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh -s -- -y --default-toolchain stable\n# Windows (then, in a new shell: rustup default stable)\nwinget install Rustlang.Rustup\n```\n\nRestart your shell, then run `pip install \"headroom-ai[all]\"` again. Where a prebuilt wheel is available, it avoids the Rust build entirely: `pip install --only-binary headroom-ai headroom-ai`.\n\nTwo runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via `REQUESTS_CA_BUNDLE` / `SSL_CERT_FILE` / `CURL_CA_BUNDLE`:\n\n- `cdn.pyke.io`: the ONNX Runtime for the Rust core. Alternatively, provide it yourself with `ORT_STRATEGY=system` and `ORT_LIB_LOCATION=/path/to/onnxruntime`.\n- `huggingface.co`: the `kompress-base` compression model. 
Pre-download it and run with `HF_HUB_OFFLINE=1`, or set `HF_ENDPOINT` to a trusted mirror.\n\nRunning with compression disabled (pure gateway) requires neither asset.\n\n`headroom learn` mines failed sessions and writes corrections to `CLAUDE.md` / `AGENTS.md` / `GEMINI.md`.\n\nDocs: [Architecture](https://headroom-docs.vercel.app/docs/architecture) · [Proxy](https://headroom-docs.vercel.app/docs/proxy) · [How compression works](https://headroom-docs.vercel.app/docs/how-compression-works) · [MCP tools](https://headroom-docs.vercel.app/docs/mcp) · [CCR: reversible compression](https://headroom-docs.vercel.app/docs/ccr) · [Memory](https://headroom-docs.vercel.app/docs/memory) · [Cache optimization](https://headroom-docs.vercel.app/docs/cache-optimization) · [Failure learning](https://headroom-docs.vercel.app/docs/failure-learning) · [Benchmarks](https://headroom-docs.vercel.app/docs/benchmarks) · [Configuration](https://headroom-docs.vercel.app/docs/configuration) · [Limitations](https://headroom-docs.vercel.app/docs/limitations)\n\nHeadroom runs **locally**, covers **every** content type, works with every major framework, and is **reversible**.\n\n| | Scope | Deploy | Local | Reversible |\n|---|---|---|---|---|\n| Headroom | All context: tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |\n\nRelated tools: [lean-ctx](https://github.com/yvgude/lean-ctx), [Compresr](https://compresr.ai), [Token Co.](https://thetokencompany.ai)\n\n**Attribution.** Headroom ships with the excellent RTK binary for shell-output rewriting (`git show --short`, scoped `ls`, summarized installers). Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. 
Headroom can also use [lean-ctx](https://github.com/yvgude/lean-ctx) as the selected CLI context tool; set `HEADROOM_CONTEXT_TOOL=lean-ctx` before running `headroom wrap ...`.\n\n```\ngit clone https://github.com/chopratejas/headroom.git && cd headroom\nuv sync --extra dev && uv run pytest\n```\n\nDevcontainers live in `.devcontainer/` (the default one, plus `memory-stack` with Qdrant & Neo4j). See [CONTRIBUTING.md](/chopratejas/headroom/blob/main/CONTRIBUTING.md).\n\n- [Discord](https://discord.gg/yRmaUNpsPJ): questions, feedback, war stories.\n- [Kompress-v2-base on HuggingFace](https://huggingface.co/chopratejas/kompress-v2-base): the model behind our text compression.\n\nApache 2.0; see [LICENSE](/chopratejas/headroom/blob/main/LICENSE).", "url": "https://wpnews.pro/news/headroom", "canonical_source": "https://github.com/chopratejas/headroom", "published_at": "2026-06-17 02:57:28+00:00", "updated_at": "2026-06-17 03:22:32.445217+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "large-language-models", "developer-tools"], "entities": ["Headroom", "Anthropic", "OpenAI", "Claude Code", "Cursor", "Codex", "LangChain", "Agno"], "alternates": {"html": "https://wpnews.pro/news/headroom", "markdown": "https://wpnews.pro/news/headroom.md", "text": "https://wpnews.pro/news/headroom.txt", "jsonld": "https://wpnews.pro/news/headroom.jsonld"}}