cd /news/ai-agents/headroom · home topics ai-agents article
[ARTICLE · art-30438] src=github.com ↗ pub= topic=ai-agents verified=true sentiment=↑ positive

Headroom

Headroom, a context compression layer for AI agents, reduces token usage by 60–95% by compressing tool outputs, logs, and conversation history before they reach the LLM. The open-source tool offers multiple integration modes including a library, proxy, agent wrapper, and MCP server, and supports reversible compression with local caching.

read8 min views4 publishedJun 17, 2026
  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible

Docs · Install · Proof · Agents · Discord · llms.txt · Enterprise

AI agents / LLMs: read

/llms.txt

here, or fetch the live index / full docs blob. Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Live: 10,144 → 1,260 tokens — same FATAL found.

Librarycompress(messages)

in Python or TypeScript, inline in any appProxyheadroom proxy --port 8787

, zero code changes, any languageAgent wrapheadroom wrap claude|codex|cursor|aider|copilot

in one commandMCP serverheadroom_compress

,headroom_retrieve

,headroom_stats

for any MCP clientCross-agent memory— shared store across Claude, Codex, Gemini, auto-dedup— mines failed sessions, writes corrections toheadroom learn

CLAUDE.md

/AGENTS.md

Reversible (CCR)— originals are cached for retrieval on demand

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter— detects content type, selects the right compressor** SmartCrusher / CodeCompressor / Kompress-base**— compress JSON, AST, or prose** CacheAligner**— stabilizes prefixes so provider KV caches actually hit** CCR**— stores originals locally; LLM callsheadroom_retrieve

if it needs them

Architecture · CCR reversible compression · Kompress-v2-base model card

pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes

headroom perf

Granular extras: [proxy]

, [mcp]

, [ml]

, [code]

, [memory]

, [relevance]

, [image]

, [agno]

, [langchain]

, [evals]

, [pytorch-mps]

(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps

). Requires Python 3.10+.

Savings on real agent workloads:

Workload Before After Savings
Code search (100 results) 17,765 1,408 92%
SRE incident debugging 65,694 5,118 92%
GitHub issue triage 54,174 14,761 73%
Codebase exploration 78,502 41,254 47%

Accuracy preserved on standard benchmarks:

Benchmark Category N Baseline Headroom Delta
GSM8K Math 100 0.870 0.870 ±0.000
TruthfulQA Factual 100 0.530 0.560 +0.030
SQuAD v2 QA 100 97%
19% compression
BFCL Tools 100 97%
32% compression

Reproduce: python -m headroom.evals suite --tier 1

· Full benchmarks & methodology

| Agent | headroom wrap | Notes | |---|---|---| | Claude Code | ✅ | --memory · --code-graph | | Codex | ✅ | shares memory with Claude | | Cursor | ✅ | prints config — paste once | | Aider | ✅ | starts proxy + launches | | Copilot CLI | ✅ | starts proxy + launches | | OpenClaw | ✅ | installs as ContextEngine plugin |

Any OpenAI-compatible client works via headroom proxy

. MCP-native: headroom mcp install

.

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=...

during launch.

headroom copilot-auth login

stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as github.com/enterprises/your-enterprise

, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool

, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN

or GITHUB_COPILOT_GITHUB_TOKEN

rather than relying on host keychain access.

Great fit if you…

  • run AI coding agents daily and want savings without changing your code
  • work across multiple agents and want shared memory
  • need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you…

  • only use a single provider's native compaction and don't need cross-agent memory
  • work in a sandboxed environment where local processes can't run

Integrations — drop Headroom into any stack

Your setup Hook in with
Any Python app compress(messages, model=…)
Any TypeScript app await compress(messages, { model })
Anthropic / OpenAI SDK withHeadroom(new Anthropic()) · withHeadroom(new OpenAI())
Vercel AI SDK wrapLanguageModel({ model, middleware: headroomMiddleware() })
LiteLLM litellm.callbacks = [HeadroomCallback()]
LangChain HeadroomChatModel(your_llm)
Agno HeadroomAgnoModel(your_model)
Strands

app.add_middleware(CompressionMiddleware)

SharedContext().put / .get

headroom mcp install

What's inside

SmartCrusher— universal JSON: arrays of dicts, nested objects, mixed types.** CodeCompressor**— AST-aware for Python, JS, Go, Rust, Java, C++.** Kompress-base**— our HuggingFace model, trained on agentic traces.** Image compression**— 40–90% reduction via trained ML router.** CacheAligner**— stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.** IntelligentContext**— score-based context fitting with learned importance.** CCR**— reversible compression; LLM retrieves originals on demand.** Cross-agent memory**— shared store, agent provenance, auto-dedup.** SharedContext**— compressed context passing across multi-agent workflows.— plugin-based failure mining for Claude, Codex, Gemini.headroom learn

Pipeline internals

Headroom exposes one stable request lifecycle across compress()

, the SDK, and the proxy:

Setup

Pre-Start

Post-Start

Input Received

Input Cached

Input Routed

Input Compressed

Input Remembered

Pre-Send

Post-Send

Response Received

Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.Pipeline extensions observe or customize lifecycle stages viaon_pipeline_event(...)

.Compression hooks sit alongside the canonical lifecycle as an additional extension seam.Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under headroom/providers/

so core orchestration stays focused on lifecycle, sequencing, and policy.

CLI/tool slices:headroom/providers/claude

,copilot

,codex

,openclaw

Provider runtime slices:headroom/providers/claude

,gemini

, plus shared backend/runtime dispatch inheadroom/providers/registry.py

Core files stay orchestration-first:wrap.py

,client.py

,cli/proxy.py

, andproxy/server.py

delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

pip install "headroom-ai[all]"          # Python, everything
npm install headroom-ai                 # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy]

, [mcp]

, [ml]

(Kompress-base), [code]

, [memory]

, [relevance]

, [image]

, [agno]

, [langchain]

, [evals]

, [pytorch-mps]

(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps

). Requires Python 3.10+.

Using pipx

? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

If pip install "headroom-ai[all]"

fails with CERTIFICATE_VERIFY_FAILED

(unable to get local issuer certificate

), your network uses SSL inspection — a MITM proxy presenting a company-issued CA. The build backend (maturin

) downloads rustup

over a connection your TLS stack doesn't trust. Install Rust first so the build doesn't fetch it:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable
winget install Rustlang.Rustup && rustup default stable

Restart your shell, then pip install "headroom-ai[all]"

. A prebuilt wheel avoids the Rust build entirely where available: pip install --only-binary headroom-ai headroom-ai

.

Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via REQUESTS_CA_BUNDLE

/ SSL_CERT_FILE

/ CURL_CA_BUNDLE

:

— the ONNX Runtime for the Rust core. Alternatively pre-provide it withcdn.pyke.io

ORT_STRATEGY=system

andORT_LIB_LOCATION=/path/to/onnxruntime

.— thehuggingface.co

kompress-base

compression model. Pre-download it and run withHF_HUB_OFFLINE=1

, or setHF_ENDPOINT

to a trusted mirror.

Running with compression disabled (pure gateway) requires neither asset.

headroom learn

— mines failed sessions, writes corrections to CLAUDE.md

/ AGENTS.md

/ GEMINI.md

.

Start here Go deeper

ArchitectureProxyHow compression worksMCP toolsCCR — reversible compressionMemoryCache optimizationFailure learningBenchmarksConfigurationLimitationsHeadroom runs locally, covers every content type, works with every major framework, and is reversible.

Scope Deploy Local Reversible
Headroom
All context — tools, RAG, logs, files, history Proxy · library · middleware · MCP Yes Yes

lean-ctxCompresr,Token Co.

Attribution.Headroom ships with the excellent[RTK]binary for shell-output rewriting —git show --short

, scopedls

, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use[lean-ctx]as the selected CLI context tool; setHEADROOM_CONTEXT_TOOL=lean-ctx

before runningheadroom wrap ...

.

git clone https://github.com/chopratejas/headroom.git && cd headroom
uv sync --extra dev && uv run pytest

Devcontainers in .devcontainer/

(default + memory-stack

with Qdrant & Neo4j). See CONTRIBUTING.md.

— questions, feedback, war stories.Discord— the model behind our text compression.Kompress-v2-base on HuggingFace

Apache 2.0 — see LICENSE.

── more in #ai-agents 4 stories · sorted by recency
── more on @headroom 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/headroom] indexed:0 read:8min 2026-06-17 ·