██╗ ██╗███████╗ █████╗ ██████╗ ██████╗ ██████╗ ██████╗ ███╗ ███╗
██║ ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
███████║█████╗ ███████║██║ ██║██████╔╝██║ ██║██║ ██║██╔████╔██║
██╔══██║██╔══╝ ██╔══██║██║ ██║██╔══██╗██║ ██║██║ ██║██║╚██╔╝██║
██║ ██║███████╗██║ ██║██████╔╝██║ ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝
The context compression layer for AI agents
60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible
Docs · Install · Proof · Agents · Discord · llms.txt · Enterprise
AI agents / LLMs: read
/llms.txt
here, or fetch the live index / full docs blob. Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.
Live: 10,144 → 1,260 tokens — same FATAL found.
Library—compress(messages)
in Python or TypeScript, inline in any appProxy—headroom proxy --port 8787
, zero code changes, any languageAgent wrap—headroom wrap claude|codex|cursor|aider|copilot
in one commandMCP server—headroom_compress
,headroom_retrieve
,headroom_stats
for any MCP clientCross-agent memory— shared store across Claude, Codex, Gemini, auto-dedup— mines failed sessions, writes corrections toheadroom learn
CLAUDE.md
/AGENTS.md
Reversible (CCR)— originals are cached for retrieval on demand
Your agent / app
(Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
│ prompts · tool outputs · logs · RAG results · files
▼
┌────────────────────────────────────────────────────┐
│ Headroom (runs locally — your data stays here) │
│ ──────────────────────────────────────────────── │
│ CacheAligner → ContentRouter → CCR │
│ ├─ SmartCrusher (JSON) │
│ ├─ CodeCompressor (AST) │
│ └─ Kompress-base (text, HF) │
│ │
│ Cross-agent memory · headroom learn · MCP │
└────────────────────────────────────────────────────┘
│ compressed prompt + retrieval tool
▼
LLM provider (Anthropic · OpenAI · Bedrock · …)
ContentRouter— detects content type, selects the right compressor** SmartCrusher / CodeCompressor / Kompress-base**— compress JSON, AST, or prose** CacheAligner**— stabilizes prefixes so provider KV caches actually hit** CCR**— stores originals locally; LLM callsheadroom_retrieve
if it needs them
→ Architecture · CCR reversible compression · Kompress-v2-base model card
pip install "headroom-ai[all]" # Python
npm install headroom-ai # Node / TypeScript
headroom wrap claude # wrap a coding agent
headroom proxy --port 8787 # drop-in proxy, zero code changes
headroom perf
Granular extras: [proxy]
, [mcp]
, [ml]
, [code]
, [memory]
, [relevance]
, [image]
, [agno]
, [langchain]
, [evals]
, [pytorch-mps]
(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps
). Requires Python 3.10+.
Savings on real agent workloads:
| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |
Accuracy preserved on standard benchmarks:
| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | — | 97% | |
| 19% compression | |||||
| BFCL | Tools | 100 | — | 97% | |
| 32% compression |
Reproduce: python -m headroom.evals suite --tier 1
· Full benchmarks & methodology
| Agent | headroom wrap |
Notes |
|---|---|---|
| Claude Code | ✅ | --memory · --code-graph |
| Codex | ✅ | shares memory with Claude |
| Cursor | ✅ | prints config — paste once |
| Aider | ✅ | starts proxy + launches |
| Copilot CLI | ✅ | starts proxy + launches |
| OpenClaw | ✅ | installs as ContextEngine plugin |
Any OpenAI-compatible client works via headroom proxy
. MCP-native: headroom mcp install
.
Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:
headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o
This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=...
during launch.
headroom copilot-auth login
stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.
For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:
export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com
For GitHub.com Enterprise Cloud URLs such as
github.com/enterprises/your-enterprise
, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.
Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool
, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN
or GITHUB_COPILOT_GITHUB_TOKEN
rather than relying on host keychain access.
Great fit if you…
- run AI coding agents daily and want savings without changing your code
- work across multiple agents and want shared memory
- need reversible compression — originals are retrievable via CCR within the configured TTL
Skip it if you…
- only use a single provider's native compaction and don't need cross-agent memory
- work in a sandboxed environment where local processes can't run
Integrations — drop Headroom into any stack
| Your setup | Hook in with |
|---|---|
| Any Python app | compress(messages, model=…) |
| Any TypeScript app | await compress(messages, { model }) |
| Anthropic / OpenAI SDK | withHeadroom(new Anthropic()) · withHeadroom(new OpenAI()) |
| Vercel AI SDK | wrapLanguageModel({ model, middleware: headroomMiddleware() }) |
| LiteLLM | litellm.callbacks = [HeadroomCallback()] |
| LangChain | HeadroomChatModel(your_llm) |
| Agno | HeadroomAgnoModel(your_model) |
| Strands | |
app.add_middleware(CompressionMiddleware)
SharedContext().put / .get
headroom mcp install
What's inside
SmartCrusher— universal JSON: arrays of dicts, nested objects, mixed types.** CodeCompressor**— AST-aware for Python, JS, Go, Rust, Java, C++.** Kompress-base**— our HuggingFace model, trained on agentic traces.** Image compression**— 40–90% reduction via trained ML router.** CacheAligner**— stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.** IntelligentContext**— score-based context fitting with learned importance.** CCR**— reversible compression; LLM retrieves originals on demand.** Cross-agent memory**— shared store, agent provenance, auto-dedup.** SharedContext**— compressed context passing across multi-agent workflows.— plugin-based failure mining for Claude, Codex, Gemini.headroom learn
Pipeline internals
Headroom exposes one stable request lifecycle across compress()
, the SDK, and the proxy:
Setup
→ Pre-Start
→ Post-Start
→ Input Received
→ Input Cached
→ Input Routed
→ Input Compressed
→ Input Remembered
→ Pre-Send
→ Post-Send
→ Response Received
Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.Pipeline extensions observe or customize lifecycle stages viaon_pipeline_event(...)
.Compression hooks sit alongside the canonical lifecycle as an additional extension seam.Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.
Provider and tool-specific behavior lives under headroom/providers/
so core orchestration stays focused on lifecycle, sequencing, and policy.
CLI/tool slices:headroom/providers/claude
,copilot
,codex
,openclaw
Provider runtime slices:headroom/providers/claude
,gemini
, plus shared backend/runtime dispatch inheadroom/providers/registry.py
Core files stay orchestration-first:wrap.py
,client.py
,cli/proxy.py
, andproxy/server.py
delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.
pip install "headroom-ai[all]" # Python, everything
npm install headroom-ai # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
Granular extras: [proxy]
, [mcp]
, [ml]
(Kompress-base), [code]
, [memory]
, [relevance]
, [image]
, [agno]
, [langchain]
, [evals]
, [pytorch-mps]
(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps
). Requires Python 3.10+.
Using pipx
? Choose a supported interpreter explicitly:
pipx install --python python3.13 "headroom-ai[all]"
→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.
If pip install "headroom-ai[all]"
fails with CERTIFICATE_VERIFY_FAILED
(unable to get local issuer certificate
), your network uses SSL inspection — a MITM
proxy presenting a company-issued CA. The build backend (maturin
) downloads rustup
over a connection your TLS stack doesn't trust. Install Rust first so the build doesn't fetch it:
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable
winget install Rustlang.Rustup && rustup default stable
Restart your shell, then pip install "headroom-ai[all]"
. A prebuilt wheel avoids the Rust
build entirely where available: pip install --only-binary headroom-ai headroom-ai
.
Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via
REQUESTS_CA_BUNDLE
/ SSL_CERT_FILE
/ CURL_CA_BUNDLE
:
— the ONNX Runtime for the Rust core. Alternatively pre-provide it withcdn.pyke.io
ORT_STRATEGY=system
andORT_LIB_LOCATION=/path/to/onnxruntime
.— thehuggingface.co
kompress-base
compression model. Pre-download it and run withHF_HUB_OFFLINE=1
, or setHF_ENDPOINT
to a trusted mirror.
Running with compression disabled (pure gateway) requires neither asset.
headroom learn
— mines failed sessions, writes corrections to CLAUDE.md
/ AGENTS.md
/ GEMINI.md
.
| Start here | Go deeper |
|---|---|
ArchitectureProxyHow compression worksMCP toolsCCR — reversible compressionMemoryCache optimizationFailure learningBenchmarksConfigurationLimitationsHeadroom runs locally, covers every content type, works with every major framework, and is reversible.
| Scope | Deploy | Local | Reversible | |
|---|---|---|---|---|
| Headroom | ||||
| All context — tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes | |
Attribution.Headroom ships with the excellent[RTK]binary for shell-output rewriting —git show --short
, scopedls
, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use[lean-ctx]as the selected CLI context tool; setHEADROOM_CONTEXT_TOOL=lean-ctx
before runningheadroom wrap ...
.
git clone https://github.com/chopratejas/headroom.git && cd headroom
uv sync --extra dev && uv run pytest
Devcontainers in .devcontainer/
(default + memory-stack
with Qdrant & Neo4j). See CONTRIBUTING.md.
— questions, feedback, war stories.Discord— the model behind our text compression.Kompress-v2-base on HuggingFace
Apache 2.0 — see LICENSE.