# Headroom

> Source: <https://github.com/chopratejas/headroom>
> Published: 2026-06-17 02:57:28+00:00



```
  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents
```

**60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible**

[Docs](https://headroom-docs.vercel.app/docs) ·
[Install](#get-started-60-seconds) ·
[Proof](#proof) ·
[Agents](#agent-compatibility-matrix) ·
[Discord](https://discord.gg/yRmaUNpsPJ) ·
[llms.txt](/chopratejas/headroom/blob/main/llms.txt) ·
[Enterprise](/chopratejas/headroom/blob/main/ENTERPRISE.md)

AI agents / LLMs: read

`/llms.txt`

here, or fetch the live index / full docs blob.
Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

[
](/chopratejas/headroom/blob/main/HeadroomDemo-Fast.gif)Live: 10,144 → 1,260 tokens — same FATAL found.

**Library**—`compress(messages)`

in Python or TypeScript, inline in any app**Proxy**—`headroom proxy --port 8787`

, zero code changes, any language**Agent wrap**—`headroom wrap claude|codex|cursor|aider|copilot`

in one command**MCP server**—`headroom_compress`

,`headroom_retrieve`

,`headroom_stats`

for any MCP client**Cross-agent memory**— shared store across Claude, Codex, Gemini, auto-dedup— mines failed sessions, writes corrections to`headroom learn`

`CLAUDE.md`

/`AGENTS.md`

**Reversible (CCR)**— originals are cached for retrieval on demand

```
 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)
```

**ContentRouter**— detects content type, selects the right compressor** SmartCrusher / CodeCompressor / Kompress-base**— compress JSON, AST, or prose** CacheAligner**— stabilizes prefixes so provider KV caches actually hit** CCR**— stores originals locally; LLM calls`headroom_retrieve`

if it needs them

→ [Architecture](https://headroom-docs.vercel.app/docs/architecture) · [CCR reversible compression](https://headroom-docs.vercel.app/docs/ccr) · [Kompress-v2-base model card](https://huggingface.co/chopratejas/kompress-v2-base)

```
# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — See the savings
headroom perf
```

Granular extras: `[proxy]`

, `[mcp]`

, `[ml]`

, `[code]`

, `[memory]`

, `[relevance]`

, `[image]`

, `[agno]`

, `[langchain]`

, `[evals]`

, `[pytorch-mps]`

(Apple-GPU memory-embedder offload — set `HEADROOM_EMBEDDER_RUNTIME=pytorch_mps`

). Requires **Python 3.10+**.

**Savings on real agent workloads:**

| Workload | Before | After | Savings |
|---|---|---|---|
| Code search (100 results) | 17,765 | 1,408 | 92% |
| SRE incident debugging | 65,694 | 5,118 | 92% |
| GitHub issue triage | 54,174 | 14,761 | 73% |
| Codebase exploration | 78,502 | 41,254 | 47% |

**Accuracy preserved on standard benchmarks:**

| Benchmark | Category | N | Baseline | Headroom | Delta |
|---|---|---|---|---|---|
| GSM8K | Math | 100 | 0.870 | 0.870 | ±0.000 |
| TruthfulQA | Factual | 100 | 0.530 | 0.560 | +0.030 |
| SQuAD v2 | QA | 100 | — | 97% |
19% compression |
| BFCL | Tools | 100 | — | 97% |
32% compression |

Reproduce: `python -m headroom.evals suite --tier 1`

· [Full benchmarks & methodology](https://headroom-docs.vercel.app/docs/benchmarks)

| Agent | `headroom wrap` |
Notes |
|---|---|---|
| Claude Code | ✅ | `--memory` · `--code-graph` |
| Codex | ✅ | shares memory with Claude |
| Cursor | ✅ | prints config — paste once |
| Aider | ✅ | starts proxy + launches |
| Copilot CLI | ✅ | starts proxy + launches |
| OpenClaw | ✅ | installs as ContextEngine plugin |

Any OpenAI-compatible client works via `headroom proxy`

. MCP-native: `headroom mcp install`

.

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

```
headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o
```

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as `COPILOT_PROVIDER_API_URL=...`

during launch.

`headroom copilot-auth login`

stores a Headroom-specific Copilot OAuth token.
This avoids relying on generic GitHub or Copilot CLI tokens that can read
Copilot account metadata but may still be rejected by Copilot's token-exchange
endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

```
export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com
```

For GitHub.com Enterprise Cloud URLs such as
`github.com/enterprises/your-enterprise`

, do not set an enterprise-domain
override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot
API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / `secret-tool`

, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit `GITHUB_COPILOT_TOKEN`

or `GITHUB_COPILOT_GITHUB_TOKEN`

rather than relying on host keychain access.

**Great fit if you…**

- run AI coding agents daily and want savings without changing your code
- work across multiple agents and want shared memory
- need reversible compression — originals are retrievable via CCR within the configured TTL

**Skip it if you…**

- only use a single provider's native compaction and don't need cross-agent memory
- work in a sandboxed environment where local processes can't run

**Integrations — drop Headroom into any stack**

| Your setup | Hook in with |
|---|---|
| Any Python app | `compress(messages, model=…)` |
| Any TypeScript app | `await compress(messages, { model })` |
| Anthropic / OpenAI SDK | `withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())` |
| Vercel AI SDK | `wrapLanguageModel({ model, middleware: headroomMiddleware() })` |
| LiteLLM | `litellm.callbacks = [HeadroomCallback()]` |
| LangChain | `HeadroomChatModel(your_llm)` |
| Agno | `HeadroomAgnoModel(your_model)` |
| Strands |
|

`app.add_middleware(CompressionMiddleware)`

`SharedContext().put / .get`

`headroom mcp install`

**What's inside**

**SmartCrusher**— universal JSON: arrays of dicts, nested objects, mixed types.** CodeCompressor**— AST-aware for Python, JS, Go, Rust, Java, C++.** Kompress-base**— our HuggingFace model, trained on agentic traces.** Image compression**— 40–90% reduction via trained ML router.** CacheAligner**— stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.** IntelligentContext**— score-based context fitting with learned importance.** CCR**— reversible compression; LLM retrieves originals on demand.** Cross-agent memory**— shared store, agent provenance, auto-dedup.** SharedContext**— compressed context passing across multi-agent workflows.— plugin-based failure mining for Claude, Codex, Gemini.`headroom learn`

**Pipeline internals**

Headroom exposes one stable request lifecycle across `compress()`

, the SDK, and the proxy:

`Setup`

→ `Pre-Start`

→ `Post-Start`

→ `Input Received`

→ `Input Cached`

→ `Input Routed`

→ `Input Compressed`

→ `Input Remembered`

→ `Pre-Send`

→ `Post-Send`

→ `Response Received`

**Transforms** do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.**Pipeline extensions** observe or customize lifecycle stages via`on_pipeline_event(...)`

.**Compression hooks** sit alongside the canonical lifecycle as an additional extension seam.**Proxy extensions** remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under `headroom/providers/`

so core orchestration stays focused on lifecycle, sequencing, and policy.

**CLI/tool slices**:`headroom/providers/claude`

,`copilot`

,`codex`

,`openclaw`

**Provider runtime slices**:`headroom/providers/claude`

,`gemini`

, plus shared backend/runtime dispatch in`headroom/providers/registry.py`

**Core files stay orchestration-first**:`wrap.py`

,`client.py`

,`cli/proxy.py`

, and`proxy/server.py`

delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

```
pip install "headroom-ai[all]"          # Python, everything
npm install headroom-ai                 # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest
```

Granular extras: `[proxy]`

, `[mcp]`

, `[ml]`

(Kompress-base), `[code]`

, `[memory]`

, `[relevance]`

, `[image]`

, `[agno]`

, `[langchain]`

, `[evals]`

, `[pytorch-mps]`

(Apple-GPU memory-embedder offload — set `HEADROOM_EMBEDDER_RUNTIME=pytorch_mps`

). Requires **Python 3.10+**.

Using `pipx`

? Choose a supported interpreter explicitly:

```
pipx install --python python3.13 "headroom-ai[all]"
```

→ [Installation guide](https://headroom-docs.vercel.app/docs/installation) — Docker tags, persistent service, PowerShell, devcontainers.

If `pip install "headroom-ai[all]"`

fails with `CERTIFICATE_VERIFY_FAILED`

(`unable to get local issuer certificate`

), your network uses **SSL inspection** — a MITM
proxy presenting a company-issued CA. The build backend (`maturin`

) downloads `rustup`

over a
connection your TLS stack doesn't trust. **Install Rust first** so the build doesn't fetch it:

```
# macOS / Linux
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable
# Windows
winget install Rustlang.Rustup && rustup default stable
```

Restart your shell, then `pip install "headroom-ai[all]"`

. A prebuilt wheel avoids the Rust
build entirely where available: `pip install --only-binary headroom-ai headroom-ai`

.

Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via
`REQUESTS_CA_BUNDLE`

/ `SSL_CERT_FILE`

/ `CURL_CA_BUNDLE`

:

— the ONNX Runtime for the Rust core. Alternatively pre-provide it with`cdn.pyke.io`

`ORT_STRATEGY=system`

and`ORT_LIB_LOCATION=/path/to/onnxruntime`

.— the`huggingface.co`

`kompress-base`

compression model. Pre-download it and run with`HF_HUB_OFFLINE=1`

, or set`HF_ENDPOINT`

to a trusted mirror.

Running with compression disabled (pure gateway) requires neither asset.

`headroom learn`

— mines failed sessions, writes corrections to `CLAUDE.md`

/ `AGENTS.md`

/ `GEMINI.md`

.

| Start here | Go deeper |
|---|---|
|

[Architecture](https://headroom-docs.vercel.app/docs/architecture)[Proxy](https://headroom-docs.vercel.app/docs/proxy)[How compression works](https://headroom-docs.vercel.app/docs/how-compression-works)[MCP tools](https://headroom-docs.vercel.app/docs/mcp)[CCR — reversible compression](https://headroom-docs.vercel.app/docs/ccr)[Memory](https://headroom-docs.vercel.app/docs/memory)[Cache optimization](https://headroom-docs.vercel.app/docs/cache-optimization)[Failure learning](https://headroom-docs.vercel.app/docs/failure-learning)[Benchmarks](https://headroom-docs.vercel.app/docs/benchmarks)[Configuration](https://headroom-docs.vercel.app/docs/configuration)[Limitations](https://headroom-docs.vercel.app/docs/limitations)Headroom runs **locally**, covers **every** content type, works with every major framework, and is **reversible**.

| Scope | Deploy | Local | Reversible | |
|---|---|---|---|---|
Headroom |
All context — tools, RAG, logs, files, history | Proxy · library · middleware · MCP | Yes | Yes |
|

[lean-ctx](https://github.com/yvgude/lean-ctx)[Compresr](https://compresr.ai),[Token Co.](https://thetokencompany.ai)

Attribution.Headroom ships with the excellent[RTK]binary for shell-output rewriting —`git show --short`

, scoped`ls`

, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use[lean-ctx]as the selected CLI context tool; set`HEADROOM_CONTEXT_TOOL=lean-ctx`

before running`headroom wrap ...`

.

```
git clone https://github.com/chopratejas/headroom.git && cd headroom
uv sync --extra dev && uv run pytest
```

Devcontainers in `.devcontainer/`

(default + `memory-stack`

with Qdrant & Neo4j). See [CONTRIBUTING.md](/chopratejas/headroom/blob/main/CONTRIBUTING.md).

— questions, feedback, war stories.[Discord](https://discord.gg/yRmaUNpsPJ)— the model behind our text compression.[Kompress-v2-base on HuggingFace](https://huggingface.co/chopratejas/kompress-v2-base)

Apache 2.0 — see [LICENSE](/chopratejas/headroom/blob/main/LICENSE).
