Headroom

wpnews.pro

  ██╗  ██╗███████╗ █████╗ ██████╗ ██████╗  ██████╗  ██████╗ ███╗   ███╗
  ██║  ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║
  ███████║█████╗  ███████║██║  ██║██████╔╝██║   ██║██║   ██║██╔████╔██║
  ██╔══██║██╔══╝  ██╔══██║██║  ██║██╔══██╗██║   ██║██║   ██║██║╚██╔╝██║
  ██║  ██║███████╗██║  ██║██████╔╝██║  ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║
  ╚═╝  ╚═╝╚══════╝╚═╝  ╚═╝╚═════╝ ╚═╝  ╚═╝ ╚═════╝  ╚═════╝ ╚═╝     ╚═╝
                  The context compression layer for AI agents

60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible

Docs · Install · Proof · Agents · Discord · llms.txt · Enterprise

AI agents / LLMs: read

/llms.txt

here, or fetch the live index / full docs blob. Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

Live: 10,144 → 1,260 tokens — same FATAL found.

Library—compress(messages)

in Python or TypeScript, inline in any appProxy—headroom proxy --port 8787

, zero code changes, any languageAgent wrap—headroom wrap claude|codex|cursor|aider|copilot

in one commandMCP server—headroom_compress

,headroom_retrieve

,headroom_stats

for any MCP clientCross-agent memory— shared store across Claude, Codex, Gemini, auto-dedup— mines failed sessions, writes corrections toheadroom learn

CLAUDE.md

/AGENTS.md

Reversible (CCR)— originals are cached for retrieval on demand

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)

ContentRouter— detects content type, selects the right compressor** SmartCrusher / CodeCompressor / Kompress-base**— compress JSON, AST, or prose** CacheAligner**— stabilizes prefixes so provider KV caches actually hit** CCR**— stores originals locally; LLM callsheadroom_retrieve

if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card

pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes

headroom perf

Granular extras: [proxy]

, [mcp]

, [ml]

, [code]

, [memory]

, [relevance]

, [image]

, [agno]

, [langchain]

, [evals]

, [pytorch-mps]

(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps

). Requires Python 3.10+.

Savings on real agent workloads:

Workload	Before	After	Savings
Code search (100 results)	17,765	1,408	92%
SRE incident debugging	65,694	5,118	92%
GitHub issue triage	54,174	14,761	73%
Codebase exploration	78,502	41,254	47%

Accuracy preserved on standard benchmarks:

Benchmark	Category	N	Baseline	Headroom	Delta
GSM8K	Math	100	0.870	0.870	±0.000
TruthfulQA	Factual	100	0.530	0.560	+0.030
SQuAD v2	QA	100	—	97%
19% compression
BFCL	Tools	100	—	97%
32% compression

Reproduce: python -m headroom.evals suite --tier 1

· Full benchmarks & methodology

| Agent | headroom wrap | Notes | |---|---|---| | Claude Code | ✅ | --memory · --code-graph | | Codex | ✅ | shares memory with Claude | | Cursor | ✅ | prints config — paste once | | Aider | ✅ | starts proxy + launches | | Copilot CLI | ✅ | starts proxy + launches | | OpenClaw | ✅ | installs as ContextEngine plugin |

Any OpenAI-compatible client works via headroom proxy

. MCP-native: headroom mcp install

.

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=...

during launch.

headroom copilot-auth login

stores a Headroom-specific Copilot OAuth token. This avoids relying on generic GitHub or Copilot CLI tokens that can read Copilot account metadata but may still be rejected by Copilot's token-exchange endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as github.com/enterprises/your-enterprise

, do not set an enterprise-domain override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool

, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN

or GITHUB_COPILOT_GITHUB_TOKEN

rather than relying on host keychain access.

Great fit if you…

run AI coding agents daily and want savings without changing your code
work across multiple agents and want shared memory
need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you…

only use a single provider's native compaction and don't need cross-agent memory
work in a sandboxed environment where local processes can't run

Integrations — drop Headroom into any stack

Your setup	Hook in with
Any Python app	`compress(messages, model=…)`
Any TypeScript app	`await compress(messages, { model })`
Anthropic / OpenAI SDK	`withHeadroom(new Anthropic())` · `withHeadroom(new OpenAI())`
Vercel AI SDK	`wrapLanguageModel({ model, middleware: headroomMiddleware() })`
LiteLLM	`litellm.callbacks = [HeadroomCallback()]`
LangChain	`HeadroomChatModel(your_llm)`
Agno	`HeadroomAgnoModel(your_model)`
Strands

app.add_middleware(CompressionMiddleware)

SharedContext().put / .get

headroom mcp install

What's inside

SmartCrusher— universal JSON: arrays of dicts, nested objects, mixed types.** CodeCompressor**— AST-aware for Python, JS, Go, Rust, Java, C++.** Kompress-base**— our HuggingFace model, trained on agentic traces.** Image compression**— 40–90% reduction via trained ML router.** CacheAligner**— stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.** IntelligentContext**— score-based context fitting with learned importance.** CCR**— reversible compression; LLM retrieves originals on demand.** Cross-agent memory**— shared store, agent provenance, auto-dedup.** SharedContext**— compressed context passing across multi-agent workflows.— plugin-based failure mining for Claude, Codex, Gemini.headroom learn

Pipeline internals

Headroom exposes one stable request lifecycle across compress()

, the SDK, and the proxy:

Setup

→ Pre-Start

→ Post-Start

→ Input Received

→ Input Cached

→ Input Routed

→ Input Compressed

→ Input Remembered

→ Pre-Send

→ Post-Send

→ Response Received

Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.Pipeline extensions observe or customize lifecycle stages viaon_pipeline_event(...)

.Compression hooks sit alongside the canonical lifecycle as an additional extension seam.Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under headroom/providers/

so core orchestration stays focused on lifecycle, sequencing, and policy.

CLI/tool slices:headroom/providers/claude

,copilot

,codex

,openclaw

Provider runtime slices:headroom/providers/claude

,gemini

, plus shared backend/runtime dispatch inheadroom/providers/registry.py

Core files stay orchestration-first:wrap.py

,client.py

,cli/proxy.py

, andproxy/server.py

delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.

pip install "headroom-ai[all]"          # Python, everything
npm install headroom-ai                 # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy]

, [mcp]

, [ml]

(Kompress-base), [code]

, [memory]

, [relevance]

, [image]

, [agno]

, [langchain]

, [evals]

, [pytorch-mps]

(Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps

). Requires Python 3.10+.

Using pipx

? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

If pip install "headroom-ai[all]"

fails with CERTIFICATE_VERIFY_FAILED

(unable to get local issuer certificate

), your network uses SSL inspection — a MITM proxy presenting a company-issued CA. The build backend (maturin

) downloads rustup

over a connection your TLS stack doesn't trust. Install Rust first so the build doesn't fetch it:

curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable
winget install Rustlang.Rustup && rustup default stable

Restart your shell, then pip install "headroom-ai[all]"

. A prebuilt wheel avoids the Rust build entirely where available: pip install --only-binary headroom-ai headroom-ai

.

Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via REQUESTS_CA_BUNDLE

/ SSL_CERT_FILE

/ CURL_CA_BUNDLE

:

— the ONNX Runtime for the Rust core. Alternatively pre-provide it withcdn.pyke.io

ORT_STRATEGY=system

andORT_LIB_LOCATION=/path/to/onnxruntime

.— thehuggingface.co

kompress-base

compression model. Pre-download it and run withHF_HUB_OFFLINE=1

, or setHF_ENDPOINT

to a trusted mirror.

Running with compression disabled (pure gateway) requires neither asset.

headroom learn

— mines failed sessions, writes corrections to CLAUDE.md

/ AGENTS.md

/ GEMINI.md

.

Start here	Go deeper

Architecture Proxy How compression works MCP tools CCR — reversible compression Memory Cache optimization Failure learning Benchmarks Configuration LimitationsHeadroom runs locally, covers every content type, works with every major framework, and is reversible.

Scope	Deploy	Local	Reversible
Headroom
All context — tools, RAG, logs, files, history	Proxy · library · middleware · MCP	Yes	Yes

lean-ctx Compresr,Token Co.

Attribution.Headroom ships with the excellent[RTK]binary for shell-output rewriting —git show --short

, scopedls

, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use[lean-ctx]as the selected CLI context tool; setHEADROOM_CONTEXT_TOOL=lean-ctx

before runningheadroom wrap ...

.

git clone https://github.com/chopratejas/headroom.git && cd headroom
uv sync --extra dev && uv run pytest

Devcontainers in .devcontainer/

(default + memory-stack

with Qdrant & Neo4j). See CONTRIBUTING.md.

— questions, feedback, war stories.Discord— the model behind our text compression.Kompress-v2-base on HuggingFace

Apache 2.0 — see LICENSE.

source & further reading

github.com — original article

Headroom

Run your AI side-project on zahid.host