MCP Directory
ServersClientsBlog

xASO - App Store Optimization

AI-powered App Store Optimization platform for mobile apps

Go to xASO
MCP Directory

Model Context Protocol Directory

MKSF LTD
Suite 8805 5 Brayford Square
London, E1 0SG

MCP Directory

  • About
  • Blog
  • Documentation
  • Contact

Menu

  • Servers
  • Clients

© 2026 model-context-protocol.com

The Model Context Protocol (MCP) is an open standard for AI model communication.
Powered by Mert KoseogluSoftware Forge
  1. Home
  2. Servers
  3. headroom

headroom

GitHub
Website

Compress tool outputs, logs, files, and RAG chunks before they reach the LLM. 60-95% fewer tokens, same answers. Library, proxy, MCP server.

54,905
1,895
<div align="center"><pre> ██╗ ██╗███████╗ █████╗ ██████╗ ██████╗ ██████╗ ██████╗ ███╗ ███╗ ██║ ██║██╔════╝██╔══██╗██╔══██╗██╔══██╗██╔═══██╗██╔═══██╗████╗ ████║ ███████║█████╗ ███████║██║ ██║██████╔╝██║ ██║██║ ██║██╔████╔██║ ██╔══██║██╔══╝ ██╔══██║██║ ██║██╔══██╗██║ ██║██║ ██║██║╚██╔╝██║ ██║ ██║███████╗██║ ██║██████╔╝██║ ██║╚██████╔╝╚██████╔╝██║ ╚═╝ ██║ ╚═╝ ╚═╝╚══════╝╚═╝ ╚═╝╚═════╝ ╚═╝ ╚═╝ ╚═════╝ ╚═════╝ ╚═╝ ╚═╝ The context compression layer for AI agents </pre></div> <p align="center"><strong>60–95% fewer tokens · library · proxy · MCP · 6 algorithms · local-first · reversible</strong></p> <p align="center"> <a href="https://github.com/chopratejas/headroom/actions/workflows/ci.yml"><img src="https://github.com/chopratejas/headroom/actions/workflows/ci.yml/badge.svg" alt="CI"></a> <a href="https://app.codecov.io/gh/chopratejas/headroom"><img src="https://codecov.io/gh/chopratejas/headroom/graph/badge.svg" alt="codecov"></a> <a href="https://pypi.org/project/headroom-ai/"><img src="https://img.shields.io/pypi/v/headroom-ai.svg" alt="PyPI"></a> <a href="https://www.npmjs.com/package/headroom-ai"><img src="https://img.shields.io/npm/v/headroom-ai.svg" alt="npm"></a> <a href="https://huggingface.co/chopratejas/kompress-v2-base"><img src="https://img.shields.io/badge/model-Kompress--v2--base-yellow.svg" alt="Model: Kompress-v2-base"></a> <a href="LICENSE"><img src="https://img.shields.io/badge/license-Apache%202.0-blue.svg" alt="License: Apache 2.0"></a> <a href="https://headroom-docs.vercel.app/docs"><img src="https://img.shields.io/badge/docs-online-blue.svg" alt="Docs"></a> </p> <p align="center"> <a href="https://headroom-docs.vercel.app/docs">Docs</a> · <a href="#get-started-60-seconds">Install</a> · <a href="#proof">Proof</a> · <a href="#agent-compatibility-matrix">Agents</a> · <a href="https://discord.gg/yRmaUNpsPJ">Discord</a> · <a href="llms.txt">llms.txt</a> · <a href="ENTERPRISE.md">Enterprise</a> </p> <p align="center"><sub> <b>AI agents / LLMs:</b> read <a href="llms.txt"><code>/llms.txt</code></a> here, or fetch <a href="https://headroom-docs.vercel.app/llms.txt">the live index</a> / <a href="https://headroom-docs.vercel.app/llms-full.txt">full docs blob</a>. </sub></p>
<p align="center"><a href="https://trendshift.io/repositories/20881" target="_blank"><img src="https://trendshift.io/api/badge/repositories/20881" alt="chopratejas%2Fheadroom | Trendshift" style="width: 250px; height: 55px;" width="250" height="55"/></a></p>

Headroom compresses everything your AI agent reads — tool outputs, logs, RAG chunks, files, and conversation history — before it reaches the LLM. Same answers, fraction of the tokens.

<p align="center"> <img src="HeadroomDemo-Fast.gif" alt="Headroom in action" width="820"> <br/><sub>Live: 10,144 → 1,260 tokens — same FATAL found.</sub> </p>

What it does

  • Library — compress(messages) in Python or TypeScript, inline in any app
  • Proxy — headroom proxy --port 8787, zero code changes, any language
  • Agent wrap — headroom wrap claude|codex|cursor|aider|copilot in one command
  • MCP server — headroom_compress, headroom_retrieve, headroom_stats for any MCP client
  • Cross-agent memory — shared store across Claude, Codex, Gemini, auto-dedup
  • headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md
  • Reversible (CCR) — originals are cached for retrieval on demand

How it works (30 seconds)

 Your agent / app
   (Claude Code, Cursor, Codex, LangChain, Agno, Strands, your own code…)
        │   prompts · tool outputs · logs · RAG results · files
        ▼
    ┌────────────────────────────────────────────────────┐
    │  Headroom   (runs locally — your data stays here)  │
    │  ────────────────────────────────────────────────  │
    │  CacheAligner  →  ContentRouter  →  CCR            │
    │                    ├─ SmartCrusher   (JSON)        │
    │                    ├─ CodeCompressor (AST)         │
    │                    └─ Kompress-base  (text, HF)    │
    │                                                    │
    │  Cross-agent memory  ·  headroom learn  ·  MCP     │
    └────────────────────────────────────────────────────┘
        │   compressed prompt  +  retrieval tool
        ▼
 LLM provider  (Anthropic · OpenAI · Bedrock · …)
  • ContentRouter — detects content type, selects the right compressor
  • SmartCrusher / CodeCompressor / Kompress-base — compress JSON, AST, or prose
  • CacheAligner — stabilizes prefixes so provider KV caches actually hit
  • CCR — stores originals locally; LLM calls headroom_retrieve if it needs them

→ Architecture · CCR reversible compression · Kompress-v2-base model card

Get started (60 seconds)

# 1 — Install
pip install "headroom-ai[all]"          # Python
npm install headroom-ai                 # Node / TypeScript

# 2 — Pick your mode
headroom wrap claude                    # wrap a coding agent
headroom proxy --port 8787              # drop-in proxy, zero code changes
# or: from headroom import compress      # inline library

# 3 — See the savings
headroom perf

Granular extras: [proxy], [mcp], [ml], [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Proof

Savings on real agent workloads:

WorkloadBeforeAfterSavings
Code search (100 results)17,7651,40892%
SRE incident debugging65,6945,11892%
GitHub issue triage54,17414,76173%
Codebase exploration78,50241,25447%

Accuracy preserved on standard benchmarks:

BenchmarkCategoryNBaselineHeadroomDelta
GSM8KMath1000.8700.870±0.000
TruthfulQAFactual1000.5300.560+0.030
SQuAD v2QA100—97%19% compression
BFCLTools100—97%32% compression

Reproduce: python -m headroom.evals suite --tier 1 · Full benchmarks & methodology

<a href="https://www.star-history.com/?repos=chopratejas%2Fheadroom&type=date&legend=top-left"> <picture> <img alt="Star History Chart" src="https://api.star-history.com/chart?repos=chopratejas/headroom&type=date&legend=top-left" /> </picture> </a>

Agent compatibility matrix

Agentheadroom wrapNotes
Claude Code✅--memory · --code-graph
Codex✅shares memory with Claude
Cursor✅prints config — paste once
Aider✅starts proxy + launches
Copilot CLI✅starts proxy + launches
OpenClaw✅installs as ContextEngine plugin

Any OpenAI-compatible client works via headroom proxy. MCP-native: headroom mcp install.

GitHub Copilot CLI subscription mode

Headroom can route GitHub Copilot CLI subscription traffic through the local proxy:

headroom copilot-auth login
headroom wrap copilot --subscription -- --model gpt-4o

This lets Headroom intercept OpenAI-compatible Copilot CLI requests and apply the same proxy compression pipeline before forwarding to GitHub Copilot's hosted API. The wrapper exchanges Headroom's reusable GitHub OAuth token for Copilot's short-lived API token and prints the upstream endpoint as COPILOT_PROVIDER_API_URL=... during launch.

headroom copilot-auth login stores a Headroom-specific Copilot OAuth token.
This avoids relying on generic GitHub or Copilot CLI tokens that can read
Copilot account metadata but may still be rejected by Copilot's token-exchange
endpoint.

For GitHub Enterprise Server or custom-domain Copilot deployments, set the
deployment domain before launching:

export GITHUB_COPILOT_ENTERPRISE_DOMAIN=ghe.example.com

For GitHub.com Enterprise Cloud URLs such as
github.com/enterprises/your-enterprise, do not set an enterprise-domain
override. Headroom uses GitHub's normal token-exchange endpoint and the Copilot
API endpoint advertised for the signed-in account.

Platform support note: macOS auth reuse via Copilot CLI Keychain storage has been smoke-tested. Windows Credential Manager, Linux Secret Service / secret-tool, and Docker/CI token-injection paths are implemented or planned as auth-discovery paths, but still need real OS validation before they should be considered fully vetted. For Docker and CI, prefer passing an explicit GITHUB_COPILOT_TOKEN or GITHUB_COPILOT_GITHUB_TOKEN rather than relying on host keychain access.

When to use · When to skip

Great fit if you…

  • run AI coding agents daily and want savings without changing your code
  • work across multiple agents and want shared memory
  • need reversible compression — originals are retrievable via CCR within the configured TTL

Skip it if you…

  • only use a single provider's native compaction and don't need cross-agent memory
  • work in a sandboxed environment where local processes can't run
<details> <summary><b>Integrations — drop Headroom into any stack</b></summary>
Your setupHook in with
Any Python appcompress(messages, model=…)
Any TypeScript appawait compress(messages, { model })
Anthropic / OpenAI SDKwithHeadroom(new Anthropic()) · withHeadroom(new OpenAI())
Vercel AI SDKwrapLanguageModel({ model, middleware: headroomMiddleware() })
LiteLLMlitellm.callbacks = [HeadroomCallback()]
LangChainHeadroomChatModel(your_llm)
AgnoHeadroomAgnoModel(your_model)
StrandsStrands guide
ASGI appsapp.add_middleware(CompressionMiddleware)
Multi-agentSharedContext().put / .get
MCP clientsheadroom mcp install
</details> <details> <summary><b>What's inside</b></summary>
  • SmartCrusher — universal JSON: arrays of dicts, nested objects, mixed types.
  • CodeCompressor — AST-aware for Python, JS, Go, Rust, Java, C++.
  • Kompress-base — our HuggingFace model, trained on agentic traces.
  • Image compression — 40–90% reduction via trained ML router.
  • CacheAligner — stabilizes prefixes so Anthropic/OpenAI KV caches actually hit.
  • IntelligentContext — score-based context fitting with learned importance.
  • CCR — reversible compression; LLM retrieves originals on demand.
  • Cross-agent memory — shared store, agent provenance, auto-dedup.
  • SharedContext — compressed context passing across multi-agent workflows.
  • headroom learn — plugin-based failure mining for Claude, Codex, Gemini.
</details> <details> <summary><b>Pipeline internals</b></summary>

Headroom exposes one stable request lifecycle across compress(), the SDK, and the proxy:

Setup → Pre-Start → Post-Start → Input Received → Input Cached → Input Routed → Input Compressed → Input Remembered → Pre-Send → Post-Send → Response Received

  • Transforms do the work: CacheAligner, ContentRouter, SmartCrusher, CodeCompressor, Kompress-base, IntelligentContext / RollingWindow.
  • Pipeline extensions observe or customize lifecycle stages via on_pipeline_event(...).
  • Compression hooks sit alongside the canonical lifecycle as an additional extension seam.
  • Proxy extensions remain the server/app integration seam for ASGI middleware, routes, and startup policy.

Provider and tool-specific behavior lives under headroom/providers/ so core orchestration stays focused on lifecycle, sequencing, and policy.

  • CLI/tool slices: headroom/providers/claude, copilot, codex, openclaw
  • Provider runtime slices: headroom/providers/claude, gemini, plus shared backend/runtime dispatch in headroom/providers/registry.py
  • Core files stay orchestration-first: wrap.py, client.py, cli/proxy.py, and proxy/server.py delegate provider-specific env shaping, API target normalization, backend selection, and transport dispatch.
</details>

Install

pip install "headroom-ai[all]"          # Python, everything
npm install headroom-ai                 # TypeScript / Node
docker pull ghcr.io/chopratejas/headroom:latest

Granular extras: [proxy], [mcp], [ml] (Kompress-base), [code], [memory], [relevance], [image], [agno], [langchain], [evals], [pytorch-mps] (Apple-GPU memory-embedder offload — set HEADROOM_EMBEDDER_RUNTIME=pytorch_mps). Requires Python 3.10+.

Using pipx? Choose a supported interpreter explicitly:

pipx install --python python3.13 "headroom-ai[all]"

→ Installation guide — Docker tags, persistent service, PowerShell, devcontainers.

Corporate / SSL-inspection environments

If pip install "headroom-ai[all]" fails with CERTIFICATE_VERIFY_FAILED
(unable to get local issuer certificate), your network uses SSL inspection — a MITM
proxy presenting a company-issued CA. The build backend (maturin) downloads rustup over a
connection your TLS stack doesn't trust. Install Rust first so the build doesn't fetch it:

# macOS / Linux
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh && rustup default stable
# Windows
winget install Rustlang.Rustup && rustup default stable

Restart your shell, then pip install "headroom-ai[all]". A prebuilt wheel avoids the Rust
build entirely where available: pip install --only-binary headroom-ai headroom-ai.

Two runtime assets are fetched over TLS; if they are blocked, trust your corporate CA via
REQUESTS_CA_BUNDLE / SSL_CERT_FILE / CURL_CA_BUNDLE:

  • cdn.pyke.io — the ONNX Runtime for the Rust core. Alternatively pre-provide it with
    ORT_STRATEGY=system and ORT_LIB_LOCATION=/path/to/onnxruntime.
  • huggingface.co — the kompress-base compression model. Pre-download it and run with
    HF_HUB_OFFLINE=1, or set HF_ENDPOINT to a trusted mirror.

Running with compression disabled (pure gateway) requires neither asset.

headroom learn

<p align="center"> <img src="headroom_learn.gif" alt="headroom learn in action" width="720"> </p>

headroom learn — mines failed sessions, writes corrections to CLAUDE.md / AGENTS.md / GEMINI.md.

Documentation

Start hereGo deeper
QuickstartArchitecture
ProxyHow compression works
MCP toolsCCR — reversible compression
MemoryCache optimization
Failure learningBenchmarks
ConfigurationLimitations

Compared to

Headroom runs locally, covers every content type, works with every major framework, and is reversible.

ScopeDeployLocalReversible
HeadroomAll context — tools, RAG, logs, files, historyProxy · library · middleware · MCPYesYes
RTKCLI command outputsCLI wrapperYesNo
lean-ctxCLI commands, MCP tools, editor rulesCLI wrapper · MCPYesNo
Compresr, Token Co.Text sent to their APIHosted API callNoNo
OpenAI CompactionConversation historyProvider-nativeNoNo

Attribution. Headroom ships with the excellent RTK binary for shell-output rewriting — git show --short, scoped ls, summarized installers. Huge thanks to the RTK team; their tool is a first-class part of our stack, and Headroom compresses everything downstream of it. Headroom can also use lean-ctx as the selected CLI context tool; set HEADROOM_CONTEXT_TOOL=lean-ctx before running headroom wrap ....

Contributing

git clone https://github.com/chopratejas/headroom.git && cd headroom
uv sync --extra dev && uv run pytest

Devcontainers in .devcontainer/ (default + memory-stack with Qdrant & Neo4j). See CONTRIBUTING.md.

Community

  • Discord — questions, feedback, war stories.
  • Kompress-v2-base on HuggingFace — the model behind our text compression.

License

Apache 2.0 — see LICENSE.

Repository

CH
chopratejas

chopratejas/headroom

Created

January 7, 2026

Updated

June 15, 2026

Language

Python

Category

AI