Building ccglass: the architecture of a local LLM reverse proxy

A developer built ccglass, an open-source local reverse proxy that captures LLM API traffic from coding agent CLIs and displays a real-time dashboard of prompts, costs, and cache hit rates. The proxy intercepts local loopback traffic by overriding the API base URL to plain HTTP, avoiding TLS interception issues. It also includes an MCP server that allows Claude Code to query its own request history from within the chat.

ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates. It's open source, about 5,000 lines of Node, and MIT licensed.

GitHub: https://github.com/jianshuo/ccglass

The hardest part wasn't building a proxy. It was making it work with coding agent CLIs that deliberately bypass HTTP_PROXY.

Every native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly. They don't honor the HTTP_PROXY env vars. So the standard "man-in-the-middle" pattern (mitmproxy, Charles) doesn't apply: these tools need a CA cert to intercept HTTPS, but the CLI isn't going to trust your CA.

The trick: intercept the local loopback hop, not the wire.

The CLI's API base URL is https://api.anthropic.com. We override it to http://127.0.0.1:8123. Now the local hop is plain HTTP: no cert, no interception, no TLS. The CLI sends its request to http://127.0.0.1:8123, which our proxy receives, logs, and forwards to the real https://api.anthropic.com over HTTPS.

┌─────────────┐   plain HTTP    ┌─────────────┐    HTTPS    ┌─────────────┐
│   Claude    │ ──────────────▶ │   ccglass   │ ──────────▶ │  Anthropic  │
│  Code CLI   │ 127.0.0.1:8123  │    proxy    │             │     API     │
└─────────────┘                 └─────────────┘             └─────────────┘
                                       │
                                       │ log + dashboard
                                       ▼
                                ┌─────────────┐
                                │   Browser   │
                                │  UI :8123   │
                                └─────────────┘

There are 3 components. One sets the BASE_URL env vars and spawns the CLI as a child process.

The trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an openai-sse or anthropic-sse stream. We need to record that stream without altering what the CLI receives. In Node, this is pipeline with a Transform stream that hashes each chunk and writes it to a side channel. The CLI gets the original stream unchanged.

Each provider has a different pricing model. Cache hits, prompt caching, and the batch API all change the math.
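To make the effect of cache hits on the math concrete, here is a minimal sketch. The rates are made-up placeholders rather than real prices, and the names `rates`, `estimateCost`, and the shape of the `usage` object are assumptions for illustration, not ccglass's actual schema.

```javascript
// Hypothetical per-million-token rates in USD. Placeholders, not real prices.
const rates = { input: 3.0, cachedInput: 0.3, output: 15.0 };

// usage holds token counts; cachedInputTokens is the part of the prompt
// that was served from the provider's prompt cache (assumed shape).
function estimateCost(usage, rate) {
  const freshInputTokens = usage.inputTokens - usage.cachedInputTokens;
  return (
    (freshInputTokens * rate.input +
      usage.cachedInputTokens * rate.cachedInput +
      usage.outputTokens * rate.output) /
    1_000_000
  );
}

// Same request twice: once with a cold cache, once with 90% of the prompt cached.
const cold = estimateCost(
  { inputTokens: 100000, cachedInputTokens: 0, outputTokens: 1000 },
  rates
);
const warm = estimateCost(
  { inputTokens: 100000, cachedInputTokens: 90000, outputTokens: 1000 },
  rates
);

console.log(`cold: $${cold.toFixed(3)}  warm: $${warm.toFixed(3)}`);
```

With these placeholder rates the cold request comes to about $0.315 and the warm one to about $0.072, which is why the cache hit rate is worth watching.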
I extracted pricing into a JSON file, data/pricing.json, keyed by provider:model and updated monthly. The cost is computed during the response stream, so you see cost accumulating in real time on the dashboard.

The wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. One of them is get_recent_requests: Claude can query its own request history from inside the chat.

User: what did I prompt you with 3 turns ago?
Claude: [calls ccglass MCP get_recent_requests]
Claude: You prompted me with "refactor the user service to use the new repository pattern".

It's recursive and weird. I love it.

npm i -g ccglass
ccglass claude

Open the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.
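For readers who want to see the streaming step in code: the following is a minimal sketch of a pass-through Transform that hashes each chunk and hands a record to a side channel while forwarding the original bytes. The names `createRecordingTee`, `relay`, and the `record` callback are hypothetical and not taken from the ccglass source; only `Transform`, `pipeline`, and `createHash` are real Node APIs.

```javascript
const { Transform, pipeline } = require('node:stream');
const { createHash } = require('node:crypto');

// Hypothetical helper: a Transform that records every chunk and then
// passes it through untouched, so the consumer sees the original stream.
function createRecordingTee(record) {
  return new Transform({
    transform(chunk, _encoding, callback) {
      const digest = createHash('sha256').update(chunk).digest('hex');
      // Side channel: the caller decides where this goes (log file, dashboard, ...).
      record({ digest, bytes: chunk.length, chunk });
      // Forward the exact same bytes downstream.
      callback(null, chunk);
    },
  });
}

// Hypothetical wiring: upstream API response -> tee -> response to the CLI.
// upstreamRes is a readable stream, clientRes a writable one.
function relay(upstreamRes, clientRes, record) {
  pipeline(upstreamRes, createRecordingTee(record), clientRes, (err) => {
    if (err) console.error('relay failed:', err.message);
  });
}
```

Using `pipeline` rather than chained `.pipe()` calls means an error or early close on either side tears down the whole chain, which matters for long-lived SSE responses.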