{"slug": "building-ccglass-the-architecture-of-a-local-llm-reverse-proxy", "title": "Building ccglass: the architecture of a local LLM reverse proxy", "summary": "A developer built ccglass, an open-source local reverse proxy that captures LLM API traffic from coding agent CLIs and displays a real-time dashboard of prompts, costs, and cache hit rates. The proxy intercepts local loopback traffic by overriding the API base URL to plain HTTP, avoiding TLS interception issues. It also includes an MCP server that allows Claude Code to query its own request history from within the chat.", "body_md": "ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates.\n\nIt's open source. It's 5,000 lines of Node. It's MIT licensed.\n\nGitHub: [https://github.com/jianshuo/ccglass](https://github.com/jianshuo/ccglass)\n\nThe hardest part wasn't building a proxy. It was making it work with coding agent CLIs that **deliberately bypass HTTP_PROXY**.\n\nEvery native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly. They don't honor the `HTTP_PROXY` env vars. So the standard \"man-in-the-middle\" pattern (mitmproxy, Charles) doesn't apply: those tools need the client to trust their CA cert before they can intercept HTTPS, and the CLI isn't going to trust your CA.\n\nThe trick: **intercept the local loopback hop, not the wire**.\n\nThe CLI's API base URL is `https://api.anthropic.com`. We override it to `http://127.0.0.1:8123`. Now the local hop is plain HTTP: no cert, no TLS, nothing to decrypt. 
The CLI's HTTP client sends its request to `http://127.0.0.1:8123`, which our proxy receives, logs, and forwards to the real `https://api.anthropic.com`.\n\n```\n┌─────────────┐   plain HTTP    ┌─────────────┐    HTTPS    ┌─────────────┐\n│  Claude     │ ──────────────▶ │  ccglass    │ ──────────▶ │ Anthropic   │\n│  Code CLI   │  127.0.0.1:8123 │  proxy      │             │ API         │\n└─────────────┘                 └─────────────┘             └─────────────┘\n                                       │\n                                       │ log + dashboard\n                                       ▼\n                                ┌─────────────┐\n                                │  Browser    │\n                                │  UI :8123   │\n                                └─────────────┘\n```\n\n3 components:\n\n`*_BASE_URL` env vars, spawns the CLI as a child process\n\nThe trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an `openai-sse` or `anthropic-sse` stream. We need to:\n\nIn Node, this is `pipeline()` with a `Transform` stream that hashes each chunk and writes it to a side channel. The CLI gets the original stream unchanged.\n\nEach provider has a different pricing model. Cache hits, prompt caching, and the batch API all change the math.\n\nI extracted pricing into a JSON file (`data/pricing.json`), keyed by `provider:model` and updated monthly. The cost is computed *during the response stream*, so you see cost accumulating in real time on the dashboard.\n\nThe wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. 
One of them is `get_recent_requests`: Claude can query its own request history *from inside the chat*.\n\n```\nUser: what did I prompt you with 3 turns ago?\nClaude: [calls ccglass MCP get_recent_requests]\nClaude: You prompted me with \"refactor the user service to use the new repository pattern\".\n```\n\nIt's recursive and weird. I love it.\n\n```\nnpm i -g ccglass\nccglass claude\n```\n\nOpen the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.", "url": "https://wpnews.pro/news/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy", "canonical_source": "https://dev.to/houleixx/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy-1k07", "published_at": "2026-06-17 02:14:58+00:00", "updated_at": "2026-06-17 02:51:26.289321+00:00", "lang": "en", "topics": ["developer-tools", "large-language-models", "ai-agents", "ai-infrastructure"], "entities": ["ccglass", "Claude Code", "Codex", "DeepSeek", "Kimi", "Anthropic", "Node", "MCP"], "alternates": {"html": "https://wpnews.pro/news/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy", "markdown": "https://wpnews.pro/news/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy.md", "text": "https://wpnews.pro/news/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy.txt", "jsonld": "https://wpnews.pro/news/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy.jsonld"}}