ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates.
It's open source. It's 5,000 lines of Node. It's MIT licensed.
GitHub: https://github.com/jianshuo/ccglass
The hardest part wasn't building a proxy. It was making it work with coding agent CLIs that deliberately bypass HTTP_PROXY.
Every native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly. They don't honor the HTTP_PROXY env vars. So the standard "man-in-the-middle" pattern (mitmproxy, Charles) doesn't apply: those tools need the client to trust their CA cert before they can intercept HTTPS, and the CLI isn't going to trust your CA.
The trick: intercept the local loopback hop, not the wire.
The CLI's API base URL is https://api.anthropic.com. We override it to http://127.0.0.1:8123. Now the local hop is plain HTTP: no cert, no TLS to break open. The CLI sends its request to http://127.0.0.1:8123, which our proxy receives, logs, and forwards to the real https://api.anthropic.com.
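Here is a minimal sketch of the override step, assuming the CLI reads its base URL from an environment variable. The `BASE_URL_VARS` table and the `buildChildEnv` and `launch` helpers are hypothetical names for illustration, not ccglass's actual code, and whether a given CLI version honors these variables is something to verify.

```javascript
const { spawn } = require('node:child_process');

// Hypothetical mapping from CLI name to the env var that holds its API base URL.
// Assumption: these are the variable names the CLIs read; check each CLI's docs.
const BASE_URL_VARS = {
  claude: 'ANTHROPIC_BASE_URL',
  codex: 'OPENAI_BASE_URL',
};

// Copy the parent environment and point the CLI at the local proxy.
function buildChildEnv(cli, port, parentEnv = process.env) {
  const varName = BASE_URL_VARS[cli];
  if (!varName) throw new Error(`no base URL variable known for "${cli}"`);
  return { ...parentEnv, [varName]: `http://127.0.0.1:${port}` };
}

// Spawn the CLI as a child process that shares our terminal.
function launch(cli, args, port) {
  return spawn(cli, args, { env: buildChildEnv(cli, port), stdio: 'inherit' });
}
```

With these helpers, `launch('claude', [], 8123)` would start the CLI with its API traffic pointed at the loopback port.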
┌─────────────┐   plain HTTP    ┌─────────────┐   HTTPS    ┌─────────────┐
│   Claude    │ ──────────────▶ │   ccglass   │ ─────────▶ │  Anthropic  │
│  Code CLI   │ 127.0.0.1:8123  │    proxy    │            │     API     │
└─────────────┘                 └─────────────┘            └─────────────┘
                                       │
                                       │ log + dashboard
                                       ▼
                                ┌─────────────┐
                                │   Browser   │
                                │  UI :8123   │
                                └─────────────┘
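The forwarding hop in the diagram can be sketched with Node's built-in http and https modules. This is an illustration under assumptions, not the ccglass source: `createProxy`, `upstreamOptions`, and the `onRequest` logging hook are hypothetical names.

```javascript
const http = require('node:http');
const https = require('node:https');

// Build the options for the outbound HTTPS request from the inbound request.
function upstreamOptions(req, upstreamHost) {
  return {
    host: upstreamHost,
    port: 443,
    method: req.method,
    path: req.url,
    // The Host header must name the real API, not 127.0.0.1.
    headers: { ...req.headers, host: upstreamHost },
  };
}

// Plain HTTP server on the loopback side, HTTPS client on the upstream side.
function createProxy(upstreamHost, onRequest = () => {}) {
  return http.createServer((req, res) => {
    onRequest({ method: req.method, url: req.url, at: Date.now() });
    const upstream = https.request(upstreamOptions(req, upstreamHost), (upRes) => {
      res.writeHead(upRes.statusCode, upRes.headers);
      upRes.pipe(res); // stream the response body back as it arrives
    });
    upstream.on('error', (err) => {
      if (!res.headersSent) res.writeHead(502, { 'content-type': 'text/plain' });
      res.end(`upstream error: ${err.message}`);
    });
    req.pipe(upstream); // stream the request body to the real API
  });
}
```

To try it, something like `createProxy('api.anthropic.com', console.log).listen(8123, '127.0.0.1')` would bind the proxy to the loopback interface only.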
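3 components. One of them sets the *_BASE_URL env vars and spawns the CLI as a child process.
The trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an openai-sse or anthropic-sse stream. We need to pass that stream through to the CLI unchanged while capturing a copy for the log and the dashboard.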
In Node, this is pipeline() with a Transform stream that hashes each chunk and writes it to a side channel. The CLI gets the original stream unchanged.
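A minimal sketch of that pattern, assuming a running SHA-256 over the whole stream and a callback as the side channel. `createTap` and `forwardWithTap` are hypothetical names, and the real implementation may hash or log differently.

```javascript
const { Transform, pipeline } = require('node:stream');
const { createHash } = require('node:crypto');

// A pass-through Transform: it observes each chunk, then emits it untouched.
function createTap(onChunk) {
  const hash = createHash('sha256');
  const tap = new Transform({
    transform(chunk, _encoding, callback) {
      hash.update(chunk);    // running hash over the bytes seen so far
      onChunk(chunk);        // side channel, e.g. a log writer
      callback(null, chunk); // forward the original chunk unchanged
    },
  });
  // Call once, after the stream has ended, to get the hex digest.
  tap.digest = () => hash.digest('hex');
  return tap;
}

// Wire source -> tap -> destination; pipeline() propagates errors and cleanup.
function forwardWithTap(source, destination, onChunk, done) {
  const tap = createTap(onChunk);
  pipeline(source, tap, destination, (err) => done(err, err ? null : tap.digest()));
  return tap;
}
```

In a proxy, `source` would be the upstream response and `destination` the response going back to the CLI, so the SSE events reach the CLI byte for byte while the callback sees a copy.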
Each provider has a different pricing model. Cache hits, prompt caching, batch API, all change the math.
I extracted pricing into a JSON file (data/pricing.json) keyed by provider:model and updated monthly. The cost is computed during the response stream, so you see cost accumulating in real time on the dashboard.
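To make the lookup concrete, here is a sketch of a cost function over a table keyed by provider:model. The field names (`input`, `output`, `cacheRead`, `cacheWrite`), the per-million-token unit, and every number are assumptions made up for illustration; they are not real prices and not the schema of data/pricing.json.

```javascript
// Hypothetical pricing table in USD per million tokens.
// The numbers are placeholders, not real provider prices.
const PRICING = {
  'example:model-a': { input: 2, output: 10, cacheRead: 0.5, cacheWrite: 4 },
};

// Price one usage record; returns null when the model is not in the table.
function costUsd(pricing, provider, model, usage) {
  const rates = pricing[`${provider}:${model}`];
  if (!rates) return null;
  const price = (tokens, rate) => ((tokens || 0) * rate) / 1_000_000;
  return (
    price(usage.inputTokens, rates.input) +
    price(usage.outputTokens, rates.output) +
    price(usage.cacheReadTokens, rates.cacheRead) +
    price(usage.cacheWriteTokens, rates.cacheWrite)
  );
}
```

Calling a function like this each time a streamed response reports updated token counts is one way to get a running total on the dashboard. Cache reads are priced separately from fresh input here because that is where the caching savings show up.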
The wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. One of them is get_recent_requests: Claude can query its own request history from inside the chat.
User: what did I prompt you with 3 turns ago?
Claude: [calls ccglass MCP get_recent_requests]
Claude: You prompted me with "refactor the user service to use the new repository pattern".
It's recursive and weird. I love it.
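For a sense of what such a tool does, here is a sketch of a handler for an MCP tools/call message, written against the JSON-RPC 2.0 message shape without any SDK. The in-memory `requestLog`, the `limit` argument, and the helper names are assumptions; the real server presumably reads what the proxy captured.

```javascript
// Hypothetical in-memory log of captured requests, capped at 100 entries.
const requestLog = [];

function recordRequest(entry) {
  requestLog.push(entry);
  if (requestLog.length > 100) requestLog.shift();
}

// Return the most recent entries, newest first.
function getRecentRequests({ limit = 5 } = {}) {
  const start = Math.max(0, requestLog.length - limit);
  return requestLog.slice(start).reverse();
}

// Handle one JSON-RPC 2.0 message whose method is "tools/call".
function handleToolsCall(message) {
  const { id, params } = message;
  if (params.name !== 'get_recent_requests') {
    return {
      jsonrpc: '2.0',
      id,
      error: { code: -32602, message: `Unknown tool: ${params.name}` },
    };
  }
  const rows = getRecentRequests(params.arguments);
  // MCP tool results carry a list of content items; text is the simplest kind.
  return {
    jsonrpc: '2.0',
    id,
    result: { content: [{ type: 'text', text: JSON.stringify(rows) }] },
  };
}
```

The model receives the JSON text as the tool result and can quote an earlier prompt back to the user, as in the exchange above.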
npm i -g ccglass
ccglass claude
Open the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.