# Building ccglass: the architecture of a local LLM reverse proxy

> Source: <https://dev.to/houleixx/building-ccglass-the-architecture-of-a-local-llm-reverse-proxy-1k07>
> Published: 2026-06-17 02:14:58+00:00

ccglass is a local reverse proxy that captures LLM API traffic from coding agent CLIs (Claude Code, Codex, DeepSeek, Kimi, etc.) and shows you a real-time dashboard of prompts, costs, and cache hit rates.

It's open source. It's 5,000 lines of Node. It's MIT licensed.

GitHub: [https://github.com/jianshuo/ccglass](https://github.com/jianshuo/ccglass)

The hardest part wasn't building a proxy. It was making it work with coding agent CLIs that **deliberately bypass HTTP_PROXY**.

Every native CLI (Claude Code is Node, Codex is Node, DeepSeek's CLI is Go, etc.) opens HTTPS sockets directly. They don't honor `HTTP_PROXY` env vars. So the standard "man-in-the-middle" pattern (mitmproxy, Charles) doesn't apply: these tools need a CA cert to intercept HTTPS, but the CLI isn't going to trust your CA.

The trick: **intercept the local loopback hop, not the wire**.

The CLI's API base URL is `https://api.anthropic.com`. We override it to `http://127.0.0.1:8123`. Now the local hop is plain HTTP: no cert, no interception, no TLS. The CLI's HTTP client makes a request to `http://127.0.0.1:8123`, which our proxy receives, logs, and forwards to the real `https://api.anthropic.com`.
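Here is a minimal sketch of what that override can look like from a launcher's point of view. It is not ccglass's actual code. The variable names `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` are assumptions about which `*_BASE_URL` variables a given CLI reads, and `buildEnv` and `launch` are hypothetical helper names.

```javascript
const { spawn } = require('node:child_process');

// Assumption: these are the base-URL variables the target CLIs read.
// Check each CLI's documentation for the real names.
const BASE_URL_VARS = ['ANTHROPIC_BASE_URL', 'OPENAI_BASE_URL'];

// Copy the parent environment and point every base-URL variable at the proxy.
function buildEnv(baseEnv, proxyUrl) {
  const env = { ...baseEnv };
  for (const name of BASE_URL_VARS) {
    env[name] = proxyUrl;
  }
  return env;
}

// Spawn the CLI as a child process that inherits the terminal,
// so it behaves as if it had been started directly.
function launch(command, args, proxyUrl) {
  return spawn(command, args, {
    stdio: 'inherit',
    env: buildEnv(process.env, proxyUrl),
  });
}

// Usage (not executed here): launch('claude', [], 'http://127.0.0.1:8123');
```

The child never learns that anything changed: it just sees a different base URL in its environment.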

```
┌─────────────┐   plain HTTP    ┌─────────────┐    HTTPS    ┌─────────────┐
│  Claude     │ ──────────────▶ │  ccglass    │ ──────────▶ │ Anthropic   │
│  Code CLI   │  127.0.0.1:8123 │  proxy      │             │ API         │
└─────────────┘                 └─────────────┘             └─────────────┘
                                       │
                                       │ log + dashboard
                                       ▼
                                ┌─────────────┐
                                │  Browser    │
                                │  UI :8123   │
                                └─────────────┘
```
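The forwarding hop in the diagram can be sketched with nothing but Node's built-in `http` and `https` modules. This is an illustration under assumptions, not the real proxy: `createProxy`, `upstreamOptions`, and the `onRequest` callback are hypothetical names, and the sketch handles a single upstream host.

```javascript
const http = require('node:http');
const https = require('node:https');

// Build the options for the upstream HTTPS request from the incoming request.
function upstreamOptions(req, upstreamHost) {
  return {
    hostname: upstreamHost,
    port: 443,
    method: req.method,
    path: req.url,
    // Forward the original headers, but point Host at the real API.
    headers: { ...req.headers, host: upstreamHost },
  };
}

// Create (but do not start) a plain-HTTP server that forwards to upstreamHost over HTTPS.
function createProxy(upstreamHost, onRequest = () => {}) {
  return http.createServer((clientReq, clientRes) => {
    onRequest({ method: clientReq.method, url: clientReq.url });

    const upstreamReq = https.request(
      upstreamOptions(clientReq, upstreamHost),
      (upstreamRes) => {
        // Relay status, headers, and body back to the CLI as they arrive.
        clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
        upstreamRes.pipe(clientRes);
      }
    );

    upstreamReq.on('error', (err) => {
      if (!clientRes.headersSent) clientRes.writeHead(502);
      clientRes.end(`upstream error: ${err.message}`);
    });

    // Stream the request body to the upstream API.
    clientReq.pipe(upstreamReq);
  });
}

// Usage (not executed here):
// createProxy('api.anthropic.com', console.log).listen(8123, '127.0.0.1');
```

TLS only exists on the right-hand hop, where Node's default trust store already trusts the API's certificate, so no custom CA is involved anywhere.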

3 components:

`*_BASE_URL` env vars, spawns the CLI as a child process

The trickiest part: LLM APIs use Server-Sent Events (SSE) for streaming. The CLI expects an `openai-sse` or `anthropic-sse` stream. We need to:

In Node, this is `pipeline()` with a `Transform` stream that hashes each chunk and writes it to a side channel. The CLI gets the original stream unchanged.
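A pass-through tap along those lines might look like the sketch below. The name `createTap`, the `onChunk` callback standing in for the side channel, and the choice of SHA-256 are all assumptions made for illustration.

```javascript
const { Transform, pipeline } = require('node:stream');
const { createHash } = require('node:crypto');

// Returns a Transform that reports every chunk to onChunk (the "side channel")
// and then forwards the same bytes downstream untouched.
function createTap(onChunk) {
  return new Transform({
    transform(chunk, _encoding, callback) {
      const digest = createHash('sha256').update(chunk).digest('hex');
      onChunk({ digest, bytes: chunk.length, chunk });
      callback(null, chunk); // pass the original chunk through unchanged
    },
  });
}

// Usage inside a proxy handler (not executed here):
// pipeline(upstreamRes, createTap(record), clientRes, (err) => {
//   if (err) console.error('stream failed', err);
// });
```

Because the tap never buffers the whole response, the CLI keeps receiving SSE events at the pace the API sends them.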

Each provider has a different pricing model. Cache hits, prompt caching, batch API, all change the math.

I extracted pricing into a JSON file (`data/pricing.json`) keyed by `provider:model` and updated monthly. The cost is computed *during the response stream* so you see cost accumulating in real time on the dashboard.
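To make the lookup concrete, here is a sketch of a cost calculation keyed by `provider:model`. The field names (`inputPerMTok`, `cachedInputPerMTok`, `outputPerMTok`), the usage shape, and the rates are all made up for the example. They are not the real contents of `data/pricing.json` and not real prices.

```javascript
// Hypothetical pricing table; rates are placeholder USD per million tokens.
const examplePricing = {
  'example-provider:example-model': {
    inputPerMTok: 2,
    cachedInputPerMTok: 0.5,
    outputPerMTok: 10,
  },
};

// Returns the cost in USD for the usage seen so far, or null if the model is unknown.
function computeCost(pricing, provider, model, usage) {
  const rates = pricing[`${provider}:${model}`];
  if (!rates) return null;

  const total =
    usage.inputTokens * rates.inputPerMTok +
    usage.cachedInputTokens * rates.cachedInputPerMTok +
    usage.outputTokens * rates.outputPerMTok;

  return total / 1_000_000;
}

// Calling this again each time the stream reports updated token counts
// gives a running total for the dashboard.
const costSoFar = computeCost(examplePricing, 'example-provider', 'example-model', {
  inputTokens: 1_000_000,
  cachedInputTokens: 1_000_000,
  outputTokens: 500_000,
});
console.log(costSoFar);
```

Cached input gets its own rate here because, as noted above, cache hits change the math.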

The wild feature: ccglass has its own MCP (Model Context Protocol) server. When Claude Code starts, it can call our MCP tools. One of them is `get_recent_requests`: Claude can query its own request history *from inside the chat*.

```
User: what did I prompt you with 3 turns ago?
Claude: [calls ccglass MCP get_recent_requests]
Claude: You prompted me with "refactor the user service to use the new repository pattern".
```

It's recursive and weird. I love it.
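For a feel of what sits behind a tool like that, here is a bare-bones sketch of a handler for MCP's `tools/list` and `tools/call` JSON-RPC methods. It skips the transport and the initialization handshake a real MCP server needs. `createToolHandler`, the `limit` argument, and the shape of the log entries are assumptions, not ccglass's actual implementation.

```javascript
// requestLog is assumed to be an array of captured requests, oldest first.
function createToolHandler(requestLog) {
  const tools = [
    {
      name: 'get_recent_requests',
      description: 'Return the most recently captured requests',
      inputSchema: {
        type: 'object',
        properties: { limit: { type: 'number' } },
      },
    },
  ];

  // Takes one parsed JSON-RPC message and returns the response object.
  return function handle(message) {
    const { id, method, params = {} } = message;

    if (method === 'tools/list') {
      return { jsonrpc: '2.0', id, result: { tools } };
    }

    if (method === 'tools/call') {
      if (params.name !== 'get_recent_requests') {
        return { jsonrpc: '2.0', id, error: { code: -32602, message: `Unknown tool: ${params.name}` } };
      }
      const limit = (params.arguments && params.arguments.limit) || 5;
      const recent = requestLog.slice(-limit);
      // MCP tool results carry a list of content items; here a single text item.
      return {
        jsonrpc: '2.0',
        id,
        result: { content: [{ type: 'text', text: JSON.stringify(recent) }] },
      };
    }

    return { jsonrpc: '2.0', id, error: { code: -32601, message: `Method not found: ${method}` } };
  };
}
```

The proxy already holds the request log, so exposing it as a tool is mostly a matter of serializing what the dashboard shows.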

```
npm i -g ccglass
ccglass claude
```

Open the dashboard. Run a few prompts. The first time you see your own cache hit rate, you'll get it.
