See what your AI coding agent actually sends to the API β and what each part costs.
Most usage tools read local logs. That shows the total cost of a call or session, but it misses the request-time context assembled before the model is invoked: system prompts, tool schemas, MCP blocks, tool results, cache reads/writes, and previous thinking blocks.
cost-xray captures the actual local API traffic for Claude Code and Codex, then attributes tokens and dollars back to the sources inside the request. It shows not just how much a turn cost, but why it cost that much.
Home β every session by agent Β· project Β· session, with token and dollar totals | Drill a session β context-window occupancy (top) and per-source cost, down to each MCP server (bottom) |
- A supported coding agent β Claude CodeorCodex - macOS or Linux
- No API keys, no account, no config changes to your agent β capture is a transparent local hop
curl -fsSL https://raw.githubusercontent.com/tigerless-labs/cost-xray/master/install.sh | bash
The installer asks which agent(s) to capture β Claude Code, Codex, or both β and prompts even under curl β¦ | bash
.
Skip the prompt(e.g. CI): setCOST_XRAY_AGENTS=claude|codex|all
.Already cloned the repo? Run./install.sh
.
Then open a new terminal and run claude
/ codex
exactly as before β capture is automatic, no flags and no base-URL change. It's forward-only: runs started in that new shell are captured, not past history. Open the live TUI from anywhere:
cx
Capture runs as a background service (auto-start on boot, self-healing, port-adaptive) and doesn't change what your agent does, its results, or its cost β anytime with cx stop
. See docs/install.md for systemd details, GUI agents (Cursor base-URL setup), manual (no-systemd) mode, and troubleshooting.
cx # open the live cost-xray TUI (from any directory)
cx status # services' state, live ports, sessions captured
cx stop # stop monitoring β proxies down; agents run direct (uncaptured)
cx start # resume monitoring
cx restart # restart the proxies (after a config change)
cx install # (re)install β authoritative: installs the chosen agents, removes the rest
cx uninstall # remove services + shell wrappers (keeps captured data)
cx
is the whole CLI and works from any directory. Change port/upstream by editing ~/.cost-xray/env
, then cx restart
. (./run.sh <cmd>
from the repo still works too β cx
just calls it for you from anywhere.)
In the TUI, drill from the top down: agent β project β session β category β MCP server β tool β per-turn call β the real output. Every cell carries its cache split (read / write / fresh / output $).
| Agent | Status | Capture |
|---|---|---|
CodexThe wire is decoded by a thin per-agent adapter β the only place code forks by agent (docs/architecture.md; per-agent capture + tokenizer notes under docs/providers/). Adding an agent is one small module; anything speaking the Anthropic or OpenAI-Responses wire shape is close to drop-in.
cost-xray traces every token's cost β split into fresh / cache-read / cache-write / output β to the source that caused it, and down to the individual call: the cost of this Read
invocation and its output, not a session sum. Everyone else aggregates β a request- or session-level total, at most grouped by tool type ("Read cost $X this session"). As far as we've found, nothing else prices below the tool, call by call.
What is taking space in the context window right now β system prompt, every tool schema, MCP servers, messages, and generated output β decomposed into source-level rows. Prompt caching makes a stable 40k-token schema block cheap on cache read, but it still crowds out the code and conversation that matter; cost-xray shows you the occupancy, not just the bill.
Configured servers and tools that are injected into every request's prefix but never actually called. They pay their tool-schema overhead on every turn β cost-xray flags the dead weight.
Claude's tokenizer is private β no official or open-source tokenizer exists β and calling Anthropic's count_tokens
API for everything would add load and hit its limits. So for Claude, cost-xray uses an estimator plus proportional calibration: tiktoken sizes each part, the parts it mis-sizes most (thinking, tool schemas) get targeted corrections β pinned with count_tokens
when you're logged in, a fixed ratio otherwise β and the rest is scaled so the calibrated total matches the provider's own usage
. The total, and therefore the bill, is exact; only the split between sources in the same request is approximate. We benchmark those residuals continuously (CONTRIBUTING.md) and they're small enough for attribution work.
Coming soon: an opt-in exact mode β every part sized by full count_tokens
differencing β manually enabled, for users who need maximum per-source precision.
The wrappers are per-command and self-healing: if the proxy is down, the wrapper restarts it and routes through; if it can't, the agent runs direct β never broken. Stop monitoring anytime with cx stop
(agents then run direct).
cost-xray keeps the complete raw API traffic β but a long session re-sends its whole history every turn, so the capture is hugely repetitive. We deduplicate it: each unique block (message, schema, tool result) is stored once, with a tiny per-turn delta. Full per-turn bytes rebuild on demand. Disk stays small even across million-token sessions.
Log-based tools (ccusage, codeburn, and similar) are excellent at local-first session analytics: they read local transcripts, classify turns by tool usage, and price the session by model, day, or task. That answers how much did I spend. cost-xray answers the lower-level question: what bytes did the model actually receive, and which source owns those tokens?
The difference is the data source. Log readers see the transcript after the agent has run β but the system prompt, injected tool schemas, MCP schemas, reminders, and provider-added blocks are assembled at request time and never written to the transcript. In real coding-agent requests, that invisible prefix can be roughly half the context or more. cost-xray reads the raw API request, so it can compute source-level tokens and attribute cost to schemas, MCP servers, tools, and message buckets.
| Question | Log / usage tools | cost-xray wire capture |
|---|---|---|
| How much did the session cost? | Yes | Yes |
| Which task/tool was active? | Yes | Yes |
| System prompt and injected schemas visible? | No | Yes |
| How many tokens does each tool schema occupy? | No, schemas aren't in logs | Yes |
| Which MCP server is dead weight in the prefix? | Estimate | Exact |
| Cache read/write/fresh dollars per source/tool? | No, usage is request-level | Yes, by span and cache boundary |
| Live view of the current request window? | No | Yes |
cost-xray surfaces the data; you read the story. A few patterns worth knowing:
| Signal you see | What it might mean |
|---|---|
| A 40k-token MCP schema block on every turn | A configured server crowding the prefix β drill it to see if any tool was called |
| Cache-read dollars dwarf fresh input | Stable prefix is working; the spend is in what's new each turn |
| Repeated cache-write on the same source | The prefix is being rewritten β something upstream of it changed |
| Thinking tokens dominate output cost | Long reasoning turns; check whether they earned their keep |
| A tool's schema costs more than the tool is ever used | Candidate to drop from the agent's tool-set |
Big tool_result rows on Read /Bash |
|
| Uncapped output bloating the window β cap it at the source |
These are starting points, not verdicts. One experimental session looking odd is fine; the same pattern across weeks of work is a config issue.
cost-xray runs mitmproxy as a local capture hop and keeps all analysis off the request path.
agent ββHTTPβββΆ mitmproxy ββHTTPSβββΆ model API
β
βββ redacted raw request/response β ~/.cost-xray/sessions/
β
βββ materializer β TUI / cost attribution
Capture. Claude Code uses reverse-proxy mode via a base-URL override β no certificate. Codex (locked HTTPS endpoint) uses a forward proxy plus a scoped local CA, trusted only by thecodex
command and never added to your system trust store. The wrapper self-heals: if the proxy is down it restarts and routes through, otherwise it runs the agent direct β capture never breaks your agent.Local & private. The proxy binds to127.0.0.1
and sends no telemetry.Authorization
, API keys, cookies, and secret-looking body fields are redactedbeforeanything hits disk. Everything stays under~/.cost-xray/
β delete that directory to clear it.Off the hot path. The proxy only writes redacted bytes; a separate materializer tokenizes and prices later, so analysis never steals latency from live relay. Raw is the source of truth β every view rebuilds from it on demand.
See docs/install.md for setup and docs/architecture.md for the full pipeline.
Capture built on mitmproxy; the SSE/redaction approach was adapted from llm-interceptor. Pricing data from LiteLLM. README structure inspired by codeburn.
Built by Tigerless Labs.