{"slug": "claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet", "title": "Claude Code Costs, Act IV — The mistakes catalogue & cheat sheet", "summary": "A developer cataloged 11 costly mistakes in using Anthropic's prompt caching with Claude Code, including routing per turn instead of per conversation, inserting proxies that re-serialize JSON, and leaving tool surfaces unstable. The mistakes range from cache-breaking volatile tokens to trusting displayed session costs over actual billing, with fixes provided for each.", "body_md": "This act consolidates everything into two reference artifacts: a catalogue of the mistakes (each with its symptom, cost, and fix) for revision, and a one-page cheat sheet.\n\nOrdered roughly by how much money they cost and how often they happen.\n\n**1. Routing per turn instead of per conversation.** *Symptom:* a \"cost-saving\" router that bounces models mid-conversation. *Cost:* a cold prefix write on the first switch + a catch-up write on every re-entry; for Sonnet, a net *loss* (+2.1% measured). *Fix:* route sticky — per conversation or per sub-agent. Where you switch matters more than whether.\n\n**2. Inserting a proxy that re-serializes JSON.** *Symptom:* a logging/compression/routing proxy that parses and re-emits the request. *Cost:* even a logically-identical re-encode changes the bytes (separators, escaping, key order, number formatting) and drops cache hit rate to ~0 for *all* traffic. *Fix:* forward original bytes verbatim; mutate only by surgical byte-fragment replacement on the `messages`\n\ntail.\n\n**3. Leaving the tool/MCP surface unstable.** *Symptom:* tool count grows across early turns (async MCP connectors), or dynamic/runtime tool discovery is enabled. *Cost:* byte-0 churn → full cold rebuild (2× write) every affected turn. *Fix:* `--strict-mcp-config`\n\nand a frozen startup tool set; a fixed dispatcher tool instead of runtime tool mutation.\n\n**4. Volatile tokens early in the prompt.** *Symptom:* a timestamp, UUID, or request-ID near the top of the system prompt; unsorted `json.dumps`\n\n. *Cost:* every request has a unique prefix → ~0% hit rate, silently. *Fix:* move all volatile content *after* the last breakpoint; sort keys deterministically.\n\n**5. Stripping or editing thinking blocks in a proxy.** *Symptom:* a proxy that \"cleans up\" or removes thinking blocks. *Cost:* editing → 400 (integrity); removing → 200 but a position-dependent re-key (removing an *early* block lost 8,104 cache tokens vs 440 for a late one). *Fix:* never mutate thinking blocks; carry the request body as Claude Code assembled it.\n\n**6. Trusting the displayed session cost as your invoice.** *Symptom:* using `/cost`\n\nor the status line as the source of truth. *Cost:* it prices 1-hour writes at the 5-minute 1.25× rate and reads ~10% low ($1.77 vs $1.97 on a real session). *Fix:* use the Anthropic Console for absolute billing; use the displayed figure only for in-session relative comparison.\n\n**7. Caching a prefix you'll read fewer than 3 times.** *Symptom:* aggressive caching in a short-lived custom harness. *Cost:* a wasted write is 2× — double what no caching would have cost. *Fix:* cache only when you'll re-read the prefix ≥3 times (the break-even); let interactive sessions cache by default.\n\n**8. Running an aggressive input compressor over a cached prefix.** *Symptom:* LLMLingua/LLM-based compression applied to the conversational prefix. *Cost:* rewriting cached bytes flips reads (0.1×) to writes (2×); accuracy degrades past ~20×. *Fix:* use input compressors only where there's no cache to lose (one-shot, RAG assembly) or only on the fresh tail; prefer a cache-aware tool.\n\n**9. Agentic bursts overflowing the 20-block lookback.** *Symptom:* a single turn emits ~10+ parallel tool calls (≈21+ tool-call/result blocks). *Cost:* the next turn can't re-link the cache and cold-writes the gap. *Fix:* in Claude Code, bound the burst; behind a proxy, a stateless breakpoint grid (stride ~18, 4 markers) rescues up to ~74 blocks/turn.\n\n**10. A cache breakpoint below the minimum prefix.** *Symptom:* marking a sub-1,024-token prefix for caching. *Cost:* the marker silently does nothing (`cache_creation: 0`\n\n); you think caching is on. *Fix:* confirm non-zero `cache_creation`\n\nthen non-zero `cache_read`\n\non the following request.\n\n**11. Assuming HTTP 200 means the cache survived.** *Symptom:* judging a proxy change by \"it still works.\" *Cost:* 200 means \"valid request,\" not \"cache preserved\" — you can silently convert cheap reads into expensive writes. *Fix:* validate every prefix-touching change by watching `cache_read`\n\non the *next* request, not just the status code.\n\n**12. Switching Haiku → Sonnet mid-conversation.** *Symptom:* escalating an in-flight Haiku conversation to Sonnet. *Cost:* stacks three things — full cache miss (model-scoped), Sonnet now keeps-and-bills thinking Haiku had been ignoring, and any prefix-leanness benefit evaporates. *Fix:* decide the tier at conversation start; if you must escalate, accept it as a fresh cold start, not a cheap continuation.\n\n**The four mental models**\n\n`signature`\n\n; you carry it, can't open it, can't edit it.**The numbers**\n\n`total_cost_usd`\n\nunder-reports (prices writes at 1.25×).**The render order** — `tools → system → messages`\n\n. Stable first, volatile last.\n\n**The two safest cost levers** (never touch the prefix):\n\n**The reflex to build:** before any change that touches tools, system, or message *history*, ask *\"does this change a byte inside the cached prefix?\"* If yes, you're paying a 2× rewrite — only do it on purpose.\n\n`/v1/messages`\n\nrequest carrying tools + system + the whole conversation; the server stores nothing and the client re-sends everything (including thinking) each turn. The request is the only state.`signature`\n\ncarries the full reasoning, decrypted server-side; you can't read it, can't edit it (400), but must carry it for continuity. Billed as output at generation, input on replay; `display`\n\nis visibility-only.`keep:\"all\"`\n\neverywhere, so the strip never bites in real usage — and a proxy that `list_changed`\n\nor async connector load) churn byte 0 → full rebuild. Keep the tool set stable; use a fixed dispatcher. Stabilize the tool surface first.*Costs are computed from measured token counts at Anthropic list prices (per 1M, as of 2026-06-15): Opus 4.8 $5/$25, Sonnet 4.6 $3/$15, Haiku 4.5 $1/$5 in/out; cache read 0.1×, cache write 2× (1-hour TTL). The tokenizer factor 0.775 is measured for Haiku and assumed for Sonnet. Sonnet routing figures are modeled, not run. Claude Code's displayed total_cost_usd uses a 1.25× write multiplier and reads lower than these documented-rate figures. Prices and behaviors are version-specific (Claude Code 2.1.150, mid-2026) — re-verify for your own setup.*\n\n**Anthropic documentation**\n\n`platform.claude.com/docs/en/build-with-claude/extended-thinking`\n\n`platform.claude.com/docs/en/build-with-claude/adaptive-thinking`\n\n`platform.claude.com/docs/en/build-with-claude/prompt-caching`\n\n`claude.com/pricing`\n\n**Cost-reduction tooling**\n\n`lm-sys/RouteLLM`\n\n· vLLM Semantic Router `vllm-project/semantic-router`\n\n`chopratejas/headroom`\n\n· LLMLingua `microsoft/LLMLingua`\n\n· RTK `rtk-ai/rtk`\n\n· lean-ctx `yvgude/lean-ctx`\n\n`JuliusBrussee/caveman`\n\n`github/github-mcp-server`\n\n(v1.0.5; dynamic-toolsets removed in v1.1.0, PR #2512 — for tech-debt, not cache) · `docker/mcp-gateway`\n\n(v0.43.0 / `4833d8c`\n\n; `dynamic-tools`\n\ndefault-on)", "url": "https://wpnews.pro/news/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet", "canonical_source": "https://dev.to/sumedhbala/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet-34bo", "published_at": "2026-06-26 06:00:29+00:00", "updated_at": "2026-06-26 06:34:04.773585+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "developer-tools"], "entities": ["Anthropic", "Claude Code", "Sonnet", "LLMLingua", "MCP"], "alternates": {"html": "https://wpnews.pro/news/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet", "markdown": "https://wpnews.pro/news/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet.md", "text": "https://wpnews.pro/news/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet.txt", "jsonld": "https://wpnews.pro/news/claude-code-costs-act-iv-the-mistakes-catalogue-cheat-sheet.jsonld"}}