{"slug": "60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it", "title": "60% Fable cost cut by converting code to images and having the model OCR it", "summary": "A new open-source proxy tool called pxpipe reduces Claude Code input token costs by up to 60% by converting dense text content such as system prompts, tool documentation, and history into compact PNG images before sending requests to the API. The tool exploits the fixed token cost of images based on pixel dimensions, achieving roughly 3.1 characters per image token versus 1 character per text token, resulting in a 59–70% reduction in end-to-end billing on Fable workloads.", "body_md": "**Cut Claude Code's input tokens by rendering bulky context as images — the same system prompt, tool docs, and history, in a fraction of the tokens.**\n\nAn image's token cost is fixed by its pixel dimensions, not by how much text is inside it. Dense content (code, JSON, tool output) packs ~3.1 chars per image-token vs ~1 char per text-token on real Claude Code traffic. pxpipe is a local proxy that exploits that gap: it rewrites the bulky parts of your request (system prompt, tool docs, older history) into compact PNGs before the request leaves your machine.\n\nSavings are **workload-dependent** — pxpipe wins on token-dense content and\nleaves sparse/small requests untouched — so these are measured snapshots, not\nconstants. The primary, durable result is **input-token reduction**: dense\nsystem prompts, tool docs, and history go in as compact images instead of text\n(the example above is ≈25k text tokens rendered as ≈2.7k image tokens), every\nrequest measured against its own `count_tokens` counterfactual. **Dollars are\ndownstream of that** — at current Fable list prices the token cut lands as a\n**~59–70% lower end-to-end bill** (~72–74% on compressed requests; full pricing\nmath in the FAQ). 
But list prices can change tomorrow and the token count\nwon't, so tokens — not dollars — are the number to watch. Reproduce both from\n`~/.pxpipe/events.jsonl`.\n\nThis is what the model sees instead of text:\n\n*~48k characters of system prompt + tool docs (this repo's own README,\nFINDINGS, and source), ≈25k tokens as text, ≈2.7k image tokens as this page.\nProduced by the real transformRequest pipeline: whitespace-minified, reflowed\ninto full rows with ↵ marking original newlines, OCR instruction banner\nco-rendered on top. The model reads renders like this at 100/100 on a clean\neval (see benchmarks).*\n\n**Fable 5 demo (the default, 100/100 reader):**\n\n## Fable-AB-Demo.mp4\n\n- Both demos with both panes on **Fable 5** (plain left, pxpipe right).\n- **Fable reads what Opus can't.** The imaged phrase-count that Opus refuses (see the Opus demo below): the pxpipe arm counts the exact token **10/10** across 39 imaged filler files (matches `grep` ground truth line-for-line) and gets the multi-step ledger arithmetic right (8037 → … → 15,021).\n- **Same answers, ~7× cheaper.** Session totals after both demos: plain **$42.21**, context **96% full** (964.5k/1M — one task away from forced compaction) vs pxpipe **$6.06** with context to spare (73.5k/1M).\n- **Honest caveat, visible in the clip:** the pxpipe arm answered the count first and needed one follow-up nudge to also print the ledger balance in the requested one-line format; the plain arm followed the format on the first try. Legibility is solved on Fable — single-reply format compliance is the remaining rough edge.\n\n**Opus 4.8 demo (Opus disabled by default):**\n\n## Opus-AB-Demo.mp4\n\n*Side-by-side — plain Claude (left) vs pxpipe (right), both on Opus 4.8 (opt-in; pxpipe is tuned for Fable — see the Fable clip above). 
Click the image to watch (Google Drive).*\n\n**Demo 1 — fix a failing test suite:** both pass; the dashboard shows pxpipe cut the request to a fraction of the tokens (real, server-measured **context/token reduction**).\n\n**Demo 2 — a big file-context (40 files, ~382k tokens) plus a math question and a \"count this phrase\" task:** the math answer (a small **text** needle) reads on both. The phrase-count needs reading the **imaged** filler — so pxpipe-on-Opus can't read it and **honestly surfaces that it won't fabricate a number** (the documented lossy limit: exact values stay text). Plain, meanwhile, bogs down counting file-by-file.\n\n```\nnpx pxpipe-proxy                                  # proxy on 127.0.0.1:47821\nANTHROPIC_BASE_URL=http://localhost:47821 claude  # point Claude Code at it\n```\n\nOpen [http://127.0.0.1:47821/](http://127.0.0.1:47821/) for a live dashboard: tokens saved, per-session\nstats, every text→image conversion side by side, a global kill switch, and\nruntime model chips including GPT 5.6 and GPT 5.5.\n\nNothing else changes. Responses stream normally; pxpipe only compresses the\n*request* (your context going up), never the model's output. Recent turns stay\ntext; the system prompt, tool docs, and older bulk history are imaged.\n\n**It is lossy.** pxpipe is a *gist* tier, not a lossless store. In a\nneedle-in-haystack eval, exact 12-char hex strings inside dense imaged content\ncame back **0/15** on Opus and 13/15 on Fable 5, and the failure mode is\n*silent confabulation*: a plausible wrong value, not an error. Anything you\nneed back byte-exact (IDs, hashes, secrets, exact numbers) must stay text.\nRecent turns do; a dedicated verbatim-risk guard is not built yet.\n\n**Exact-recall escape hatch.** pxpipe only images Fable requests\n(`PXPIPE_MODELS=claude-fable-5`), so any subagent on a non-Fable model passes\nthrough as text. 
Route work that needs byte-exact values to one — globally with\n`CLAUDE_CODE_SUBAGENT_MODEL=claude-sonnet-4-6`, or per-agent with `model: sonnet`\nin the agent frontmatter. It reads from source (file/JSONL), not the imaged\nhistory. This covers exact-recall you route on purpose; it does **not** catch a\nsilent misread you did not expect — that is the unbuilt guard above.\n\n**Does it break real work?** Parity in what we measured: a 10-instance\nSWE-bench Lite pilot (the easy subset) resolved **10/10 on both arms**,\npxpipe ON at $27 vs OFF at $54 token-equivalent, and 19 SWE-bench Pro\npairs (harder, long-horizon) resolved **14/19 ON vs 15/19 OFF** at\n**−60% per-request**: verdicts agree on 18/19, and the single split\n(one ON fail) re-resolved 3/3 when replicated, i.e. run-to-run agentic\nvariance, not compression. Small n, details and caveats below.\n\n**Savings are workload-dependent.** It wins on token-dense content\n(~1 char/token: code, JSON, hashes) and *loses money* on sparse English prose\n(~3.5 chars/token). The built-in gate only images content where the math wins,\ncalibrated against N=391 production rows.\n\n**Model scope:** one `PXPIPE_MODELS` CSV controls which model bases get imaged\nacross both families — default `claude-fable-5,gpt-5.6` (GPT 5.5 is opt-in;\nit degrades on imaged context). Set\n`PXPIPE_MODELS=off` to disable imaging entirely, or use\n`~/.config/pxpipe/config.json` with `{ \"models\": \"off\" }` (or a list). 
For GPT,\npxpipe keeps tool definitions in native JSON (only verbose schema prose moves\ninto the image) so tool-calling stays reliable; unlike the Claude path, the GPT\npath does not add or depend on Anthropic `cache_control` prompt-cache markers.\nThe dashboard chips can flip any model live without changing client configs.\nOpus 4.7/4.8 was the original Claude scope but misread ~7% of renders\n(`10200` → `9400`), so it was turned off by default once Fable 5 hit 100/100 with\nidentical image billing — opt it back in at your own risk via `PXPIPE_MODELS` or\nthe dashboard chips. Everything else passes through untouched.\n\nMeasured with novel random-number problems the model cannot have memorized:\n\n| test | N | text | pxpipe (image) | tokens |\n|---|---|---|---|---|\n| novel arithmetic, `claude-fable-5` | 100 | 100% | 100% | −38% |\n| novel arithmetic, `claude-opus-4-8` | 100 | 100% | 93% | −38% |\n| gist recall A/B (decisions, values, paths, names, negations; with distractors; 15k-45k char sessions), Fable 5 | 98/arm | 98/98 | 98/98 | - |\n| state tracking (value mutated 3x, final/first/count), Fable 5 | 18/arm | 18/18 | 18/18 | - |\n| confabulation on never-stated facts (lower is better), Fable 5 | 16/arm | 0/16 | 0/16 | - |\n| verbatim 12-char hex recall, dense render, Opus | 15 | 15/15 | 0/15 | - |\n| verbatim 12-char hex recall, dense render, Fable 5 | 15 | - | 13/15 | - |\n\n10 SWE-bench Lite instances, Claude Code + Fable 5, paired runs through\npxpipe ON vs OFF, graded with the official `swebench` Docker harness:\n\n| | pxpipe ON | OFF |\n|---|---|---|\n| resolved | 10/10 | 10/10 |\n| request size vs own uncompressed body | −65% | ±0 |\n\nThe −65% is per-request (`count_tokens` probe of each body before\ncompression), so it has no turn-count confound. 
n=10/arm, Lite skews easy.\nRun totals, receipts, caveats: [eval/swe-bench/](/teamchong/pxpipe/blob/main/eval/swe-bench).\n\n19 completed pairs across two runs (2 dropped: checkout failed both\narms), same setup, official `SWE-bench_Pro-os` Docker harness:\n\n| | pxpipe ON | OFF |\n|---|---|---|\n| resolved | 14/19 | 15/19 |\n| request size vs own uncompressed body | −60% | ±0 |\n\nVerdicts agree on 18/19 (three instances failed both arms, one with\nbyte-identical patches across arms). The single split (navidrome, ON\nfail) was replicated 3x on the ON arm: all three runs produced an\nidentical patch and **resolved**, so the original loss was run-to-run\nagentic variance, not compression. Receipts:\n[eval/swe-bench-pro/](/teamchong/pxpipe/blob/main/eval/swe-bench-pro).\n\nWe also ran GSM8K: 96% imaged. But GSM8K is in training data, so the model\ncan recall memorized answers through its own misreads, which inflates the score.\nWe lead with the clean novel-number eval instead. Reproduce:\n`eval/gsm8k/` · `eval/needle-haystack/` · `eval/gist-recall/` ·\nfull analysis in `FINDINGS.md`.\n\n**Is the headline end-to-end, or only on the requests you touched?**\nEnd-to-end, the whole bill. Most compression tools report savings only on\nthe input slice they touched, which flatters the number. The end-to-end\ndenominator is *every* production request: the small ones pxpipe correctly\nleft untouched, all cache writes and reads, and all output tokens (which the\nproxy never compresses). On a 13,709-request snapshot that was 59% ($100 →\n~$41); a later 8,904-compressed-request trace measured ~70%. Compressed-only\nruns higher (~72–74%) and is quoted separately, never as the headline. The\nexact figure is workload-dependent — reproduce it on your own log.\n\n**How is the math measured?**\nBoth sides of the same request, at the same moment. 
For every `/v1/messages` POST the proxy fires a free `count_tokens` probe on the original uncompressed\nbody (the counterfactual) in parallel with the real forward, and reads\nAnthropic's actually-billed usage block off the response. Both land in the\nsame row of `~/.pxpipe/events.jsonl`, so there is no turn-count or\nrun-to-run confound. Dollar conversion uses Fable 5 list ratios: input ×1.0,\ncache write ×1.25, cache read ×0.1, output ×5. Cache pricing is applied\nidentically to both sides, so the caching discount cancels and cannot be\ndouble-counted as \"savings\". Re-derive it yourself from the events log: the\nformula and field names are documented in `src/core/baseline.ts`.\n\n**What does it actually compress?**\nThree kinds of *input* blocks, each behind a profitability gate:\n\n- large `tool_result` bodies (file reads, command output, logs) above ~6k chars of token-dense content\n- older collapsed history: turns behind the live tail get re-rendered as image pages, recent turns always stay text\n- the static system prompt + tool docs slab\n\nEverything else passes through byte-identical: your messages, recent turns, the model's output (it is the response, the proxy never touches it), sparse prose, and anything too small to win. Non-Fable models pass through entirely.\n\n**Has it ever failed for real, outside the benchmarks?**\nYes, once in weeks of daily use: the model recalled a person's name from\nimaged chat history and got it confidently wrong. No error, just a\nplausible wrong name. That is the documented failure mode: exact strings\nin imaged content are not byte-safe. 
Coding sessions tolerate this because\nthe agent re-reads files before editing; pure chat recall has no such check.\n\n```\ntool_result string ──► wrap at 1928px-wide columns ──► pack ~92,000 chars/page ──► PNG[]\n```\n\nThe proxy intercepts `/v1/messages`, rewrites eligible bulk history into image\nblocks, splices them back cache-friendly (static prefix preserved, so prompt\ncaching keeps working), and forwards. Per-request events log to\n`~/.pxpipe/events.jsonl`.\n\nThe economics: a 1928×1928 image costs ≈4,761 vision tokens and holds up to\n≈92,000 chars (≈48,000 text tokens at the observed density), so plain text is\ncheaper *only* when it runs denser than ~19 chars/token. Claude Code transcripts\nare far below that (observed 1.91 chars/token, N=391). The runtime estimator (`estimateImageCount`) plus a chars/token gate\ndecides per-request; sparse prose is left as text.\n\nSame engine, no proxy. Render text → PNGs, or run the full cache-safe transform:\n\n```js\nimport { renderTextToPngs, transformAnthropicMessages } from \"pxpipe\";\n\nconst imgs = await renderTextToPngs(toolResultText);            // RenderedImage[]\nconst { body, applied, info } = await transformAnthropicMessages({\n  body: requestBytes,\n  model: \"claude-fable-5\",\n});\n```\n\n`options.keepSharp(block)` pins blocks as text (override the heuristic for IDs,\nhashes, paths); `options.emitRecoverable` returns the originals of imaged blocks\nso a stateful caller can recover them — the two halves of the fidelity contract\nfor the lossy limitation below. Runtime is pure-JS (Node and edge/Workers);\n`@napi-rs/canvas` is build-time only. Full API, types, and constants:\n`src/core/index.ts`.\n\n```\npnpm install && pnpm test     # 376 tests\npnpm run build                # regenerates dist/\n```\n\n- **Lossy**: see \"the honest part\" above. 
Verbatim recall from images is unreliable.\n- Render latency: encoding PNGs adds time to large requests before they leave (partly offset by the model ingesting fewer tokens). Responses stream normally.\n- ASCII/Latin-1 well tested; CJK works but conservatively.\n- Runtime is pure-JS — runs on Node and edge/Workers. `@napi-rs/canvas` is a build-time-only dev dep (regenerating the glyph atlas), not a runtime dep.\n- Fable 5 only.\n\nEverything above is measured. Everything here is not. These are hypotheses, not claims; they ship as numbers with an n or they get cut.\n\n**Sharper glyphs.** The 13/15 verbatim gap is partly font legibility, not just the model. A per-char confusion matrix across render styles is paused mid-run (`eval/glyph-matrix/`); if a zero-cost style lowers read error, the gate compresses harder at the same fidelity.\n\n**Effective context.** Dense text takes ~3x fewer tokens as images. If that holds in the live window and not just the bill, 1M tokens holds ~2x the real content. Open question: can a task needing ~2M raw context run inside Fable's 1M once the bulk is imaged?\n\n**Less active text, sharper model.** Long contexts degrade reasoning as they fill. Imaging old bulk shrinks what the model actively reads while keeping it reachable. Hypothesis: same information, smaller active context, better long-task accuracy.\n\nOne bet: longer effective context and a sharper model on long tasks, from the same Fable 5. 
Numbers or retraction, no hype between.\n\nMIT.", "url": "https://wpnews.pro/news/60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it", "canonical_source": "https://github.com/teamchong/pxpipe", "published_at": "2026-07-03 15:50:49+00:00", "updated_at": "2026-07-03 21:34:23.171640+00:00", "lang": "en", "topics": ["ai-tools", "large-language-models", "developer-tools"], "entities": ["pxpipe", "Claude Code", "Fable", "Opus", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it", "markdown": "https://wpnews.pro/news/60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it.md", "text": "https://wpnews.pro/news/60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it.txt", "jsonld": "https://wpnews.pro/news/60-fable-cost-cut-by-converting-code-to-images-and-having-the-model-ocr-it.jsonld"}}