{"slug": "prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides", "title": "Prompt caching vs the long LLM conversation: where your input bill actually hides", "summary": "A developer built PromptCrunch, a drop-in proxy that reduces input token costs in long multi-turn LLM conversations by deduplicating code, compacting stale tool output, and summarizing old turns. In tests with Claude Code, input tokens dropped about 75% when caching was not covering the request and 7-10% when it was. The tool is designed for any long multi-turn application, including agents and chatbots.", "body_md": "I kept watching my Claude Code bill climb through long sessions, and most of it was not new work. It was the same conversation getting re-sent every turn. A multi-turn call is stateless, so your client ships the whole history each time: file reads, tool output, old diffs, all of it, and you pay input tokens on that pile again and again.\n\nSo I built PromptCrunch. It is a drop-in proxy that optimizes the conversation before it reaches the model. Under the hood: it deduplicates superseded code, compacts stale tool output, summarizes old turns and reuses those summaries across requests, and preserves your recent turns and structured data verbatim, rewriting a request only when it nets out cheaper to send. Setup is two lines: swap your base_url, add one header. Your provider key still goes straight to your provider.\n\nIt is not just a Claude Code thing. The same re-sent history piles up in any long multi-turn app: agents, customer chatbots, conversational products. Claude Code is just where I run into it most.\n\nCaching discounts the repeated prefix, and inside a hot window it does most of the work. But it expires after about 5 minutes and only covers the prefix. Real sessions are bursty. You work, you read, you step away, you come back. The cache goes cold in the gaps, and as the session grows the history runs past what caching holds.\n\nThat cold-cache window is where PromptCrunch pays for itself. On my own Claude Code runs, input tokens dropped about 75% when caching was not covering the request, and 7 to 10% when it was. The two stack: caching takes the hot window, PromptCrunch takes the long session.\n\nOn a long session you stop re-paying for history the model already processed, so the bill starts tracking the work instead of the turn count. You only pay on requests that came out smaller, so trying it costs you nothing. Your keys are never stored, and a zero-retention mode means we hold nothing at all.\n\nIt earns the most on long, multi-turn work, and not much on short prompts or one-shot calls, so point it where the sessions run long.\n\nPoint it at one real session and watch the per-request savings on your dashboard. You start with $5 of free credit, no card needed. [Try it here](https://promptcrunch.dev/signup?utm_source=devto&utm_medium=post&utm_campaign=cc-caching).", "url": "https://wpnews.pro/news/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides", "canonical_source": "https://dev.to/promptcrunch/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides-2i72", "published_at": "2026-06-15 23:48:59+00:00", "updated_at": "2026-06-16 00:17:09.939495+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-infrastructure"], "entities": ["PromptCrunch", "Claude Code", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides", "markdown": "https://wpnews.pro/news/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides.md", "text": "https://wpnews.pro/news/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides.txt", "jsonld": "https://wpnews.pro/news/prompt-caching-vs-the-long-llm-conversation-where-your-input-bill-actually-hides.jsonld"}}