{"slug": "why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i", "title": "Why your Anthropic prompt caching probably isn't working (and the npm package I built to fix it)", "summary": "Anthropic's prompt caching often fails silently due to misplaced breakpoints, prefix drift between calls, short TTL expiration (reduced from 1 hour to 5 minutes), and lack of measurement. The author, a solo developer, created an npm package called `prompt-cache-optimizer` that wraps the Anthropic SDK to track cache hit rates, log dollars saved, and warn when performance drops below a configurable threshold. The package provides a `cacheInfo` field on every response and aggregate stats to help developers verify caching is actually working.", "body_md": "I'm a solo developer with about five years of experience, mostly outside AI. The last few months I've been getting serious about it: reading docs, building small things with Claude, learning how it differs from the web APIs I'm used to.\n\nWhile I was setting up Anthropic prompt caching for a project, I got stuck on a question I couldn't easily answer: how do I know it's actually working? The docs explained the `cache_control` API and the 90% discount on cached tokens. But the only way to verify a call had hit the cache was to manually parse `cache_read_input_tokens` from the response usage on every request. Nobody seems to do this.\n\nThat gap turned into my first published npm package, `prompt-cache-optimizer`. This post is what I learned about the four ways prompt caching silently fails, and what the package does to catch them.\n\n## What prompt caching is supposed to do\n\nWhen you call `messages.create` with a long, stable prefix (system prompt, tool definitions, retrieved documents), Anthropic lets you mark a `cache_control` breakpoint. On the first call, that prefix gets written to the cache at ~1.25x the normal input rate. 
On any subsequent call within the cache TTL, the cached tokens are read back at **10% of the input rate**.\n\nThat's a 90% discount on whatever portion of your prompt is stable. For a chatbot that re-sends a 10K-token system prompt every turn, and whose input is dominated by that prompt, this can be close to the difference between a $5K monthly bill and a $500 one.\n\nThe math is incredible. The execution is finicky.\n\n## The four ways prompt caching silently fails\n\n**Misplaced breakpoints**\n\n`cache_control` markers cache everything before them in the request. Put the breakpoint in the wrong place and you cache the wrong things. Worse, the call still succeeds: Anthropic happily processes it, you get a normal response, and you just paid full price.\n\n**Prefix drift across calls**\n\nThe cache only hits if the cacheable prefix is byte-identical to what was cached. If you reorder your tools array between calls, shuffle retrieved documents, or insert a timestamp anywhere in your system prompt, the prefix is different, the cache misses, and you pay full price.\n\nWorse, you also pay the 1.25x write cost to cache the new (now-different) prefix, which expires in 5 minutes if nothing else hits it. So you're paying more than you would without caching at all.\n\n**TTL expiration**\n\nAnthropic recently dropped the default cache TTL from 1 hour to 5 minutes. A lot of setups that \"had caching working\" started silently regressing: calls that came in 6 minutes apart instead of 4 minutes started missing the cache. Nobody got an error. The bill just went up.\n\n**No measurement**\n\nThe only way to verify any of the above is to parse `cache_read_input_tokens` and `cache_creation_input_tokens` from every single response, compute a hit rate, and compare against an expected baseline. Nobody does this. 
Most teams \"set up caching\" once, watch the first response come back with high cached tokens, and assume it works forever.\n\n**The wrapper I built**\n\nI shipped a small TypeScript package called __prompt-cache-optimizer__ that fixes the measurement problem and warns about the other three.\n\nIt's a drop-in wrapper for `@anthropic-ai/sdk`. Use it exactly like the SDK:\n\n```ts\n// TypeScript: the `!` below is a non-null assertion, which plain JS does not accept.\nimport { CachedAnthropic, placeBreakpoints } from \"prompt-cache-optimizer\";\n\nconst client = new CachedAnthropic({\n  apiKey: process.env.ANTHROPIC_API_KEY!,\n  warnIfHitRateBelow: 0.6,\n});\n\nconst { system, messages } = placeBreakpoints({\n  system: longSystemPrompt,\n  messages: conversation,\n  strategy: \"after-system\",\n});\n\nconst response = await client.messages.create({\n  model: \"claude-sonnet-4-6\",\n  max_tokens: 1024,\n  system,\n  messages,\n});\n\nconsole.log(response.cacheInfo);\n// {\n//   hit: true,\n//   cachedTokens: 8420,\n//   uncachedTokens: 312,\n//   cacheWriteTokens: 0,\n//   dollarsSaved: 0.024,\n//   dollarsSpent: 0.001\n// }\n```\n\nEvery response gets a `cacheInfo` field with the parsed numbers. 
The client also tracks aggregate stats:\n\n```ts\nconsole.log(client.stats());\n// {\n//   totalCalls: 142,\n//   cacheHits: 124,\n//   hitRate: 0.873,\n//   totalCachedTokens: 1_240_000,\n//   dollarsSaved: 3.72,\n//   dollarsSpent: 1.41,\n// }\n```\n\nAnd when something looks wrong, it emits passive warnings instead of throwing:\n\n- `cache-write-without-read` → your cacheable prefix changed call-over-call (the silent failure mode)\n- `low-hit-rate` → rolling cache hit rate dropped below your threshold\n- `no-cache-control-found` → you forgot to mark anything cacheable\n- `unknown-model` → pricing unknown, dollar accounting skipped\n\nRoute them anywhere you like:\n\n```ts\nnew CachedAnthropic({\n  apiKey,\n  onWarning: (event) => logger.warn(event),\n});\n```\n\n**Real numbers**\n\nThe included example processes 5 questions reusing a large system prompt, so it makes five calls. The first writes to cache (cost: a tiny bit more than uncached). Calls 2-5 each hit the cache.\n\n- **80% hit rate** (4 hits, 1 miss; the first call always misses since that's when the cache gets written)\n- **$0.017** saved on **$0.020** spent\n- Same workload without caching would have cost **$0.037**, a **46% reduction**\n\nAt higher call volumes the proportions get even better. A chatbot answering 1000 questions/day with a 10K-token system prompt can reach 70%+ cost reductions.\n\n**How big the install is**\n\nThe package is ~50KB unpacked, has **zero runtime dependencies**, and treats `@anthropic-ai/sdk` as a peer dependency. It does not phone home, store payloads, or require an account.\n\n**Roadmap**\n\n**v0.1** is intentionally focused on measurement and explicit helpers. 
Coming up:\n\n- **v0.2**: auto-placement of `cache_control` breakpoints based on observed prompt stability (no more manual `placeBreakpoints()`)\n- **v0.3**: safe message/tool reordering to maximize the stable prefix\n- **v0.4**: OpenAI and Gemini prompt caching support\n- **v1.0**: persistent stats adapter, middleware mode\n\n**Try it**\n\n```\nnpm install prompt-cache-optimizer @anthropic-ai/sdk\n```\n\n- **npm:** [https://www.npmjs.com/package/prompt-cache-optimizer](https://www.npmjs.com/package/prompt-cache-optimizer)\n- **GitHub:** [https://github.com/leonhail-nell/prompt-cache-optimizer](https://github.com/leonhail-nell/prompt-cache-optimizer)\n\nIf you find it useful, a GitHub star is the single biggest signal that helps other developers find it. If it saves you real money on your Anthropic bill, I'd love to hear about it: file an issue or DM me.", "url": "https://wpnews.pro/news/why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i", "canonical_source": "https://dev.to/leonhail/why-your-anthropic-prompt-caching-probably-isnt-working-and-the-npm-package-i-built-to-fix-it-42c", "published_at": "2026-05-20 04:48:59+00:00", "updated_at": "2026-05-20 05:01:27.312429+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "developer-tools", "open-source", "products"], "entities": ["Anthropic", "Claude", "prompt-cache-optimizer"], "alternates": {"html": "https://wpnews.pro/news/why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i", "markdown": "https://wpnews.pro/news/why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i.md", "text": "https://wpnews.pro/news/why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i.txt", "jsonld": "https://wpnews.pro/news/why-your-anthropic-prompt-caching-probably-isn-t-working-and-the-npm-package-i.jsonld"}}