Cache-Aware Spawning: What Changed in llm-cli-gateway, a Week On

The llm-cli-gateway v1.6.0 now implements first-class prompt caching across all five supported LLM providers (Claude, Codex, Gemini, Grok, and Mistral Vibe), allowing developers to avoid paying for repeated input tokens on identical system prompts or file dumps. The update introduces a structured `promptParts` shape that separates stable and volatile prompt sections, three new `cache_state://` MCP resources for hit-rate and savings metrics, and a `cache_ttl_expiring_soon` warning for Claude resumes, all while keeping caching opt-in and provider-specific.

If your multi-LLM workload sends the same long system prompt or file dump to Claude / Codex / Gemini ten times an hour, you are paying for the same input tokens ten times. Each provider has a cache for exactly this case, and each one expresses the cache differently. This post is about how llm-cli-gateway now uses those caches for you, across all five providers, without you having to re-implement the per-provider cache APIs yourself.

I covered the previous round of changes (https://dev.to/wernerk_au/whats-new-in-llm-cli-gateway-58b8) last week, and I closed that piece with a teaser: Mistral Vibe was next on the list. A week later, Mistral is in, and a much larger change has landed alongside it, which is what most of this follow-up is about.

The new shape of the gateway: it now understands prompt caching as a first-class concern, across all five providers. That is `claude`, `codex`, `gemini`, `grok`, and `mistral` (Vibe). v1.6.0 shipped today and contains the lot.
Short version: every `*_request` and `*_request_async` tool now accepts a structured `promptParts` shape; the gateway concatenates the parts in a canonical order so the stable bytes precede the volatile tail unchanged across calls; three new `cache_state://` MCP resources expose hit-rate / hit-count / estimated-savings aggregates back to the orchestrating agent; `session_get` projects a compact `cacheState` view at read time; and a `cache_ttl_expiring_soon` warning fires on Claude resumes when the Anthropic cache breakpoint is within 30 seconds of expiry. All of it is opt-in (every flag defaults off in 1.x), all of it observes the per-provider cache mechanism rather than fighting it, and none of it adds conversation content to gateway storage.

Long version is below, organised the same way I organised last week's post (problem, then what changed, then what it now does), with the caveats named up front rather than buried.

Mistral shipped Vibe (https://docs.mistral.ai/mistral-vibe/overview), their open-source CLI coding agent powered by Devstral 2. The gateway now wires `mistral_request` and `mistral_request_async` alongside the other four providers. Same shape as the rest: sessions through `--resume` / `--continue` (which requires session logging set to `enabled = true` in `~/.vibe/config.toml`; the doctor surfaces this so you do not get an opaque failure), model registry entries, self-update via the `vibe` binary itself, and the same circuit-breaker, approval-gate, flight recorder, metrics, dedup, and durable-job-store plumbing as the others.

The model alias resolution is slightly different. Vibe has no `--model` flag, so the gateway injects the resolved alias via `VIBE_ACTIVE_MODEL` instead. That is the only material divergence from the Claude / Codex / Gemini / Grok pattern, and it is documented inline at the call site.

Now five providers, five model families, five vendor lineages (Anthropic, OpenAI, Google, xAI, Mistral).
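To make the Vibe divergence concrete, here is a minimal sketch of passing a resolved model alias through the environment rather than through a CLI flag. It is not the gateway's actual code: `buildVibeSpawn`, `resolveAlias`, `SpawnSpec`, and the alias table are hypothetical names, and the alias and model strings are placeholders. Only `VIBE_ACTIVE_MODEL`, the `vibe` binary name, and `--continue` come from the description above.

```typescript
// Hypothetical sketch: these names are illustrative, not llm-cli-gateway's API.
type Env = Record<string, string | undefined>;

interface SpawnSpec {
  command: string;
  args: string[];
  env: Env;
}

// Placeholder alias table; real registry entries will differ.
const EXAMPLE_ALIASES = new Map<string, string>([
  ["default", "example-model-a"],
  ["fast", "example-model-b"],
]);

// Unknown aliases pass through unchanged, so a caller can name a model directly.
function resolveAlias(alias: string, table: Map<string, string>): string {
  return table.get(alias) ?? alias;
}

function buildVibeSpawn(
  alias: string,
  baseEnv: Env,
  extraArgs: string[] = [],
): SpawnSpec {
  const model = resolveAlias(alias, EXAMPLE_ALIASES);
  return {
    command: "vibe",
    // No --model flag is added: the model travels in the environment instead.
    args: [...extraArgs],
    // Copy the base environment rather than mutating it.
    env: { ...baseEnv, VIBE_ACTIVE_MODEL: model },
  };
}

// Usage: the resulting spec could be handed to whatever process spawner you use.
const spec = buildVibeSpawn("fast", { PATH: "/usr/bin" }, ["--continue"]);
console.log(spec.command, spec.args, spec.env.VIBE_ACTIVE_MODEL);
```

Building the environment as a copy keeps the override scoped to the single child process, so requests that resolve different aliases do not interfere with each other through shared process state.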
What I noticed running parallel reviews these past few weeks is that agreement within the OpenAI / Anthropic / Google-adjacent triangle is not as informative as it looks, because the three model lineages share a lot of training data and a lot of post-training tendencies. I am not pretending this is statistics; it is just how I use these tools in review work. But adding an xAI voice and a Mistral voice means a five-way agreement is sampled from a meaningfully wider distribution than a three-way agreement, and a one-out-of-five dissent (especially from a vendor outside the triangle) is a data point I read rather than a vote I discard.

The change that took most of the engineering is `promptParts`. The shape is small: { "promptParts": { "system": "You are a careful reviewer of TypeScript diffs.", "tools": "