The Google I/O 2026 announcement that quietly broke my cost spreadsheet: Gemini's cache-discount tier

At Google I/O 2026, the most consequential announcement for cost-conscious developers was a cache-discount pricing tier for Gemini 2.5 Flash, which sharply cuts the price of input tokens already sent within the cache time-to-live (TTL) window. For agent loops where the system prompt and tool list stay unchanged between turns, roughly 95% of the input tokens on every call after the first are billed as cached. In the author's 30-step research-agent test, the pricing change alone cut total cost by 4.3x. That shifts the unit economics of agents with large prompts and long loops, and it means cost calculations and dashboards should track fresh and cached input tokens separately.
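A back-of-the-envelope version of that arithmetic rests on two stated assumptions:
- h is the share of input tokens billed at the cached rate, about 0.95 in steady state per the figure above.
- The cached price is exactly one tenth of the fresh price. This is my stand-in for a discount of "roughly an order of magnitude", not a published Gemini rate.

```latex
\begin{aligned}
p_{\text{eff}} &= (1-h)\,p_{\text{fresh}} + h\,p_{\text{cached}} \\
               &= (1-0.95)\,p_{\text{fresh}} + 0.95 \times 0.1\,p_{\text{fresh}} \\
               &= 0.05\,p_{\text{fresh}} + 0.095\,p_{\text{fresh}} = 0.145\,p_{\text{fresh}} \\
\frac{p_{\text{fresh}}}{p_{\text{eff}}} &= \frac{1}{0.145} \approx 6.9
\end{aligned}
```

That factor covers steady-state input only. The cold first call and all output tokens are still billed at full price. With output tokens O at price p_out, the run cost I p_eff + O p_out always shrinks by less than p_fresh / p_eff whenever O > 0. A whole-run figure like 4.3x sitting below the input-only factor is therefore consistent with this model.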

Google I/O 2026 had the headline stuff. Gemma 4 in four sizes. The new agent-friendly Gemini surfaces. Genie. Project Mariner stuff. All worth talking about. But the announcement that's actually going to change which agents make sense to build, and which ones I kill on the spreadsheet, isn't any of those. It's the cache-discount pricing tier on Gemini 2.5 Flash.

I maintain GeminiLens, an open-source observability layer for Gemini agents, and cost calculation is a core part of that library. I had to rewrite the calc the day after the keynote. Here's why.

Before I/O 2026, my mental model for a high-frequency Gemini agent looked like this:

You can't fix that with a smaller prompt. The tool list IS the agent, and trimming it makes the agent dumber. So the choice was usually either to pay the full input bill or to rebuild the agent around a smaller model that's bad at tools.

The cache-discount tier (announced at I/O and live in the API now) introduces a third price alongside fresh input and output: cached input. Tokens you've already sent within the cache TTL window cost roughly an order of magnitude less than fresh input tokens. For an agent loop where the system prompt and tool list never change from turn to turn, ~95% of the input tokens on every call after the first are now cached. The cost graph collapses.

I redid my favorite stress-test scenario: a 30-step research agent with a 4K-token system prompt and a 12-tool function-calling schema.

That's a 4.3x reduction. It comes from the pricing change alone, not from "use a smaller model" or "be smarter about prompts".

I keep a list of "agent ideas I'd love to build but the unit economics kill them." After the cache-discount tier, three of them moved from no to maybe:

None of these are flashy, and none of them got a keynote slide. But for any team building unit-economics-driven Gemini agents, the cache-discount tier is the I/O announcement that changes the model.

I had to update three things in GeminiLens the same day:

1. Cost calc now tracks token classes separately.

Before:
total_cost = input_tokens * INPUT_PRICE + output_tokens * OUTPUT_PRICE

After:
total_cost = fresh_input_tokens * INPUT_PRICE_FRESH + cached_input_tokens * INPUT_PRICE_CACHED + output_tokens * OUTPUT_PRICE

The naive sum was hiding which calls were cache-hot and which were cold. Now the JSONL audit log carries fresh and cached input tokens as separate fields, and the Streamlit dashboard renders a per-call cache-hit ratio.

2. Dashboard now shows "could-have-been-cached" warnings. If I see a 30-step run where every call has zero cached tokens, that's a bug. The cache TTL is probably set wrong, or the prompt is being reshuffled between calls. That case is now flagged as a warning.

3. Cost-per-run estimator splits cold-start vs steady-state. In an agent loop, the first call is cold (full input price) and the steady state is hot. Reporting only the average flattens the picture and makes optimization decisions harder, so the new dashboard shows both numbers separately.

Most I/O coverage I've seen frames the cache tier as "an API optimization": paragraph 4 of a TechCrunch post, two-thirds of the way down the Gemini docs page. It's actually a unit-economics step change for one specific shape of agent: the kind where the prompt stays large and the loop stays long.

If you're shipping that kind of agent, redo your spreadsheet. The math you had on Friday May 16 is no longer the math.

This is my entry for the Google I/O 2026 Writing Challenge. I work on open-source AI agent reliability tooling under @MukundaKatta. GeminiLens is on PyPI (pip install geminilens).
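Below is a minimal sketch of the three GeminiLens changes described above. It assumes a hypothetical per-call record that keeps fresh, cached, and output token counts separate. It is not GeminiLens's actual code or schema: the prices are placeholders rather than Gemini's published rates, and CallRecord, could_have_been_cached, and cold_vs_steady are illustrative names.

```python
import json
from dataclasses import asdict, dataclass
from statistics import mean

# Placeholder prices (currency units per token). These are NOT Gemini's
# published rates; the cached rate assumes a ~10x discount for illustration.
INPUT_PRICE_FRESH = 1.0e-6
INPUT_PRICE_CACHED = 1.0e-7
OUTPUT_PRICE = 4.0e-6


@dataclass
class CallRecord:
    """One model call with token classes kept separate (hypothetical schema)."""
    fresh_input_tokens: int
    cached_input_tokens: int
    output_tokens: int

    def cost(self) -> float:
        # Each token class is billed at its own price instead of one input rate.
        return (self.fresh_input_tokens * INPUT_PRICE_FRESH
                + self.cached_input_tokens * INPUT_PRICE_CACHED
                + self.output_tokens * OUTPUT_PRICE)

    def cache_hit_ratio(self) -> float:
        # Share of this call's input tokens billed at the cached rate.
        total_input = self.fresh_input_tokens + self.cached_input_tokens
        return self.cached_input_tokens / total_input if total_input else 0.0

    def to_jsonl(self) -> str:
        # One audit-log line; fresh and cached input stay in separate fields.
        row = asdict(self)
        row["cost"] = self.cost()
        row["cache_hit_ratio"] = self.cache_hit_ratio()
        return json.dumps(row)


def could_have_been_cached(run: list[CallRecord], min_steps: int = 2) -> bool:
    """Warn when a multi-step run never hit the cache (TTL or prompt-order bug?)."""
    return len(run) >= min_steps and all(c.cached_input_tokens == 0 for c in run)


def cold_vs_steady(run: list[CallRecord]) -> tuple[float, float]:
    """Return (cold-start cost of call 1, mean steady-state cost of calls 2..n)."""
    if not run:
        raise ValueError("run has no calls")
    cold = run[0].cost()
    steady = mean(c.cost() for c in run[1:]) if len(run) > 1 else cold
    return cold, steady


# Toy 30-step loop: call 1 is cold, later calls reuse a 5,000-token prefix
# and add 250 fresh tokens each. The numbers are made up for illustration.
run = [CallRecord(5_000, 0, 300)] + [CallRecord(250, 5_000, 300) for _ in range(29)]
cold, steady = cold_vs_steady(run)
print(f"cold-start: {cold:.6f}  steady-state: {steady:.6f}")
print(run[1].to_jsonl())
print("WARN: no cache hits" if could_have_been_cached(run) else "cache is hot")
```

Keeping the cold and steady numbers apart is the point of change 3. Averaging the 30 calls above would blend one expensive call into 29 cheap ones and hide both.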