# OpenAI, Anthropic, Google — Which One Is Quietly Getting More Expensive?

> Source: <https://dev.to/tokonomics/openai-anthropic-google-which-one-is-quietly-getting-more-expensive-3o7a>
> Published: 2026-06-30 00:42:51+00:00

You checked your LLM API pricing last month. Maybe two months ago. You picked a model, budgeted around it, and moved on.

Here's the problem: the price you budgeted for might not be the price you're paying anymore.

Between January and June 2026, OpenAI, Anthropic, and Google made **14 combined pricing changes** across their model lineups. Some prices dropped. Some crept up. A few disappeared entirely when models got deprecated and replaced by pricier successors.

None of them sent you an email about it.

**OpenAI** retired GPT-4 Turbo in Q1 2026. If your code still pointed at `gpt-4-turbo`

, it silently rerouted to GPT-4o. Same name in your logs, different price. GPT-4o is cheaper per token than the old Turbo — but the output token rate shifted from $0.03/M to $0.01/M. Sounds like a win until you realize your prompts were optimized for Turbo's behavior, and GPT-4o generates 30-40% more output tokens on the same prompt. Your per-call cost went up while the per-token price went down.

**Anthropic** launched Claude Sonnet 4 in May 2026 at $3.00/M input. Claude Sonnet 3.5 was $3.00/M too — same price, right? Not quite. Sonnet 4 uses extended thinking by default on complex queries, and thinking tokens bill at the same output rate. A prompt that cost $0.04 on Sonnet 3.5 can cost $0.12 on Sonnet 4 because of the invisible thinking overhead. Three times more — and nothing changed in your code.

**Google** kept Gemini 2.5 Flash at $0.15/M input. Great price. But they added a context length surcharge most teams missed: anything over 128K tokens doubles the rate to $0.30/M. If you're doing RAG with long documents, your actual cost is 2x what the pricing page headline says.

Three things cause the gap:

**Model deprecation roulettes.** When a provider sunsets a model, your API calls don't fail. They silently redirect to the successor. The successor might cost more, generate more tokens, or behave differently enough that your prompts produce longer outputs.

**Hidden token categories.** Thinking tokens, cached tokens, system prompt tokens — these didn't exist two years ago. Now they each have their own rate. Anthropic charges full output rate for thinking tokens. Google gives you 75% off cached tokens but charges 2x for long context. The headline price is just one number in a matrix of five or six.

**Quiet feature changes.** OpenAI's structured output mode, Anthropic's extended thinking, Google's code execution — these features alter how many tokens a response contains. When a provider enables a feature by default on a new model version, your token count changes without you doing anything.

If you froze your code in January 2026 and checked your June bill:

**You're paying more if** you use Claude for complex reasoning (thinking token overhead), send long documents to Gemini (context surcharge), or relied on a deprecated model that got rerouted.

**You're paying less if** you switched to Gemini 2.5 Flash for simple tasks (genuinely cheap at $0.15/M), or you're using DeepSeek V3 which hasn't changed pricing since launch.

**You have no idea if** you're not tracking cost per call. And that's most teams. A 2026 survey by a16z found that 71% of companies using LLM APIs don't track spending at the individual call level. They see one line item on a monthly invoice and hope it looks reasonable.

The problem isn't that providers are being sneaky. They publish every price change. The problem is that nobody is watching — and by the time you check, three months of drift have already hit your budget.

If your AI bill surprised you this month, you're not alone. [Tokonomics](https://tokonomics.ca) tracks every API call by model, feature, and cost — with alerts before the invoice arrives, not after.

*Pricing data current as of June 28, 2026.*
