Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard.
Average token usage.
In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns.
Benchmarks are now measured on two dimensions: overall performance & the cost of achieving it.
This is yet another sign that the era of subsidies [1] & tokenmaxxing [2] is ending.
Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case. [3] Uber capped employee AI spending after blowing through its budget in four months. [4]
This new dual benchmark answers the buyer’s only question : what is my intelligence per dollar?
Artificial Analysis already benchmarks this. [6] GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.
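The arithmetic behind that comparison fits in a few lines. This is a minimal sketch using only the figures quoted above; it assumes a score of 60 for both models (the post says "around 60"), and the function names are illustrative, not anything Artificial Analysis publishes.

```python
# Costs to run the Intelligence Index, as quoted in the text.
run_cost_usd = {"GPT 5.5": 3357.0, "Claude Opus 4.8": 4685.0}

# Assumption: both models score about 60, so one shared value is used.
INDEX_SCORE = 60.0


def points_per_thousand_dollars(score: float, cost_usd: float) -> float:
    """Index points bought per $1,000 spent running the benchmark."""
    return score / cost_usd * 1000


def premium_pct(expensive_usd: float, cheap_usd: float) -> float:
    """How much more the expensive run costs, as a percentage."""
    return (expensive_usd / cheap_usd - 1) * 100


efficiency = {
    model: points_per_thousand_dollars(INDEX_SCORE, cost)
    for model, cost in run_cost_usd.items()
}
premium = premium_pct(run_cost_usd["Claude Opus 4.8"], run_cost_usd["GPT 5.5"])

for model, value in efficiency.items():
    print(f"{model}: {value:.1f} index points per $1,000")
print(f"Opus 4.8 premium over GPT 5.5: {premium:.0f}%")
```

Under those assumptions GPT 5.5 buys about 17.9 index points per $1,000 & Opus 4.8 about 12.8, a premium that rounds to 40%.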
Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome: what a closed ticket, a shipped PR, or a resolved support case actually costs.
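Dollars per outcome is a simple ratio: token spend divided by results delivered. The sketch below uses entirely hypothetical numbers (token counts, price, & ticket volume are made up for illustration) to show how two applications with the same output can differ in cost when one uses a third of the tokens.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical monthly usage for one application; all values are illustrative."""
    tokens_used: int               # total tokens consumed
    usd_per_million_tokens: float  # blended token price
    outcomes: int                  # closed tickets, shipped PRs, resolved cases


def dollars_per_outcome(w: Workload) -> float:
    """Token spend divided by the number of results delivered."""
    if w.outcomes <= 0:
        raise ValueError("outcomes must be positive")
    spend_usd = w.tokens_used / 1_000_000 * w.usd_per_million_tokens
    return spend_usd / w.outcomes


# Same price & same number of closed tickets; only token usage differs.
frugal = Workload(tokens_used=200_000_000, usd_per_million_tokens=5.0, outcomes=1_000)
verbose = Workload(tokens_used=600_000_000, usd_per_million_tokens=5.0, outcomes=1_000)

print(f"frugal:  ${dollars_per_outcome(frugal):.2f} per ticket")
print(f"verbose: ${dollars_per_outcome(verbose):.2f} per ticket")
```

With these made-up inputs the frugal application closes a ticket for $1.00 & the verbose one for $3.00, even though a per-token price sheet would show them as identical.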
Every layer in the stack now has to price the same way the customer thinks : per result, not per token.
1. [The Unsustainable Subsidy](https://tomtunguz.com/ai-model-inflation/): The era of AI subsidies is ending.
2. [Tokenmaxxing](https://tomtunguz.com/tokenmaxxing/): Models that game benchmarks with extra tokens are losing their edge.
3. Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI: Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets.
4. Uber caps employee AI spending after blowing through budget in four months.
5. Salesforce Spends $300M on AI, Freezes Engineering Hires.
6. AI Model & API Providers Analysis: Independent analysis of AI model costs.