Intelligence Per Dollar

Microsoft added average token usage to a model release card yesterday, introducing a new dual benchmark that measures both performance and cost. The metric shows Microsoft's model achieving a 71.6 SWE-Bench Verified score using roughly one-third the tokens of Claude Haiku 4.5, signaling a shift toward evaluating intelligence per dollar. The change reflects growing pressure on AI companies to compete on cost efficiency as enterprises like Uber and Microsoft cap AI spending after budgets were exceeded.

Yesterday Microsoft added a new metric to a model release card, one that will likely become a standard. Average token usage.

In the first row, the Microsoft model hits 71.6 on SWE-Bench Verified using about a third of the tokens Claude Haiku 4.5 burns. Benchmarks are now measured on two dimensions : overall performance & the cost to achieve that intelligence.

This is yet another sign that the era of subsidies [1] & tokenmaxxing [2] is ending. Even the most valuable companies in the world cannot afford state-of-the-art intelligence for every conceivable use case. [3] Uber capped employee AI spending after blowing through its budget in four months. [4]

This new dual benchmark answers the buyer's only question : what is my intelligence per dollar?

Artificial Analysis already benchmarks this. [6] GPT 5.5 & Claude Opus 4.8 land within a point of each other on the Intelligence Index, around 60. Running the index costs $3,357 on GPT 5.5 & $4,685 on Opus 4.8. Same answer, 40% more expensive.

Model companies must now compete on both dimensions. The application layer will compete one level up, on dollars per outcome : what a closed ticket, a shipped PR, or a resolved support case actually costs.

Every layer in the stack now has to price the same way the customer thinks : per result, not per token.

[1] The Unsustainable Subsidy. https://tomtunguz.com/ai-model-inflation/ The era of AI subsidies is ending.

[2] Tokenmaxxing. https://tomtunguz.com/tokenmaxxing/ Models that game benchmarks with extra tokens are losing their edge.

[3] Microsoft cancels Claude Code licenses, shifting developers to GitHub Copilot CLI. https://www.windowscentral.com/microsoft/microsoft-cancels-claude-code-licenses-shifting-developers-to-github-copilot-cli-a-move-likely-driven-by-financial-motives Microsoft cancelled Claude Code licenses across its Experiences and Devices division (Windows, Microsoft 365, Outlook, Teams, Surface) after engineering usage outran budgets.

[4] Uber caps employee AI spending after blowing through budget in 4 months. https://techcrunch.com/2026/06/02/uber-caps-employee-ai-spending-after-blowing-through-budget-in-four-months/

[5] Salesforce Spends $300M on AI, Freezes Engineering Hires. https://enterprisedna.co/resources/news/salesforce-300m-anthropic-tokens-engineer-hiring-freeze-2026/

[6] AI Model & API Providers Analysis. https://artificialanalysis.ai/ Independent analysis of AI model costs.
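
A worked version of the intelligence-per-dollar arithmetic in the Artificial Analysis comparison, as a minimal Python sketch. The two costs are the figures quoted in the post. The score of 60 for both models is an assumption standing in for "within a point of each other, around 60", and the `BenchmarkRun` class and `cost_premium` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkRun:
    """One model's result on a benchmark, with the cost to produce it."""
    model: str
    score: float     # Intelligence Index score (assumed: 60 for both models)
    cost_usd: float  # cost to run the full index, as quoted in the post

    @property
    def dollars_per_point(self) -> float:
        # Lower is better: dollars spent per index point achieved.
        return self.cost_usd / self.score


def cost_premium(cheaper: BenchmarkRun, pricier: BenchmarkRun) -> float:
    """Fractional extra cost of `pricier` relative to `cheaper`."""
    return pricier.cost_usd / cheaper.cost_usd - 1.0


gpt = BenchmarkRun("GPT 5.5", score=60.0, cost_usd=3357.0)
opus = BenchmarkRun("Claude Opus 4.8", score=60.0, cost_usd=4685.0)

premium = cost_premium(gpt, opus)

if __name__ == "__main__":
    for run in (gpt, opus):
        print(f"{run.model}: ${run.dollars_per_point:.2f} per index point")
    print(f"Premium for the same score: {premium:.1%}")
```

With equal scores the premium comes out to about 39.6%, which the post rounds to 40%. The same structure extends one level up to dollars per outcome by swapping the index score for a count of closed tickets or shipped PRs.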