Cheaper Tokens Drive Higher AI Token Spending


The Silicon Data Token Expenditure Index, which tracks aggregate spending on large language model usage, has roughly doubled since late 2025 while the price per token has fallen about 90% since 2023, according to a June 12 presentation by Torsten Slok at Apollo Global Management (Apollo). Reporting reproduced by Edward Conard and others cites the same index. Analysts and commentators frame the pattern as an instance of Jevons paradox: lower unit costs prompt much higher aggregate consumption. Industry coverage also highlights related signs in the inference economy, including a larger share of compute devoted to inference and high bills at some corporations, as reported by BusinessEngineer and Technology.org.

What happened

Per a June 12 slide deck by Torsten Slok at Apollo Global Management, the Silicon Data Token Expenditure Index has roughly doubled since late 2025, even as the price of a single token has fallen by about 90% since 2023. Edward Conard's summary and other outlets reproduce the same index finding. Technology.org and other trade coverage report anecdotal examples of runaway bills, including claims that some companies spent unusually large portions of their 2026 AI budgets on inference usage.

Editorial analysis - technical context

Industry reporting frames this as an instance of Jevons paradox, a classical economics observation that improvements in efficiency or lower unit costs can increase total consumption. Business-oriented coverage, including a BusinessEngineer roundup, emphasizes that the market is shifting from episodic, training-dominated spending to continuous, query-driven inference expenditures, and cites Deloitte and market research for the claim that inference now represents a much larger share of AI compute and life-cycle cost.
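The arithmetic behind the Jevons framing can be sketched in a few lines. This is a minimal illustration, not a reconstruction of the index: it assumes spending equals price per token times token volume, and it treats the roughly 90% price decline and the roughly 2x spending increase as if they covered the same period, which the reporting does not claim (the price figure is measured from 2023, the spending figure from late 2025). The function name is hypothetical.

```python
def implied_volume_multiple(spend_multiple: float, price_multiple: float) -> float:
    """Return the volume multiple implied by spend = price x volume.

    Both arguments are ratios of end value to start value, e.g. 0.10 for a
    90% price decline and 2.0 for a doubling of spend.
    """
    if price_multiple <= 0:
        raise ValueError("price_multiple must be positive")
    return spend_multiple / price_multiple


# Illustrative inputs only (assumption: both changes span the same window).
volume_multiple = implied_volume_multiple(spend_multiple=2.0, price_multiple=0.10)
print(f"Implied token volume multiple: {volume_multiple:.0f}x")
```

Under those simplifying assumptions, spending can only double while unit prices fall by 90% if token volume grows by about a factor of twenty, which is the sense in which lower unit costs coincide with higher total consumption.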

Context and significance

Editorial analysis: Companies and platforms that monetize per token or per query are operating in an environment where lower unit pricing does not automatically reduce vendor revenue or customer bills. Reporting highlights two structural factors: inference is billed at the margin, so every additional query adds cost, and automated agents and production workflows are rapidly scaling token consumption. BusinessEngineer cites Deloitte and Fortune Business Insights for metrics on the growing inference market, and Apollo cites Bloomberg/Macrobond data underlying the Token Expenditure Index.

What to watch

Observers should monitor three indicators over the coming quarters:

• the trajectory of the Silicon Data Token Expenditure Index or comparable industry metrics reported by Bloomberg/Macrobond;
• corporate disclosure of inference spending or anomalous line-item overruns in IT and R&D budgets, as highlighted in trade reporting; and
• vendor pricing changes, bundled offers, or new metering models from major API providers that could alter per-query incentives.

Editorial analysis: For practitioners, the immediate operational implication is a renewed emphasis on cost observability and governance at production scale. Companies building or operating inference-heavy systems will likely need stronger telemetry on token consumption per workflow, configurable throttles for agent fleets, and chargeback models to prevent surprise spend. These are general industry patterns and not claims about any specific firm's internal plans.
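The governance pieces named above (per-workflow telemetry, throttles, and chargeback) can be sketched as a small in-memory ledger. This is an illustration under stated assumptions rather than any vendor's or firm's implementation: the class name, workflow names, budgets, and the price of 2.00 USD per million tokens are all hypothetical, and a production system would persist usage and reset it each billing period.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class TokenLedger:
    """Hypothetical per-workflow token ledger with budgets and chargeback."""

    price_per_million_tokens: float  # assumed blended price in USD
    budgets: dict[str, int]  # workflow name -> token budget for the period
    usage: dict[str, int] = field(default_factory=lambda: defaultdict(int))

    def record(self, workflow: str, tokens: int) -> bool:
        """Record usage if it fits the budget; return False to signal a throttle."""
        budget = self.budgets.get(workflow, 0)  # unknown workflows get no budget
        if self.usage[workflow] + tokens > budget:
            return False
        self.usage[workflow] += tokens
        return True

    def chargeback(self) -> dict[str, float]:
        """Return the cost in USD attributed to each workflow so far."""
        return {
            workflow: tokens / 1_000_000 * self.price_per_million_tokens
            for workflow, tokens in self.usage.items()
        }


# Example with made-up numbers: one agent workflow capped at 1M tokens.
ledger = TokenLedger(
    price_per_million_tokens=2.0,
    budgets={"support_agent": 1_000_000},
)
first_ok = ledger.record("support_agent", 600_000)   # fits within the budget
second_ok = ledger.record("support_agent", 500_000)  # would exceed it, so refused
costs = ledger.chargeback()
print(first_ok, second_ok, costs)
```

Refusing the request before it is sent, rather than alerting after the bill arrives, is the design choice that turns telemetry into a throttle; the same usage table then doubles as the basis for chargeback.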

Quoted reporting and sourced claims

Per BusinessEngineer, the inference market metrics underpinning this narrative include claims that inference accounts for approximately two-thirds of AI compute in 2026 (Deloitte) and that inference-related markets are expanding rapidly. Technology.org reports anecdotal examples of very large corporate bills and frames the social and political reactions that follow. Apollo's slide deck attributes the Token Expenditure Index construction to Bloomberg and Macrobond.

Limitations

Editorial analysis: Public coverage relies on a constructed index and secondary reporting. The Token Expenditure Index is a proxy for aggregate LLM spending and, as trade reporting notes, is not identical to vendor revenues. Where sources make firm claims about individual corporate spend, those are anecdotal and should be treated as such unless supported by company filings or audited disclosures.

Scoring Rationale

The story documents a material cost and usage trend in the inference economy with direct operational and financial implications for practitioners. It is notable rather than paradigm-shifting, and recent reporting is timely, so the impact sits in the 'notable' range.

Source

letsdatascience.com (original article)