The Real Reason AI Costs Keep Rising

AI token costs have dropped 600-fold in six years, yet companies are burning through annual AI budgets in a single quarter due to increased demand and more complex agent architectures. Nvidia CEO Jensen Huang noted that cheaper tokens drive higher total spending, as companies adopt elaborate multi-step AI processes that multiply token usage.

Tokens got 600x cheaper in six years, yet companies are blowing a year of AI budget in a single quarter.

Why Cheap Tokens Built an Expensive Trap

AI runs on tokens, tiny chunks of text you pay for going in and coming out. Each one costs a tiny fraction of a cent, cheaper than the rounding error on a coffee run. So why is your CFO losing sleep?

Companies are burning a year’s AI budget by spring. Nvidia’s CEO (https://www.youtube.com/watch?v=gwW8GKwHB3I) said he would be alarmed unless his top engineers were each torching a quarter-million in tokens.

Here is the paradox: tokens keep getting cheaper (https://www.the-ai-corner.com/p/llm-token-cost-optimization-playbook-2026?r=1krivi), fast, and the bills keep climbing anyway. The same task can cost ten times more from one run to the next, and the bill barely explains why.

The products that win hide the token machinery and sell you the result instead. Some already do.

Table of Contents
1. The Bills Exploded Because Tokens Got Cheap
2. The Free-Money Era Just Ended
3. Signal and Noise Wear the Same Uniform
4. Tokens Are the New Headcount
5. The Gold Rush to Measure Tokens
6. The Hidden Winner Sells Outcomes, Not Tokens

1. The Bills Exploded Because Tokens Got Cheap

Here is the fact that breaks the popular story. Tokens did not get expensive. They got radically cheaper, and the spending went up anyway.
That reads like a contradiction until you remember a 160-year-old observation about coal, the Jevons paradox: when a resource gets cheaper to use, total consumption of it can go up. It turns out to describe AI better (https://www.the-ai-corner.com/p/six-ai-trends-2026) than almost anything written about AI this year.

Cheaper Every Year, by an Order of Magnitude

The price of running a model has been falling roughly tenfold a year, a slide some investors call “LLMflation.” A unit of inference that cost around sixty dollars per million tokens when the GPT-3 API launched in 2020 (https://www.youtube.com/watch?v=wQP5uPsFCFA) now goes for about a dime on the economy tier. That is a 600-fold drop in under six years, faster than Moore’s Law ever moved silicon.

Nvidia’s Jensen Huang (https://www.the-ai-corner.com/p/jensen-huang-ai-roadmap-10-moves-2026) named the dynamic from the stage. Cost per token keeps falling while total AI spend keeps climbing, because cheaper inference pulls in demand that did not exist before. He started filing tokens under cost of goods sold, the same budget conversation as energy and payroll. When the man selling the shovels says the gold is getting cheaper to mine and the spending is still going vertical, that is worth sitting with.

The Architecture Is What Changed

Cheap tokens did not just mean more of the same usage. They made an entirely more expensive way of computing viable.

When inference is costly, you write a prompt and take an answer. When it is cheap, you build an agent that loops. It retrieves, reasons, calls a tool, checks the result, retries, escalates, and only then responds. Every one of those steps burns tokens, and the loop can run dozens of times for a single request.

Researchers call this the structural version of the old paradox. As prices fall, companies reach for more elaborate architectures, and the token multiplier from that complexity swamps the per-unit savings.

The budgets broke for a subtle reason. Cheapness made expensive behavior rational, and everyone adopted it in the same quarter.
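To make the multiplier concrete, here is a minimal back-of-the-envelope sketch. It assumes a hypothetical workload: a 2,000-token starting context, 500 new tokens per step, a 30-step agent loop that re-sends its accumulated context on every step, and a price that fell from an assumed $1.00 to $0.10 per million tokens in one year. Only the tenfold yearly decline comes from the figures above; the function names and every workload number are illustrative assumptions, and real agents vary widely.

```python
# Illustrative arithmetic only: every workload number below is an assumption,
# not a measurement. The one figure taken from the article is the roughly
# tenfold yearly price decline.


def cost_usd(tokens: int, price_per_million_usd: float) -> float:
    """Dollar cost of a token count at a given price per million tokens."""
    return tokens / 1_000_000 * price_per_million_usd


def agent_loop_tokens(steps: int, base_context: int, new_tokens_per_step: int) -> int:
    """Total tokens billed when every step re-sends the accumulated context.

    Each step pays for the context so far plus the tokens it generates, and
    those generated tokens are appended to the context for the next step.
    """
    total = 0
    context = base_context
    for _ in range(steps):
        total += context + new_tokens_per_step
        context += new_tokens_per_step
    return total


# Hypothetical "last year" setup: one prompt, one answer.
single_call_tokens = agent_loop_tokens(steps=1, base_context=2_000, new_tokens_per_step=500)
# Hypothetical "this year" setup: a 30-step agent loop on the same request.
agent_tokens = agent_loop_tokens(steps=30, base_context=2_000, new_tokens_per_step=500)

old_price = 1.00  # assumed dollars per million tokens last year
new_price = 0.10  # ten times cheaper, matching the tenfold yearly decline

old_cost = cost_usd(single_call_tokens, old_price)
new_cost = cost_usd(agent_tokens, new_price)

print(f"tokens per request: {single_call_tokens:,} -> {agent_tokens:,}")
print(f"token multiplier:   {agent_tokens / single_call_tokens:.0f}x")
print(f"bill multiplier:    {new_cost / old_cost:.1f}x")
```

Under these assumptions the request uses 117 times more tokens, so a tenfold price cut still leaves the cost per request about 11.7 times higher. The growth is quadratic in the number of steps because each step pays again for everything the earlier steps added to the context, which is why loop depth, not unit price, dominates the bill in this sketch.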
The frontier makes the squeeze worse, with the cost of running top-tier models rising several times a year even as the price of a single token keeps dropping. Two trends pointed in opposite directions, and the buyer feels both at once.

2. The Free-Money Era Just Ended

For two years the corporate instruction was simple. Use more AI. Flood every team with tools, reward experiments, worry about the bill later.

Later arrived. The same executives who pushed adoption are now installing tiered access, mandatory efficiency reviews, and hard caps on who is allowed to spend.

The Receipts Are Public Now

Uber drained its annual AI coding budget in roughly four months (https://www.linkedin.com/posts/nidhijain24_uber-burned-its-entire-2026-ai-budget-in-activity-7468275973008859136-ArqG/), and its operating chief said the spending is getting hard to justify against the returns he can actually see. Amazon pulled an internal leaderboard that had turned AI usage into a game among engineers (https://www.linkedin.com/posts/adel-du-toit_amazon-just-shut-down-an-internal-ai-leaderboard-activity-7467202393903005696-egvX/). Microsoft canceled a swath of internal coding subscriptions (https://www.linkedin.com/pulse/microsoft-cancels-claude-code-licenses-shifts-engineers-john-cloud-lvd6c/). One consultant described a client that ran up around half a billion dollars of model usage in a single month before anyone noticed.

A senior technology executive at a major financial firm put the mood in a single line to the Wall Street Journal, speaking anonymously. The free-money period for AI is over. The squeeze hit hardest at firms that locked multi-year deals before anyone understood their real usage patterns.

Nobody says that at the top of an adoption curve. It is what people say once the meter has started to scare them.

The Inversion Nobody Priced In

The pitch for all this spending was substitution.
Inference replaces labor at lower cost, so you trade headcount for compute and pocket the difference.

An Nvidia vice president flagged the problem with that math out loud. In some organizations, compute costs have already passed human labor costs. Read that slowly, because it turns the whole thesis upside down.

A company that cut staff on the promise that AI would absorb the work, then watched its model bill climb past the salaries it eliminated, sits in the worst possible spot. Fewer people. A bigger bill. No obvious route back to the efficiency that justified the cuts. That is a strategy coming apart in real time, not a line item that needs trimming.

3. Signal and Noise Wear the Same Uniform

SaaS trained a generation of operators to read usage as a stand-in for value. More queries, more seats, more logins meant the thing was working. AI severs that reflex at the root, and the reason is mechanical rather than philosophical.

The Meter Cannot Tell You What Happened

The same workflow on the same input can consume five to ten times more or fewer tokens depending on the prompt, the context retrieved, the model chosen, the tools called, and how often the agent had to retry. The unit on the invoice holds steady. The amount of real work it stands for does not.

So a rising bill is, at the same time, the evidence that valuable work got done and the evidence that compute leaked into bad prompts, bloated context, and redundant reasoning. The number alone cannot separate the two.

Two companies with identical token bills can be running completely different operations underneath. One is converting inference into outcomes. The other is paying for expensive thrash that looks exactly the same on the line item.

To be fair to the optimists, plenty of that spend is real work. Blackstone said model spending across its portfolio companies rose fifteenfold year over year in a single quarter.
About 11 percent of the live backend code Uber ships now comes from AI agents handling ride matching, pricing, and bug fixes. The point is not that the spend is wasted. The point is that the invoice cannot tell you which half is which.

What the Survivors Did Differently

The most-cited evidence here is MIT’s NANDA study (https://www.linkedin.com/pulse/genai-divide-why-95-ai-projects-failing-what-nobody-wants-jain-ggsff/), and its real finding cuts sharper than the headline. Across hundreds of deployments, 95 percent produced no measurable profit impact. The gap was not model quality. It was a learning gap, systems that never adapted to the workflow they were dropped into.

The 5 percent that worked shared a discipline. They aimed at back-office operations rather than splashy front-office demos, judged the work by business outcomes instead of benchmark scores, and refused to scale anything that did not already pay for itself.

The lesson buried in that failure rate is plain. Value showed up only where someone defined the outcome before turning on the meter.

4. Tokens Are the New Headcount

The budget fight inside these companies looks like a finance argument. It is a power struggle wearing a finance costume, and missing that is the fastest way to misread what is happening in boardrooms right now.

Headcount Was the Old Power Marker

For thirty years, the visible marker of a senior executive was the size of the organization they ran. Directs, skip levels, total headcount. Scope equaled status.

When intelligence becomes the scarce resource inside a company, that marker moves. The new measure of seniority becomes how much of that intelligence you direct.

This is why the allocation fight runs so hot. Whoever controls the token budget controls the AI equivalent of org scope, and a thirty-year instinct does not surrender quietly. The wars in the phrase are literal.
They name a contest over who owns the most valuable resource the company now buys, dressed up as a debate about cloud bills.

For an operator, the move is uncomfortable and obvious. The career hedge stopped being the size of the team you protect. It became the ability to point a swarm of agents at a hard problem and come back with a result the business can actually price. The managers who treat the token budget as someone else’s spreadsheet will wake up reporting to the ones who learned to allocate it.

Why the Fight Lands on Outsourcing First

When AI spend competes with labor, executives need a baseline, and outsourced work offers the cleanest one (https://businessmodelanalyst.com/ai-token-costs-tokenomics-foundation-enterprise-spending/). A business process contract is already priced in finished units. Cost per ticket, per claim, per invoice, per reviewed contract.

That makes it the easiest place to stage the comparison between a human and an agent. Internal labor resists the same scrutiny, because employees do a hundred fuzzy things and nobody volunteers their own team for the chopping block.

And the honest numbers, when they finally surface, run humbler than the marketing. On coding tools, the celebrated returns shrink to roughly 1.6 times (https://techcrunch.com/2026/06/05/the-token-bill-comes-due-inside-the-industry-scramble-to-manage-ais-runaway-costs/) once you count the full token cost instead of the seat license alone. Still positive. Nowhere near the tenfold story that sold the budget upstairs.

5. The Gold Rush to Measure Tokens

A new category is forming at speed, and it is the predictable reaction to a measurement problem. If the meter is noisy, sell people a better meter. Capital is already pouring into the idea that the cure for bad token data is more token data.

The Category Is Real and Crowded

At the FinOps industry’s flagship 2026 event, the dominant theme got branded the Great Token Panic.
Vendors shipped tools that trace every model call down to the user, the agent, and the workflow that triggered it. One platform demonstrated catching a single employee’s surprise 76,000-dollar token spend within minutes of it happening.

The established cost management players are racing the same direction, bolting token attribution onto dashboards they already sell. The reframing they are all chasing is the one executives now demand. The question stopped being what did we spend on AI. It turned into what did we get for it.

The tell is who shows up to buy. This is no longer a developer tools story. It is finance, taking the same chair it took over cloud spend a decade ago (https://www.thestreet.com/investing/the-next-phase-of-ai-spending-is-already-underway), walking in the moment a cost line gets big enough to need a referee.

Why Counting Harder Will Not End the War

Here is the limit nobody selling a dashboard wants to say out loud. The value of a token is irreducibly contextual. The same trace is signal in one workflow and waste in another, and no amount of attribution fully resolves that, because the thing being measured, the business value of a unit of inference, stays partly unknowable before the work runs.

Better instrumentation catches anomalies and trims obvious waste. Useful, real, worth paying for. But anomaly detection (https://cacm.acm.org/blogcacm/tokens-are-a-utility-were-treating-them-like-software/) is a smoke alarm. It tells you something is burning. It cannot tell you whether the fire was worth lighting.

It does not make spend legible enough to settle the allocation fight, because perfect attribution fails for the very reason the meter is noisy in the first place. The measurement layer is scaffolding for whatever replaces the token entirely, not the building.

6. The Hidden Winner Sells Outcomes, Not Tokens

Watch what the model providers are doing as they walk toward the public markets, because it tells you where this ends.
OpenAI is weighing deep cuts to what it charges per token (https://www.investing.com/analysis/the-ai-token-pricing-crisis-behind-openai-and-anthropics-revenue-race-200680777), expecting Anthropic to follow. Both are filing for the largest tech listings in years against revenue run rates that have gone near vertical.

Run that through the coal paradox and a price cut stops looking like a gift. Cheaper tokens pull more consumption, and the demand is elastic enough to count, with roughly a 0.29 percent usage drop for every 1 percent the price goes up. Cut the price, grow the volume, and reset every customer’s blown budget right before handing the SEC a filing that needs to show durable demand. The favor to the buyer is the land grab for the seller.

The endgame is plainer than the attribution gold rush makes it look. The winner will not be whoever measures token-to-outcome best. It will be whoever stops selling tokens and starts selling the outcome itself. A resolved ticket, a reviewed contract, a processed claim, at a fixed price, eating the token variance on their own books. That move solves the customer’s measurement problem by swallowing it whole, and only the labs control cost-to-serve well enough to price it.

The token was never going to survive as the thing you buy. It was the meter on the way to a market that sells finished work, and the companies printing the tokens will be the first to stop counting them.