{"slug": "the-current-ai-pricing-was-always-going-to-go-away", "title": "The current AI pricing was always going to go away", "summary": "**Summary:** The article argues that the era of flat-rate, subsidized AI pricing is ending because inference costs have not fallen as expected, while demand has surged due to new capabilities. Hardware constraints, including rising GPU and memory costs, have forced AI labs to raise prices, as companies like Anthropic spend far more on compute than they earn in revenue. The author predicts a shift toward per-action pricing models, where costs are tied directly to usage, rather than flat-rate subscriptions.", "body_md": "The current AI pricing was always going to go away. It just doesn’t make sense.\nMicrosoft canceled internal Claude Code licenses this week (for whatever reason, even if it’s because they integrated it), Uber blew its entire 2026 AI budget in four months, and GitHub is dropping flat-rate plans across its products.\nYou’ll see the framing “the AI subsidy era is ending,” which is a polite way of describing what everyone’s been doing when they slap AI features into every tier of their product on a bet that inference costs would keep falling.\nThey didn’t. The cost curve is bending the wrong way, and the labs have no choice except to pass that along.\nDid we collectively forget second-order thinking?\nWith each model generation, cost per token did fall in theory, sometimes by 10x, but that was for comparable quality… Lots of people extrapolated and built business models on the extrapolation, which… isn’t how you think about it.\nSecond-order thinking, anyone?\nEveryone who deals with road planning knows about induced demand. Each new capability invents new demand. Highways are the textbook case. Add a lane, you get new commutes. The commutes weren’t there before the lane. AI is the same shape. 
Cheaper inference doesn’t reduce the bill, it expands what people ask the model to do.\nNow my reasoning queries take more than 4 minutes, where the old ones took 2 minutes… Agentic workflows make 50 calls where the old workflow made one. Unit cost falls, units explode, and total spend still goes up.\nAnyone selling a flat-rate “AI assistant” assumed user behavior wouldn’t change. It did. It always does.\nThe second problem is that the supply side stopped cooperating – memory and GPU economics are moving against you.\nMemory got 4x more expensive. GPUs got >95% more expensive.\nFrontier training and inference run on Nvidia accelerators paired with high-bandwidth memory. The ceiling isn’t transistors anymore, it’s HBM and the advanced packaging that bonds it to the compute die.\nThat ceiling is one factory deep. TSMC’s CoWoS packaging line is the bottleneck for accelerator supply. SK Hynix dominates HBM, with Samsung lagging and Micron behind that. None of them can add capacity overnight. These are 18-to-36 month commitments, minimum, and they were planned for a world that under-forecast demand by an order of magnitude.\nSo GPU pricing is what scarcity pricing looks like. Top-end accelerators today are roughly 2x more expensive than the previous generation at comparable cluster scale. HBM prices have 4x’d in 18 months. Power and cooling are now real constraints in places nobody used to model power for, which is why every hyperscaler now has a “we’re building a gigawatt campus” story and a nuclear-PPA press release.\nAnthropic’s CFO testified under oath this March that the company spent $10 billion on compute and made $5 billion in revenue (Ed Zitron has the math). The labs are underwater on inference. They’re raising prices to keep the lights on.\nCompanies that sold flat-rate AI-everywhere products are now sitting on a margin problem they architected themselves into. The bet was that one of these curves would bend in their favor. 
None of them did, probably none of them will, certainly not on the timeline their pricing assumed.\nWhat changes from here\nThe product question shifts. It stops being “where can we add AI?” and starts being “which use cases earn the inference cost they burn?” That’s a harder roadmap to write. It also changes the pricing surface, which is the part most product teams haven’t internalized.\nThree architectures handle a moving cost. None of them are new. All of them are uncomfortable for sales teams that grew up selling seats.\nPer-action. Every API call, every generation, every agent step has a price. Revenue scales with cost because they’re indexed to the same underlying event. Twilio has run this since 2008. AWS has run a version of it since 2006. The downside is that transparency cuts both ways. Customers see the meter, and they negotiate. The upside is that your gross margin doesn’t depend on guessing how hard your power users will hammer the system.\nCredits. Prepaid buckets. Customer buys 100,000 credits, burns them down on whatever, refills. Credits smooth cash flow and let you mix model costs behind a single unit, which is the only sane way to handle a product that routes between five different inference providers. The trap is breakage. Snowflake credits are infrastructure; customers understand what they’re buying. Gift-card credits are stranded assets, and customers can tell which one they bought. You only get to do the second one once.\nHybrid. Base seat with included credits and metered overage. Most enterprise sales motions accept this without flinching, because the seat number still anchors the contract and the meter is the safety valve. It’s the design most AI-native products converge to within their first repricing cycle. Not my favourite, but whatever, it tends to work.\nThe shape isn’t the point by itself; what matters is whether the revenue line moves when the cost line moves. 
Per-seat is the one architecture that pretends costs are fixed.\nEverything else is some flavor of indexing revenue to the underlying event.\nThe impossible choice\nIf your pricing can move with cost, you get to keep building.\nYou can ship the agentic workflow, the heavier reasoning model, the slow expensive feature for power users, and you have a way to be paid for them.\nIf you’re locked into per-seat (or flat, or whatever) – you pick between two losing options. Eat the margin and watch it compress every quarter your customers’ usage grows. Or strip AI out of your cheaper tiers and watch your activation rate fall off the lower-priced cohorts that used to be your funnel.\nBoth options are visible on the next board deck.\nNeither one of them looks fun.", "url": "https://wpnews.pro/news/the-current-ai-pricing-was-always-going-to-go-away", "canonical_source": "https://arnon.dk/the-current-ai-pricing-was-always-going-to-go-away/", "published_at": "2026-05-22 11:24:53+00:00", "updated_at": "2026-05-22 15:40:59.715680+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "enterprise-software"], "entities": ["Microsoft", "Claude Code", "Uber", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/the-current-ai-pricing-was-always-going-to-go-away", "markdown": "https://wpnews.pro/news/the-current-ai-pricing-was-always-going-to-go-away.md", "text": "https://wpnews.pro/news/the-current-ai-pricing-was-always-going-to-go-away.txt", "jsonld": "https://wpnews.pro/news/the-current-ai-pricing-was-always-going-to-go-away.jsonld"}}