Beyond automation: How much does AI really cost?

An anonymous enterprise spent $500 million in a single month on Claude AI due to a lack of usage limits, while Uber exhausted its 2026 AI budget before mid-year. JPMorgan reported that AI token costs are eroding internet profits, and companies like Shopify, Spotify, ServiceNow, and Roku cited AI as a major operational expense. The core issue is a cost modeling problem, not a technology failure: organizations need to model token volume per workflow type before deployment.

An anonymous enterprise recently spent $500 million in a single month on Claude AI (https://www.axios.com/2026/05/28/ai-spending-roi-enterprise-costs), not because the technology failed, but because nobody set usage limits before rolling it out to employees. Uber exhausted its entire AI budget for 2026 before the first half of the year ended (https://fortune.com/2026/05/26/uber-coo-ai-spending-tokens-claude-code/). JPMorgan published a report titled “AI Token Costs Are Eating into Internet Profits” (https://eu.36kr.com/en/p/3833996464072580). Shopify, Spotify, ServiceNow and Roku all cited AI as a major source of operational expense pressure in recent earnings calls (https://www.techflowpost.com/en-US/article/31853).

This is not a technology problem. It is a cost modelling problem.

Most organizations ask the right first questions: What work should be AI-enabled? Which deployment approach fits each domain? But there is a third question that is almost never asked before launch: How much will it cost to operate this at scale? The answer requires understanding three parameters simultaneously, and the interaction between them is deeply counterintuitive. The deployments that did not produce budget surprises shared one characteristic: token volume was modelled per workflow type before the architecture was finalized.

AI operational cost is not simply a function of how complex or sophisticated the task is. It is the product of three variables:

$$\text{Total AI Cost} = \text{Tokens}_{\text{activity}} \times \text{Frequency}_{\text{repetitions}} \times N_{\text{users}}$$

$\text{Tokens}_{\text{activity}}$ measures the cognitive depth of a single session: how much input and output the AI processes to complete one instance of the task. $\text{Frequency}_{\text{repetitions}}$ measures how often that activity is executed: daily, weekly, per transaction, per customer interaction. $N_{\text{users}}$ measures how many individuals or automated processes are executing that activity across the organization.
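
As a minimal sketch of the three-parameter model above, the product can be written as a small Python function. The function names, the per-user-per-month framing of frequency, the example inputs, and the price per million tokens are all illustrative assumptions, not figures from the article or from any vendor's price list.

```python
def monthly_tokens(tokens_per_session: int, sessions_per_user_per_month: int, users: int) -> int:
    """Monthly token volume as the product of the three parameters."""
    return tokens_per_session * sessions_per_user_per_month * users


def monthly_cost(tokens: int, price_per_million_tokens: float) -> float:
    """Convert a token volume to money using a single blended price (an assumption)."""
    return tokens / 1_000_000 * price_per_million_tokens


# Hypothetical inputs, chosen only to show how the three factors interact.
shallow_but_frequent = monthly_tokens(2_000, 100, 200)  # light task, many runs, many users
deep_but_rare = monthly_tokens(60_000, 5, 10)           # heavy session, few runs, few users

PRICE = 5.0  # hypothetical blended price in dollars per million tokens

for label, tokens in [("shallow but frequent", shallow_but_frequent),
                      ("deep but rare", deep_but_rare)]:
    print(f"{label}: {tokens:,} tokens/month, about ${monthly_cost(tokens, PRICE):,.2f}")
```

Keeping token volume and price as separate steps is a deliberate choice in this sketch: volume depends on how the work is organized, while price depends on the model tier chosen for it.
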
The critical insight is that these three parameters move in opposite directions depending on where the work sits in the T–R–M framework, and that inversion is what produces the budget surprises.

In a previous article in this series, we introduced the T–R–M framework as a structured way to analyze how work is internally composed across three dimensions: Task nature (T), Relational density (R), and Human–AI operational mode (M). The M dimension, human–AI operational mode, describes how work is distributed between humans and AI, ranging from full automation (M0) to human-dominant work where AI has no viable operational role (M4). Most professional roles operate across multiple M modes simultaneously within the same week. What the framework did not yet address is the economic consequence of that distribution at scale. That is what this article adds.

Each M mode has a characteristic token consumption profile per session. These ranges reflect the cognitive depth of the interaction, but they tell only one third of the story.

| Mode | Label | Tokens / session | Freq. / user / month | Cost driver |
|---|---|---|---|---|
| M0 | Fully Autonomous AI | 1,000 – 8,000 | Hundreds–Thousands | N users × frequency |
| M1 | Supervised AI | 8,000 – 30,000 | Tens–Hundreds | Volume at scale |
| M2 | Hybrid Chain | 20,000 – 60,000 | 10–50 | Collaboration depth |
| M3 | Extended Cognition | 50,000 – 120,000+ | 2–10 | Session intensity |
| M4 | Human-Dominant | Minimal / zero | 1–5 | Negligible |

Table 1. Estimated token consumption per session by Human–AI Operational Mode, with scale and cost driver characteristics.

The apparent paradox is immediate: M3 Extended Cognition consumes the most tokens per session, yet Goldman Sachs estimates that agentic AI, which operates primarily in M0 and M1, may increase total token demand by 24 times current levels (https://www.goldmansachs.com/insights/articles/ai-agents-forecast-to-boost-tech-cash-flow-as-usage-soars). The reason is the multiplier effect of frequency and users.
An M0 task consuming 5,000 tokens per execution, running 500 times per month per user across 1,000 users, generates 2.5 billion tokens per month. An M3 session consuming 80,000 tokens, executed 4 times per month by each of 15 senior professionals, generates 4.8 million tokens per month. The ratio is roughly 500 to 1, and the larger figure belongs to the task that costs less per session.

In the previous article, we followed a Business Relationship Manager through a single Tuesday. By Friday she had produced one prioritized backlog, two stakeholder briefings, three escalation memos, a renegotiated SLA, and a verbal commitment that quietly reshaped Q3 priorities for forty engineers. Decomposed through T–R–M, that single week operated simultaneously across M0, M1, M2, M3, and M4. Applying the three-parameter cost model to each layer reveals a profile that is almost the inverse of what most organizations assume when they deploy AI for this role.

| Activity | M Mode | Tokens / session | Sessions / month | Users (org) | Monthly cost index |
|---|---|---|---|---|---|
| Consolidating intake tickets | M0 | 3,000 – 8,000 | ~200 | 500+ | 🔴 Very high |
| Drafting status briefings | M1 | 10,000 – 25,000 | 40 | 200 | 🟠 High |
| Translating needs → requirements | M2 | 25,000 – 50,000 | 20 | 50 | 🟡 Medium |
| Alignment in steering meetings | M3 | 50,000 – 100,000 | 8 | 10 | 🟡 Medium |
| SLA renegotiation post-incident | M4 | Minimal | 2 | 5 | 🟢 Low |
| Hallway verbal commitments | M4 | Zero | n/a | 1 | 🟢 Negligible |

Table 2. Token economics model for the Business Relationship Manager profile. ‘Monthly cost index’ is qualitative: it indicates relative budget exposure across activity layers.

The insight is not that M0 is too expensive to deploy; it is often the layer with the clearest ROI. The insight is that organizations routinely model the cost of M0 as if it were one user running one query. The actual cost is the product of all three parameters.
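
To make the product concrete, the sketch below applies the three-parameter model to the AI-enabled rows of Table 2. It rests on several assumptions that the table does not state: sessions per month are read as per user, "500+" users is taken as exactly 500, "~200" sessions as 200, and the midpoint of each range is used as a rough summary. The `Layer` class is hypothetical, and the result is token volume, not money, since the layers may run on differently priced models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    activity: str
    mode: str
    tokens_low: int   # tokens per session, low end of the Table 2 range
    tokens_high: int  # tokens per session, high end of the Table 2 range
    sessions: int     # sessions per month, ASSUMED to be per user
    users: int        # users in the organization ("500+" taken as 500)

    def monthly_range(self) -> tuple[int, int]:
        """Low and high monthly token volume for this layer."""
        scale = self.sessions * self.users
        return self.tokens_low * scale, self.tokens_high * scale


# M4 rows are omitted because their token use is minimal or zero.
LAYERS = [
    Layer("Consolidating intake tickets", "M0", 3_000, 8_000, 200, 500),
    Layer("Drafting status briefings", "M1", 10_000, 25_000, 40, 200),
    Layer("Translating needs into requirements", "M2", 25_000, 50_000, 20, 50),
    Layer("Alignment in steering meetings", "M3", 50_000, 100_000, 8, 10),
]

midpoints = {layer.activity: sum(layer.monthly_range()) // 2 for layer in LAYERS}
total = sum(midpoints.values())

for layer in LAYERS:
    low, high = layer.monthly_range()
    share = midpoints[layer.activity] / total
    print(f"{layer.mode}  {layer.activity:<36} {low:>12,} to {high:>12,}  "
          f"({share:.0%} of midpoint total)")
```

Under these assumptions the M0 layer accounts for most of the modelled token volume, while the M3 layer contributes a small fraction despite having the largest sessions.
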
For a BRM function deployed across a 500-person organization, the ticket consolidation layer alone can represent most of the total AI budget for that role. Meanwhile, steering meeting preparation (the M3 layer where the BRM synthesizes competing stakeholder positions, interprets political dynamics, and formulates negotiation strategy) consumes many tokens per session but runs infrequently and serves a small number of senior professionals. Its contribution to total cost is comparatively modest.

Organizations consistently overestimate the cost of the work AI does best and underestimate the cost of the work it does most.

A senior consultant in a professional services firm operates across a different but structurally comparable T–R–M profile. The mix shifts toward M2 and M3 (more cognitive depth per session, lower frequency, smaller user population), but the same three-parameter logic applies.

| Activity | M Mode | Tokens / session | Sessions / month | Users (firm) | Monthly cost index |
|---|---|---|---|---|---|
| Translation (short docs) | M1 | 8,000 – 20,000 | 15–20 | 300 | 🟠 High |
| Document analysis | M1–M2 | 15,000 – 40,000 | 8–10 | 200 | 🟡 Medium |
| Deliverable creation | M2 | 20,000 – 60,000 | 4–6 | 100 | 🟡 Medium |
| RFP analysis + Excel sim. | M2 | 25,000 – 70,000 | 2–4 | 50 | 🟡 Medium |
| Code / automation | M2–M3 | 25,000 – 80,000 | 3–5 | 80 | 🟡 Medium |
| Framework development | M3 | 50,000 – 120,000+ | 2–4 | 10–20 | 🟢 Low at scale |
| Strategic negotiation | M4 | Minimal | 1–3 | 5 | 🟢 Negligible |

Table 3. Token economics model for the Senior Consultant profile. Framework development sessions (M3) are the most token-intensive per session but the least significant at organizational scale.

Two observations stand out. First, translation, often dismissed as a low-cost commodity task, becomes a significant budget line when deployed at scale across a multilingual firm.
A translation layer running 15–20 sessions per month per consultant, across 300 consultants, is not a negligible cost. It is a manageable one, but it must be modelled explicitly.

Second, framework development and strategic reasoning, the M3 activities that generate the highest per-session token consumption, are also the activities with the smallest user population and lowest frequency. Firm-wide, they may represent a smaller budget line than routine document analysis, even though each individual session costs significantly more.

| Mode | Cost per session | Scale (users × freq.) | True budget risk |
|---|---|---|---|
| M0–M1 | Low | Massive | 🔴 Primary risk |
| M2 | Medium | Moderate | 🟡 Manageable |
| M3 | High | Minimal | 🟢 Contained |
| M4 | None | Irrelevant | ✅ No risk |

Table 4. The budget risk paradox. The activities that consume the most tokens per session carry the least organizational budget risk. The activities that consume the fewest tokens per session carry the most.

This has direct implications for how organizations structure their AI governance. Cost controls applied uniformly across all AI usage (token caps, usage limits, model downgrades) will disproportionately affect M3 users, who are typically the professionals generating the highest-value outputs, while leaving largely untouched the M0–M1 volume that drives the actual budget exposure.

Effective AI cost governance requires mode-aware controls: different token budgets, model tiers, and usage policies calibrated to the M mode of the activity, not to the role title of the user.

This article is published as part of the Foundry Expert Contributor Network. Want to join? https://www.cio.com/expert-contributor-network/
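
The mode-aware controls argued for in this article could be expressed, in a minimal form, as a policy table keyed by M mode instead of by role. Everything in the sketch below is hypothetical: the `ModePolicy` class, the `authorize` function, the tier labels, and the numeric limits are placeholders that an organization would replace with its own modelled volumes. Only the per-session caps loosely follow the upper ends of the ranges in Table 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePolicy:
    model_tier: str              # hypothetical tier label, not a real product name
    max_tokens_per_session: int  # cap on a single session
    monthly_token_budget: int    # budget shared by all activity in this mode


# Placeholder values for illustration only.
POLICIES = {
    "M0": ModePolicy("small", 8_000, 1_000_000_000),
    "M1": ModePolicy("small", 30_000, 300_000_000),
    "M2": ModePolicy("medium", 60_000, 100_000_000),
    "M3": ModePolicy("large", 150_000, 20_000_000),
}


def authorize(mode: str, requested_tokens: int, used_this_month: int) -> bool:
    """Allow a session only if it fits both the cap and the remaining budget of its mode."""
    policy = POLICIES.get(mode)
    if policy is None:  # M4 or an unknown mode: there is no AI usage to authorize
        return False
    within_cap = requested_tokens <= policy.max_tokens_per_session
    within_budget = used_this_month + requested_tokens <= policy.monthly_token_budget
    return within_cap and within_budget


# The same 100,000-token request is judged differently depending on its mode.
print(authorize("M3", 100_000, 5_000_000))
print(authorize("M0", 100_000, 5_000_000))
```

The design choice is that the session cap protects deep M3 work from limits sized for high-volume automation, while the monthly budget puts the tightest scrutiny on the M0 and M1 volume where the exposure accumulates.
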