Microsoft and peers curb AI use amid rising token costs

Microsoft is cancelling most direct licences for Anthropic's Claude Code within its Experiences and Devices division and ordering engineers to migrate to GitHub Copilot CLI by June 30, 2026, according to multiple outlets. Uber burned through its planned 2026 AI coding budget in roughly four months, with individual engineers spending $500 to $2,000 per month on tokens, per reports citing company executives. The moves follow industry reporting that agentic AI workflows can consume hundreds to a thousand times more tokens than standard LLM queries, turning token costs into a material line-item expense for major engineering organizations.

Microsoft is cancelling most direct licences for Claude Code inside its Experiences and Devices division and directing engineers to migrate to GitHub Copilot CLI by June 30, 2026, according to TheStreet and The Next Web. Reporting also shows Uber exhausted its planned 2026 AI coding budget within months, a point attributed to CTO Praveen Neppalli Naga and confirmed by Uber COO Andrew Macdonald on the Rapid Response podcast, per India Today and The Next Web. Industry reporting links the problem to heavy token consumption by agentic workflows, which can use orders of magnitude more tokens than standard LLM queries, according to Tom's Hardware and Fortune. Editorial analysis: practitioners should treat licence and token consumption as material line-item costs, not just marginal usage.

What happened

Coverage from The Next Web, TheStreet, Fortune, India Today, and Tom's Hardware documents several corporate responses to rising AI runtime costs. According to TheStreet and The Next Web, Microsoft is cancelling most direct licences for Anthropic's Claude Code inside its Experiences and Devices division and has directed affected employees to migrate to GitHub Copilot CLI, with a stated deadline of June 30, 2026. The Next Web and Fortune cite reporting that Microsoft initially rolled out Claude Code companywide as an experiment before the licence pullback.

Per The Next Web, Uber's CTO Praveen Neppalli Naga told The Information that the company burned through its planned 2026 AI coding budget in roughly four months; India Today reports that Uber COO Andrew Macdonald reconfirmed this on the Rapid Response podcast. The Next Web and Fortune report that individual engineers at Uber were spending roughly $500 to $2,000 per month on tokens.
Tom's Hardware and other outlets describe agentic AI workflows as dramatically increasing token consumption, in some cases by hundreds to a thousand times versus single-query LLM use.

Editorial analysis - technical context

Industry coverage frames the core technical driver as token-volume growth from more frequent, agentic, or multi-step workflows. Agents that chain LLM calls, call tools, or maintain long interaction state naturally multiply token counts per task. The reported figures, with engineers spending hundreds to thousands of dollars monthly, are consistent with high-frequency use of code-generation and agent patterns rather than occasional prompt queries. For practitioners, this means runtime billing exposure scales with workflow design: synchronous single-query completions and compact prompts cost far less than persistent agent loops or long-context retrieval plus multi-turn generation.

Industry context

Editorial analysis: Reporting places these corporate moves in a broader pattern in which early internal experiments and liberal access to third-party models surface previously hidden operating costs. Multiple outlets connect licence withdrawals or internal tool pushes to the mismatch between optimistic productivity expectations and real-world token consumption. Earlier assumptions that a falling per-token inference price would uniformly lower bills are running into Jevons-paradox effects, where cheaper tokens and easier tools encourage heavier usage and higher total spend.

Context and significance

Editorial analysis: For AI/ML teams and engineering leaders, the immediate implication is that model selection, tool architecture, and access controls materially affect operating budgets. Organisations that permit unconstrained agentic workflows or delegate heavy code generation to every engineer can convert a modest per-query charge into a significant recurring expense.
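
Editorial illustration: the way workflow design drives token volume can be sketched with a toy model. The sketch below assumes an agent that resends its entire accumulated context on every step, with no caching or truncation. The `Pricing` class, the function names, the token counts, and the per-million-token rates are all hypothetical and are not taken from any vendor's price list or from the reporting cited here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical USD prices per million tokens; not any vendor's real rates."""
    input_per_million: float
    output_per_million: float


def cost_usd(input_tokens: int, output_tokens: int, pricing: Pricing) -> float:
    """Bill input and output tokens at their separate per-million rates."""
    return (input_tokens * pricing.input_per_million
            + output_tokens * pricing.output_per_million) / 1_000_000


def agent_loop_tokens(prompt_tokens: int, output_per_step: int,
                      tool_result_per_step: int, steps: int) -> tuple[int, int]:
    """Return (total_input, total_output) tokens for a multi-step agent task.

    Assumes every step resends the whole accumulated context and that
    nothing is cached or truncated, which is a deliberate simplification.
    """
    total_in = 0
    total_out = 0
    context = prompt_tokens
    for _ in range(steps):
        total_in += context  # the full history is billed as input again
        total_out += output_per_step
        context += output_per_step + tool_result_per_step
    return total_in, total_out


if __name__ == "__main__":
    pricing = Pricing(input_per_million=3.0, output_per_million=15.0)  # made-up rates
    single = agent_loop_tokens(2_000, 500, 0, steps=1)  # one call, no tool output
    agent = agent_loop_tokens(2_000, 500, 1_500, steps=30)
    for label, (tokens_in, tokens_out) in (("single query", single),
                                           ("30-step agent", agent)):
        print(f"{label}: {tokens_in + tokens_out:,} tokens, "
              f"${cost_usd(tokens_in, tokens_out, pricing):.4f}")
```

With these invented inputs, the 30-step task consumes 945,000 tokens against 2,500 for the single query, roughly 378 times as many, because resent context makes input tokens grow quadratically with the number of steps. The numbers only illustrate the mechanism and are not a reconstruction of any reported figure.
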
The Microsoft and Uber examples are notable because they come from large, well-resourced engineering organisations; smaller teams with tighter budgets will see these dynamics earlier and more acutely.

What to watch

Editorial analysis: Observers should track four indicators: licence and access changes from major cloud and vendor partners; internal chargeback or token-accounting schemes becoming standard; the emergence of cheaper, more efficient agent runtimes or token-optimised models; and product changes that reduce per-task token counts (e.g., caching, retrieval-augmented generation optimisations, or on-device inference). Also watch vendor commercial responses, such as tiered pricing for agentic patterns or enterprise plans with predictable spend controls.

Bottom line

Reported moves by Microsoft and the spending disclosures from Uber highlight that token economics are now a corporate governance issue. Editorial analysis: AI adoption decisions increasingly require the same budget discipline traditionally applied to cloud compute or SaaS spend, not just technical validation of models.

Scoring Rationale

This story matters to practitioners because it reframes AI adoption as an operational and budgetary challenge rather than just a technical one. The examples involve major engineering organisations and point to cost dynamics (agentic token growth, licence controls) that will affect deployment decisions and vendor negotiations.