The agent always cost this much

GitHub moved Copilot from Premium Request Units to AI Credits this week, and developers reported on Reddit that a single Claude Sonnet prompt consumed half a month of credits and a $100 Max plan could be exhausted within a day. GitHub Chief Product Officer Mario Rodriguez stated that the previous flat-rate model was no longer sustainable due to escalating inference costs, as the company had been absorbing the expense of unbounded usage. The shift reflects a broader industry convergence on usage-based pricing, driven by the enormous capital costs of frontier models and the reality that no flat fee can sustain unlimited access to expensive inference.

GitHub moved Copilot from Premium Request Units to AI Credits this week, and within hours developers on Reddit were reporting that a single Claude Sonnet prompt had burned half a month of credits and that a $100 Max plan could be exhausted inside a day.

The clearest framing of why came from Mario Rodriguez, GitHub's Chief Product Officer, writing on the GitHub blog (https://github.blog/news-insights/company-news/github-copilot-is-moving-to-usage-based-billing/): "Today, a quick chat question and a multi-hour autonomous coding session can cost the user the same amount. GitHub has absorbed much of the escalating inference cost behind that usage, but the current premium request model is no longer sustainable."

The capital outlay behind frontier models is enormous, running to multi-billion-dollar training runs and inference fleets on the most expensive GPUs. I have written before about whether the math even works (https://www.poppastring.com/blog/does-the-math-work) at the macro level, and the answer at trillion-dollar CapEx scale is no. Charging ten dollars a month for unbounded access to that supply chain was never sustainable.

Every serious AI tooling vendor is converging on usage-based pricing for the same reason: basic capitalism. Frontier inference is expensive, parallel agents multiply the expense, and no flat fee survives that kind of unbounded scale. Developers are learning what it took cloud engineers a good 10 years to accept: flat-rate predictability is the last thing to arrive, not the first.

The pattern emerging seems to be frontier at the edges, cheap in the middle. Planning is where the frontier models earn their cost. Letting a strong model think hard about what the work actually is, before any code gets written, prevents the kind of misdirected implementation that becomes expensive. The implementation itself, once the plan is established, is largely mechanical, and a cheaper model can do most of the hard labor.
Verification can then swing back to the frontier.

Underneath all of that, the ambient AI in the IDE continues to run on cheaper models, handling Next Edit Suggestions (https://devblogs.microsoft.com/visualstudio/next-edit-suggestions-available-in-visual-studio-github-copilot/), whole-line completions (https://devblogs.microsoft.com/visualstudio/typing-less-coding-more-how-we-delivered-intellicode-whole-line-completions-with-a-transformer-model/), call-stack analysis (https://devblogs.microsoft.com/visualstudio/visual-studio-2026-debugging-with-copilot/), and the constant background help that makes the editor feel intelligent. These calls are high-frequency, latency-sensitive, and low-stakes per individual invocation, and that cheap floor may get even cheaper.

At BUILD this week (https://www.youtube.com/live/FFMm454fxNA), Satya Nadella pointed out that "the amount of compute you have at the edge is actually astounding," and framed the destination as "unmetered intelligence to every desk and every home." The Windows side of the answer is real on-device inference support (https://learn.microsoft.com/en-us/windows/ai/apis/), NPU acceleration across the major silicon vendors, and small-model families tuned to run on the chip rather than the cloud. The direction is hybrid by default: route what needs the frontier to a paid API, run what can run locally on silicon the developer already owns.

The era of agents running wild on someone else's dime is over. What replaces it is already visible in outline, and it rewards developers who think before they fan out, match the model and where it runs to the task, and treat context as a resource worth caching.

I work on Visual Studio at Microsoft. The views here are mine alone.
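
For readers who want the routing idea above in concrete form, here is a minimal sketch of "frontier at the edges, cheap in the middle" with a local fallback for ambient calls. Everything in it is an assumption for illustration: the `Phase` and `Tier` names, the `route` and `session_cost` helpers, and especially the relative cost units, which are made up and do not reflect any vendor's real pricing or API.

```python
from enum import Enum


class Phase(Enum):
    PLAN = "plan"
    IMPLEMENT = "implement"
    VERIFY = "verify"
    AMBIENT = "ambient"  # completions, next-edit suggestions, and similar


class Tier(Enum):
    FRONTIER_API = "frontier-api"  # metered, most expensive
    CHEAP_API = "cheap-api"        # metered, cheaper
    LOCAL = "local"                # on-device, no per-call charge


# Hypothetical relative cost per call, in arbitrary units (not real prices).
ASSUMED_COST = {Tier.FRONTIER_API: 20, Tier.CHEAP_API: 2, Tier.LOCAL: 0}


def route(phase: Phase, local_available: bool = False) -> Tier:
    """Pick a tier for one call: frontier for planning and verification,
    a cheaper model for implementation, local silicon for ambient calls
    when it is available."""
    if phase in (Phase.PLAN, Phase.VERIFY):
        return Tier.FRONTIER_API
    if phase is Phase.AMBIENT and local_available:
        return Tier.LOCAL
    return Tier.CHEAP_API


def session_cost(phases, local_available: bool = False) -> int:
    """Sum the assumed cost of a sequence of calls after routing."""
    return sum(ASSUMED_COST[route(p, local_available)] for p in phases)


if __name__ == "__main__":
    # One plan, ten implementation steps, one verification, fifty ambient calls.
    session = (
        [Phase.PLAN]
        + [Phase.IMPLEMENT] * 10
        + [Phase.VERIFY]
        + [Phase.AMBIENT] * 50
    )
    all_frontier = len(session) * ASSUMED_COST[Tier.FRONTIER_API]
    print("all frontier:", all_frontier)
    print("routed, cloud only:", session_cost(session))
    print("routed, local ambient:", session_cost(session, local_available=True))
```

With these made-up units, the point is only the shape of the result: the ambient calls dominate by count, so where they run matters more than which frontier model does the planning.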