Source: aditya.patadia.org

Why current LLM costs are not sustainable

Uber burned through its entire year's AI budget in just four months, while Microsoft, Salesforce, and GitHub are cutting AI spending, highlighting unsustainable costs of frontier models like GPT 5.5 at $30 per million output tokens. Model performance plateaus, open-weight models like GLM-5.2 at 1/10th the cost, chip improvements, and zero switching costs are pressuring AI labs to lower prices.

Published Jun 26, 2026 · 5 min read

AI has a cost problem. The solution that will emerge will be simpler than we expect.

Many companies are getting bitten by high AI costs. Uber burned through its entire year’s AI budget in just four months, and Microsoft, Salesforce and GitHub are taking steps to reduce employees’ AI spend.

On the other hand, AI is making many programming tasks much easier and keeps helping in other domains such as data interpretation, slide design, and app and website design. Currently, the big AI labs offer what we call frontier models, which perform exceptionally well across a wide variety of tasks. Frontier labs handle both the research and the hosting themselves, so their models carry the highest prices. GPT 5.5, for example, costs $5 per million input tokens and $30 per million output tokens, which currently makes it the costliest model listed on OpenRouter. To give an example, just fixing TypeScript types across 50 files with this model cost me $54 this afternoon.
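To make the per-token pricing concrete, here is a minimal Python sketch of the arithmetic. The default prices are the GPT 5.5 figures quoted above; the function name and the token counts in the example are made up for illustration and are not the actual breakdown of the $54 session.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float = 5.0,    # USD per million input tokens (figure quoted above)
    output_price_per_m: float = 30.0,  # USD per million output tokens (figure quoted above)
) -> float:
    """Return the cost in USD of a workload billed per million tokens."""
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


# Hypothetical workload: one million tokens in, one million tokens out.
print(request_cost_usd(1_000_000, 1_000_000))  # 35.0
```

Output tokens are six times as expensive as input tokens at these prices, so agentic coding sessions that generate a lot of code add up quickly.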

Five factors suggest the AI labs may not be able to sustain the prices they are asking right now: the model performance plateau, open-weight model releases, chip and model improvements, zero switching costs, and local models.

Model performance plateau

Each model release still brings improvements, but the gains are clearly getting smaller. Unless a completely new breakthrough arrives, current learning and inference capabilities can only scale so far. Training data is a problem as well: most AI labs have likely already ingested everything available in digital and print media, so improving the training dataset is going to prove very difficult.

This means labs will find it hard to keep raising prices on the strength of better performance. We already saw evidence of this when Claude Opus 4.8 was priced the same as Claude Opus 4.7. Once models stop improving dramatically and training data and methods converge, prices will likely drop due to competition.

Open weight models

OpenAI had a massive lead when it launched ChatGPT in 2022, but that lead has slowly faded, and we saw Anthropic take the top spot in 2025-26. Now GLM-5.2, an open-weight model, beats GPT and Opus in coding benchmarks at one-tenth the cost of GPT 5.5.

What is happening here is that leading AI labs charge not only for inference but also for research into model architecture, training data collection and curation, model training (which can cost tens or even hundreds of millions of dollars), employee salaries, and marketing.

On the other hand, once an open-weight model is released, any inference provider can host it and simply add a markup to the inference cost. This is far cheaper than running a frontier AI lab.

Chip and model improvements

Companies like Cerebras, Groq, Google and many others have realised that AI needs its own silicon and that ordinary GPUs are not cutting it. Specialised chips are very expensive to design, but once the architecture is ready, making millions of them is easy and inference becomes much cheaper. A TPU, for example, can be 30-70% cheaper than an Nvidia H100 GPU. Such advancements will keep coming and keep pushing down the price per token.

Model architecture is also evolving. Caching was a basic first improvement, and now mixture-of-experts (MoE) models and other approaches are making models faster while keeping the same accuracy levels.

Zero switching costs

Traditional software like Windows, MS Office and the Adobe suite, and SaaS products like Salesforce, HubSpot and Figma, had a very important moat that AI models don’t have: none of these products was interchangeable with its competitors. You could not swap a CRM in an afternoon; it took months.

As more AI labs enter the space and more open-weight models become available, this lack of lock-in will drive a very quick price crash. AI gateway providers like OpenRouter.ai are making it extremely easy to switch models. It can happen in seconds, and we can even program an application to change providers on the fly. Zero switching costs mean that if a better model comes along, consumers can switch to it without any time investment.
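As a rough illustration of changing providers on the fly, the Python sketch below tries the cheapest model first and falls back to the next one if a call fails. Everything in it is hypothetical: the `ModelOption` type, the model names, the prices and the stub functions stand in for real API clients, and none of it is OpenRouter's actual API.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class ModelOption:
    name: str                    # hypothetical model identifier
    output_price_per_m: float    # USD per million output tokens (made-up values)
    call: Callable[[str], str]   # stand-in for a real API client


def complete_with_fallback(
    prompt: str, options: Sequence[ModelOption]
) -> Tuple[str, str]:
    """Try models from cheapest to most expensive; return (model_name, reply)."""
    errors = []
    for option in sorted(options, key=lambda o: o.output_price_per_m):
        try:
            return option.name, option.call(prompt)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{option.name}: {exc}")
    raise RuntimeError("all models failed: " + "; ".join(errors))


# Stub "providers" so the sketch runs without network access.
def cheap_but_down(prompt: str) -> str:
    raise ConnectionError("provider unavailable")


def pricey_but_up(prompt: str) -> str:
    return f"reply to: {prompt}"


options = [
    ModelOption("pricey-frontier-model", 30.0, pricey_but_up),
    ModelOption("cheap-open-weight-model", 3.0, cheap_but_down),
]
print(complete_with_fallback("fix my types", options))
# ('pricey-frontier-model', 'reply to: fix my types')
```

Because every option sits behind the same prompt-in, text-out interface, swapping or reordering models is a one-line change, which is the point of the zero-switching-cost argument.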

Local models

The last factor, and in fact the most important one, is the ability of users to run local models. So far, almost everyone uses cloud-hosted models because local models are either too big to deploy or too slow to work with. With advancements in chips, this will change in 4-5 years’ time. Newer chips will run models locally, and an almost certain crash in RAM prices will make it easy to deploy models on computers and smartphones. I predict most operating systems will provide a way to deploy a model, along with an interface so that locally running apps can connect to it.

When this happens, cloud models will be used only for the most complex tasks, while simple tasks like code tab completion, proofreading and fact checking will be done locally. This means customers will no longer need that $20 or $200 subscription.

Closing thoughts

This is my first personal blog post, and I have made some bold predictions here. Only time will tell how they turn out, but one thing is certain: price pressure will come from one or more of the reasons listed above, and in the end that is good for consumers.
