# Why token panic is reshaping AI

> Source: <https://www.thedeepview.com/articles/why-token-panic-is-reshaping-ai>
> Published: 2026-06-16 22:40:19+00:00

Make no mistake, enterprises are panicking about token costs.

At Databricks' Data + AI Summit, I talked to executives at various enterprise organizations and they all expressed that AI inference costs have run so far over budget in recent months that it's precipitating a crisis.

These are not naturally dramatic people, but they told me things like, "Our CFO is going to lose it when inference costs come in," and "The coding agents have run our token bills out of control," and "Last year, the executives were letting every flower bloom; now they're coming in with a lawnmower."

At the event's opening keynote, Databricks CEO Ali Ghodsi addressed the elephant in the room. "It's completely unsustainable for the organizations out there," he said.

Later, in a briefing with the press, Ghodsi said, "It's the number one thing we're getting asked: 'How do we curb the cost but still invest in AI?'"

As a result, two things are quickly becoming true:

- **Model selection matters**: There are a lot of queries and workloads getting sent to frontier models in the cloud (ChatGPT, Claude, and Gemini) when they could easily be handled by small models, domain-specific models, and open models. Using frontier models for simple questions and tasks is like using a chainsaw to cut down a daisy.
- **Hybrid compute will be part of the answer**: Running open models locally on your own hardware is the other way to massively save on inference costs. The challenge is that you have to calculate and shape query traffic to make sure you're taking full advantage of the capacity that you build out, rather than just paying per token.
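To make the hybrid compute trade-off concrete, here is a rough back-of-the-envelope sketch in Python. Every number and function name in it is a hypothetical assumption for illustration, not a figure from Databricks or any vendor. It compares a fixed monthly cost for self-hosted hardware against a per-token API price, and shows why utilization matters: the less of your built-out capacity you actually use, the more each self-hosted token effectively costs.

```python
def api_cost(tokens: int, price_per_million: float) -> float:
    """Cost in dollars of sending `tokens` to a pay-per-token API."""
    return tokens / 1_000_000 * price_per_million


def breakeven_tokens_per_month(fixed_monthly_cost: float,
                               price_per_million: float) -> float:
    """Monthly token volume at which self-hosting costs the same as the API.

    Assumes self-hosting is a flat monthly cost (hardware, power, staff)
    with no per-token charge, which is a simplification.
    """
    return fixed_monthly_cost / price_per_million * 1_000_000


def effective_self_hosted_price(fixed_monthly_cost: float,
                                capacity_tokens_per_month: int,
                                utilization: float) -> float:
    """Effective dollars per million tokens at a given utilization (0 to 1]."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    served_tokens = capacity_tokens_per_month * utilization
    return fixed_monthly_cost / served_tokens * 1_000_000


if __name__ == "__main__":
    # Hypothetical inputs: $10,000/month of hardware, $5 per million API tokens,
    # and capacity to serve 4 billion tokens per month.
    fixed, api_price, capacity = 10_000.0, 5.0, 4_000_000_000
    print("Break-even tokens/month:", breakeven_tokens_per_month(fixed, api_price))
    for utilization in (0.25, 0.5, 1.0):
        price = effective_self_hosted_price(fixed, capacity, utilization)
        print(f"utilization {utilization:.0%}: ${price:.2f} per million tokens")
```

Under these made-up inputs, the self-hosted hardware only beats the API price once it is more than half utilized, which is why shaping query traffic to fill the capacity you build matters as much as the hardware itself.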

This is bad news for OpenAI and Anthropic, who have been the biggest beneficiaries of [tokenmaxxing](https://www.thedeepview.com/articles/why-ai-s-tokenmaxxing-obsession-ran-out-of-steam). Anthropic in particular has seen its revenues soar as enterprises invested heavily in granting unlimited token access to their employees.

Unsurprisingly, Databricks offered its solution to help enterprises wrestle the token problem to the ground: [Unity AI Gateway](https://www.databricks.com/product/artificial-intelligence/unity-ai-gateway). The platform provides visibility into how many tokens your organization is spending, before you get a giant bill. It also provides observability into what the agents are doing and, most importantly, the ability to route tasks and queries to the most appropriate model or lab.
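As a generic illustration of what routing plus spend visibility can look like, here is a minimal Python sketch. It is not Unity AI Gateway's API or how Databricks implements routing. The model names, prices, the 2,000-token threshold, and the roughly-four-characters-per-token estimate are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelTier:
    name: str
    price_per_million_tokens: float  # hypothetical price in dollars


# Hypothetical tiers; real names and prices will differ.
SMALL = ModelTier("small-open-model", 0.20)
FRONTIER = ModelTier("frontier-model", 10.00)


def estimate_tokens(prompt: str) -> int:
    """Very rough token estimate, assuming about 4 characters per token."""
    return max(1, len(prompt) // 4)


def choose_tier(prompt: str, needs_reasoning: bool) -> ModelTier:
    """Send long or reasoning-heavy requests to the frontier tier, the rest to the small one."""
    if needs_reasoning or estimate_tokens(prompt) > 2000:
        return FRONTIER
    return SMALL


@dataclass
class SpendTracker:
    """Tracks spend against a budget so overruns surface before the bill does."""
    budget_usd: float
    spent_usd: float = 0.0
    tokens_by_model: dict = field(default_factory=dict)

    def record(self, tier: ModelTier, tokens: int) -> bool:
        """Record usage; return False (and record nothing) if it would exceed the budget."""
        cost = tokens / 1_000_000 * tier.price_per_million_tokens
        if self.spent_usd + cost > self.budget_usd:
            return False
        self.spent_usd += cost
        self.tokens_by_model[tier.name] = self.tokens_by_model.get(tier.name, 0) + tokens
        return True


if __name__ == "__main__":
    tracker = SpendTracker(budget_usd=1.00)
    prompt = "What is our refund policy?"
    tier = choose_tier(prompt, needs_reasoning=False)
    allowed = tracker.record(tier, estimate_tokens(prompt))
    print(tier.name, allowed, tracker.tokens_by_model)
```

A production gateway would classify requests with something far better than a length check, but the shape is the same: decide which model a request deserves, then account for the tokens before they turn into a surprise bill.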

Databricks also launched [Omnigent](https://www.databricks.com/blog/introducing-omnigent-meta-harness-combine-control-and-share-your-agents), which it describes as a harness for harnesses, to help enterprises get better results with Claude Code, Codex, Cursor, and other agents. And it launched its own enterprise agent platform called [Genie One](https://www.databricks.com/company/newsroom/press-releases/databricks-launches-genie-one-all-new-agentic-coworker-every-team).

## Our Deeper *View*

The Deep View is hearing from nearly every direction right now that [enterprises are becoming deeply alarmed](https://www.thedeepview.com/articles/the-tokenmaxxing-era-is-over-before-it-started) about token spend and inference costs. It's quite a turnaround from a year ago, when CEOs were still desperately trying to get their employees to use AI. Now, the employees and their coding agents are using it so much that the economics are falling apart. The bottom line is there's a lot of work that needs to be done to make AI more efficient, to optimize workloads, and to make routing tasks and queries a lot smarter and safer. Databricks is one of the vendors that wants to step in and help with that, but the list of companies lining up to help organizations control their agents is getting very long.
