AI customers may be starting to pinch their pennies, and tech giants are taking notice.
At both Microsoft Build and Snowflake Summit this week, efficiency stood out as a prevailing theme in the announcements of these enterprise tech giants. It may signal that the compute costs that are crunching AI builders are starting to add up, and flagrant spending fueled by sky-high expectations may be starting to come back to earth.
"I think if you read about OpenClaw's founder, Peter Steinberger, and how many millions of dollars worth of tokens that he's using, it doesn't necessarily correlate to an output," Rob Ferguson, VP of technology and strategy at Fireworks AI, told The Deep View this week. "People are starting to really think about what the outputs of their AI are."
In short, the era of "tokenmaxxing" may be over. Or, at least, the definition is changing, said Ferguson. Rather than focusing on eating up as many tokens as their competitors, enterprises are starting to think about how to squeeze as much as they can out of the tokens they use.
Several of the product releases in San Francisco this week back up that shift:
- Snowflake's new Cortex Training system, which allows enterprises to customize open-weight foundation models, is marketed specifically as being faster and less expensive. Additionally, Snowflake's new Adaptive Compute addresses cost efficiency at the infrastructure level by automatically calculating the best use of compute and software resources in real time.
- Microsoft's new models also reflect a desire for efficiency, with its first reasoning model sitting at 35 billion parameters (compared to the latest trillion-parameter models that OpenAI and Anthropic offer) and built specifically for efficiency and low-token cost.
- The company is even targeting efficiency on the hardware side, debuting both the Surface Laptop Ultra and the Surface RTX Spark Dev Box, which can run powerful models locally and drastically reduce token costs. Jatinder Mann, partner director of product management at Microsoft, told The Deep View that these devices aim to provide "unmetered intelligence," reducing cloud costs by enabling local models to handle routine tasks. "There are a lot of routine things that don't necessarily need a cloud model," Mann said.
The next step enterprises should take is questioning whether a task requires AI at all, Raj Ramanujam, VP of Global Alliances and Cloud at Dynatrace, told The Deep View. Every agentic task, every prompt, every tool call adds to the bill. That is why every potential AI implementation should start with a "problem statement," he said, one that identifies exactly what challenge the enterprise is trying to solve or what task it would like to automate.
"There are some things that you can automate without touching AI in the normal course of how you program it," said Ramanujam.
Our Deeper View
The tokenmaxxing fad has grown to the point where memes are going viral about companies burning tokens on agents that write poetry and send motivational messages. But what goes up must come down. Seeking to cement their status as AI-first, many companies felt pressure to go all-in on the tech without considering the costs, racking up massive bills with models running in the cloud, and those bills have clearly started to sting. And with the ROI question still unanswered, some enterprises may be feeling uneasy about their AI strategies. If these announcements signal anything, it's that big tech firms know they, too, need to lean into efficiency, rather than pressuring their customers to use up as many tokens as possible (looking at you, Jensen).