[ARTICLE · art-45174] src=gadgetreview.com ↗ pub= topic=large-language-models verified=true sentiment=· neutral

Why Enterprises Are Making AI Talk Like a Caveman To Cut Costs

Uber burned through its annual AI budget in four months, prompting usage caps and a shift to per-token billing for coding assistants like GitHub Copilot. Developer Julius Brussee created 'caveman,' a plugin that reduces AI output tokens by up to 75% by stripping polite language, as companies seek to cut soaring costs from verbose AI responses.

3 min read · published Jun 30, 2026

Enterprise AI coding assistants are being rationed, not because they failed, but because they succeeded too well. Uber reportedly burned through its entire annual AI budget in four months, according to 404 Media. Walmart introduced usage caps. GitHub Copilot Business shifted from flat subscriptions to per-token billing. The scale of investment behind these pressures is underscored by the Stargate Project, a $500 billion AI infrastructure initiative. Tokens, each roughly three-quarters of a word, are the billing unit for every LLM interaction. Input and output both count. Every “Certainly, happy to help!” from Claude costs real money at scale. Enter caveman.

Brain Still Big. Mouth Small.

A lightweight plugin strips AI pleasantries while preserving every line of code.

Developer Julius Brussee built caveman after noticing how much token spend disappeared into hedging, transitions, and chatbot politeness inside agent loops. The tool is a simple markdown config file compatible with Claude Code, Codex, Gemini, and over 30 other coding agents, installed with a single command (npx skills add JuliusBrussee/caveman). “It makes the model speak less like a polite chatbot and more like a terse tool,” Brussee told 404 Media. “Same substance, fewer words.”

The numbers hold up — partially. Brussee’s tests showed 65–75% output token reduction versus default verbose output. Elastic Labs independently measured 63.6% average reduction across eight Elasticsearch scenarios with zero accuracy loss. A separate technical walkthrough found roughly 45% output savings and approximately 39% cost reduction. The honest caveat: one deeper analysis found that in typical coding sessions, prose accounts for a small fraction of total tokens, so real-world session savings may land closer to 4–5%.
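The gap between a 65–75% output reduction and a 4–5% session saving is simple arithmetic: the reduction only applies to the slice of tokens that is model prose. A minimal sketch, assuming a hypothetical prose share of 6.5% of all session tokens (a figure picked for illustration, not one reported in the analysis) and counting tokens rather than dollars:

```python
def session_token_savings(prose_reduction: float, prose_share: float) -> float:
    """Fraction of all session tokens saved when only prose output shrinks.

    prose_reduction: fraction of prose tokens removed (e.g. 0.65 to 0.75).
    prose_share: fraction of total session tokens that is model prose
                 (hypothetical input; measure it for your own workload).
    """
    for name, value in (("prose_reduction", prose_reduction), ("prose_share", prose_share)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return prose_reduction * prose_share


if __name__ == "__main__":
    ASSUMED_PROSE_SHARE = 0.065  # illustrative assumption, not a measured value
    for reduction in (0.65, 0.75):
        saved = session_token_savings(reduction, ASSUMED_PROSE_SHARE)
        print(f"{reduction:.0%} less prose -> {saved:.1%} fewer session tokens")
```

The same function shows why the headline number depends so heavily on workload: a chat-heavy session with a large prose share would see far more benefit than an agent loop dominated by code and tool output.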

When “Please” Costs Tens of Millions

The token bill problem extends far beyond chatty AI responses.

Sam Altman has noted that users typing “please” and “thank you” into LLMs collectively costs OpenAI tens of millions in electricity. Legrand, an electrical and data center infrastructure company, distributed an internal memo — obtained by 404 Media — explicitly listing caveman as one of four high-impact cost practices, alongside avoiding powerful models and high reasoning settings by default. Uber’s CTO capped employee AI usage after that four-month budget blowout. Walmart followed with its own restrictions.

Critics rightly point out that output tokens are often the smaller cost driver. Long input contexts, bloated prompt histories, and agent loops burning tokens in the background do more damage. Structural fixes matter more:

  • Prompt pruning
  • RAG (injecting only relevant data instead of entire databases)
  • Small-model routing for intake tasks
  • Token caching at roughly 10% of standard input price
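To see why caching is on that list, here is a rough sketch of the input bill when cached tokens cost about 10% of the standard input price. The price of $3 per million tokens, the 10 million token volume, and the 80% cache hit rate are all hypothetical values chosen for illustration; real pricing varies by provider and model.

```python
def input_cost_usd(
    total_input_tokens: int,
    cached_fraction: float,
    price_per_million_usd: float,
    cached_price_ratio: float = 0.10,  # "roughly 10% of standard input price"
) -> float:
    """Estimate input spend when a share of tokens is served from cache."""
    if not 0.0 <= cached_fraction <= 1.0:
        raise ValueError("cached_fraction must be between 0 and 1")
    cached_tokens = total_input_tokens * cached_fraction
    fresh_tokens = total_input_tokens - cached_tokens
    # Fresh tokens pay full price; cached tokens pay the discounted rate.
    weighted_tokens = fresh_tokens + cached_tokens * cached_price_ratio
    return weighted_tokens * price_per_million_usd / 1_000_000


if __name__ == "__main__":
    TOKENS = 10_000_000   # hypothetical monthly input volume
    PRICE = 3.0           # hypothetical USD per million input tokens
    without_cache = input_cost_usd(TOKENS, 0.0, PRICE)
    with_cache = input_cost_usd(TOKENS, 0.8, PRICE)
    print(f"no caching:  ${without_cache:.2f}")
    print(f"80% cached:  ${with_cache:.2f}")
```

Under these assumptions the saving comes entirely from the input side, which is the critics' point: trimming what goes into the model can move the bill more than trimming what comes out.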

Caveman is useful. It is not a budget strategy on its own. Teams looking for broader efficiency gains may also benefit from exploring AI-Powered Websites that complement these cost-saving approaches.

Something telling is happening regardless. OpenAI’s director of engineering Shayne Sweeney contributed Codex plugin support directly to the caveman repository. Engineers at Nvidia and GitHub are reportedly experimenting with it. Formal AI style guides specifying token budgets per workflow, and a new specialty called “token economist,” may arrive sooner than anyone planned.
