Why Enterprises Are Making AI Talk Like a Caveman To Cut Costs

[ARTICLE · art-45174] src=gadgetreview.com ↗ pub=2026-06-30T16:24Z topic=large-language-models verified=true sentiment=· neutral

Uber reportedly burned through its annual AI budget in four months and capped employee usage, while GitHub Copilot Business moved to per-token billing. Developer Julius Brussee created 'caveman,' a plugin that cuts AI output tokens by up to 75% by stripping polite language, as companies look to rein in the cost of verbose AI responses.

3 min read · published Jun 30, 2026

Enterprise AI coding assistants are being rationed, not because they failed but because they succeeded too well. Uber reportedly burned through its entire annual AI budget in four months, according to 404 Media. Walmart introduced usage caps. GitHub Copilot Business shifted from flat subscriptions to per-token billing. Tokens, each roughly three-quarters of a word, are the billing unit for every LLM interaction, and input and output both count. Every “Certainly, happy to help!” from Claude costs real money at scale. The scale of investment behind these pressures is underscored by the Stargate Project, a $500 billion AI infrastructure initiative. Enter caveman.
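To put a rough number on that greeting, here is a minimal sketch that applies the three-quarters-of-a-word rule of thumb. The call volume and the price per million tokens are hypothetical placeholders, not figures from any vendor, and real tokenizers will count somewhat differently.

```python
# Rule of thumb from the article: one token is roughly three-quarters of a word.
WORDS_PER_TOKEN = 0.75


def estimate_tokens(text: str) -> float:
    """Approximate the token count from a whitespace word count."""
    return len(text.split()) / WORDS_PER_TOKEN


def annual_cost(text: str, calls_per_day: int, usd_per_million_tokens: float) -> float:
    """Yearly spend on one phrase repeated in every call (inputs are assumptions)."""
    tokens_per_year = estimate_tokens(text) * calls_per_day * 365
    return tokens_per_year / 1_000_000 * usd_per_million_tokens


greeting = "Certainly, happy to help!"
print(round(estimate_tokens(greeting), 1))

# Hypothetical load and price: one million calls a day at $10 per million tokens.
print(round(annual_cost(greeting, calls_per_day=1_000_000, usd_per_million_tokens=10.0), 2))
```

Under those assumed inputs, a four-word pleasantry comes to about 5.3 tokens per call and just under $19,500 a year.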

Brain Still Big. Mouth Small.

A lightweight plugin strips AI pleasantries while preserving every line of code.

Developer Julius Brussee built caveman after noticing how much token spend disappeared into hedging, transitions, and chatbot politeness inside agent loops. The tool is a simple markdown config file compatible with Claude Code, Codex, Gemini, and over 30 other coding agents, installed with a single command (npx skills add JuliusBrussee/caveman). “It makes the model speak less like a polite chatbot and more like a terse tool,” Brussee told 404 Media. “Same substance, fewer words.”

The numbers hold up — partially. Brussee’s tests showed 65–75% output token reduction versus default verbose output. Elastic Labs independently measured 63.6% average reduction across eight Elasticsearch scenarios with zero accuracy loss. A separate technical walkthrough found roughly 45% output savings and approximately 39% cost reduction. The honest caveat: one deeper analysis found that in typical coding sessions, prose accounts for a small fraction of total tokens, so real-world session savings may land closer to 4–5%.
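The gap between a 65–75% output reduction and a 4–5% session saving is simple arithmetic: the reduction only applies to the slice of the session that is output prose. A minimal sketch, assuming illustrative prose shares of 5–7% of total tokens (the article gives no exact share) and a 70% trim, the midpoint of Brussee's reported range:

```python
def session_savings(prose_share: float, prose_reduction: float) -> float:
    """Fraction of all session tokens saved when only output prose shrinks.

    prose_share: output prose as a fraction of total session tokens (assumed).
    prose_reduction: fraction of that prose the terse style removes.
    """
    if not (0.0 <= prose_share <= 1.0 and 0.0 <= prose_reduction <= 1.0):
        raise ValueError("both arguments must be fractions between 0 and 1")
    return prose_share * prose_reduction


# Hypothetical prose shares; 0.70 is the midpoint of the 65-75% range.
for share in (0.05, 0.06, 0.07):
    print(f"prose share {share:.0%} -> session savings {session_savings(share, 0.70):.1%}")
```

With those assumed shares, the session-level saving works out to between 3.5% and 4.9%, which is the same order as the 4–5% figure quoted above.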

When “Please” Costs Tens of Millions

The token bill problem extends far beyond chatty AI responses.

Sam Altman has noted that users typing “please” and “thank you” into LLMs collectively cost OpenAI tens of millions of dollars in electricity. Legrand, an electrical and data center infrastructure company, distributed an internal memo, obtained by 404 Media, that explicitly lists caveman as one of four high-impact cost practices, alongside avoiding powerful models and high reasoning settings by default. Uber’s CTO capped employee AI usage after that four-month budget blowout. Walmart followed with its own restrictions.

Critics rightly point out that output tokens are often the smaller cost driver. Long input contexts, bloated prompt histories, and agent loops burning tokens in the background do more damage. Structural fixes matter more:

- Prompt pruning
- RAG (injecting only relevant data instead of entire databases)
- Small-model routing for intake tasks
- Token caching at roughly 10% of standard input price
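The caching item is the easiest of these to quantify. The sketch below is a simplified model that bills cached input tokens at 10% of the standard rate; the token counts and the $3 per million price are hypothetical, and the model ignores any cache-write charges or expiry rules a given provider may apply.

```python
def input_cost(total_tokens: int, cached_tokens: int, usd_per_million: float,
               cached_price_ratio: float = 0.10) -> float:
    """Input cost in USD when cached tokens are billed at a fraction of list price."""
    if not 0 <= cached_tokens <= total_tokens:
        raise ValueError("cached_tokens must be between 0 and total_tokens")
    fresh_tokens = total_tokens - cached_tokens
    billable = fresh_tokens + cached_tokens * cached_price_ratio
    return billable / 1_000_000 * usd_per_million


# Hypothetical agent call: 200k input tokens, 180k of them a repeated prefix.
without_cache = input_cost(200_000, 0, usd_per_million=3.0)
with_cache = input_cost(200_000, 180_000, usd_per_million=3.0)

print(f"without cache: ${without_cache:.3f}")
print(f"with cache:    ${with_cache:.3f}")
print(f"input savings: {1 - with_cache / without_cache:.0%}")
```

In this assumed scenario the input bill per call drops from $0.600 to $0.114, an 81% cut that no amount of terse output could match.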

Caveman is useful. It is not a budget strategy on its own. Teams looking for broader efficiency gains may also benefit from exploring AI-Powered Websites that complement these cost-saving approaches.

Something telling is happening regardless. OpenAI’s director of engineering Shayne Sweeney contributed Codex plugin support directly to the caveman repository. Engineers at Nvidia and GitHub are reportedly experimenting with it. Formal AI style guides specifying token budgets per workflow, and a new specialty called “token economist,” may arrive sooner than anyone planned.

Read original on gadgetreview.com → www.gadgetreview.com/why-enterprises-are-making-…
