# I Tested the Viral “Caveman” AI Trick. Here’s What It Actually Saves (And What It Doesn’t)

> Source: <https://pub.towardsai.net/i-tested-the-viral-caveman-ai-trick-heres-what-it-actually-saves-and-what-it-doesn-t-83aa5f0a093c?source=rss----98111c9905da---4>
> Published: 2026-06-19 06:41:24+00:00

A free GitHub tool called **Caveman** has been blowing up. It claims to cut your Claude Code token usage by 75% just by making the AI talk like a caveman. It has over 51,000 GitHub stars, a viral Reddit post with roughly 10,000 upvotes, and coverage from a dozen blogs.

Depending on which article you read, it either changed everything or barely moved the needle.

Both are sort of true — and that’s the actual story nobody puts in one place. So I dug into the real, independently tested numbers, compared them against Caveman’s own marketing, and looked at what actually saves teams money on AI API costs in 2026.

Here’s the honest breakdown.

Caveman is a free, open-source skill for Claude Code (it also works with Codex, Gemini, Cursor, and Windsurf), created by independent developer Julius Brussee and released in April 2026. It instructs the AI to respond in a stripped-down, compressed style — dropping articles, filler words, and hedging language while keeping every technical detail accurate.

A normal response might read: *“I’d be happy to help you with that. The issue you’re experiencing is most likely caused by your authentication middleware not properly validating the token expiry.”*

Caveman compresses that to: *“Auth middleware no validate token expiry. That cause.”*

It’s genuinely clever. It’s also, per multiple independent tests, not the dramatic cost-cutting silver bullet the headline number implies.

This is the part most “Caveman review” posts skip. Here’s what independent testing actually found, compared directly against the marketing claim:

The pattern is consistent across every test: the 65–75% number is real, but it only applies to the conversational slice of an AI’s output — not the code it generates, not its internal reasoning, and not your input tokens at all. In a typical coding session, that conversational slice is a minority of total spend.

That’s why GrowwStacks measured zero difference on pure coding tasks, while Pillitteri — testing chat-heavy workflows alongside coding — found a real but modest 4–10% overall reduction.

**Bottom line:** know what kind of work your team does before expecting a specific number. Chat-heavy workflows benefit most; pure code generation barely moves.

It’s free, MIT-licensed, and takes under two minutes:

```
# Install via npx (no permanent install needed)npx caveman-skill install
# Or clone directly from GitHubgit clone https://github.com/JuliusBrussee/caveman.git
```

Activate it in any session with /caveman. Three intensity levels are available:

If you have junior developers still learning the codebase, Lite (or skipping Caveman entirely) is the better call — the explanatory tokens it strips are often the actual learning content, not waste.

While Caveman gets the headlines, prompt caching quietly delivers far bigger savings for almost every team — and it’s a parameter change, not a new tool.

Every API call you send includes a system prompt and often repeated context (a CLAUDE.md file, reference docs, etc.). Without caching, you pay full input price for that repeated content on *every single call* — even though it hasn’t changed. With caching enabled, the provider stores that content and charges a steeply discounted rate on subsequent hits — typically **75–90% off** the standard input rate. For Claude models, cached tokens can cost as little as a tenth of the standard input price.

It requires almost no engineering effort. It activates automatically once your prompt prefix exceeds roughly 1,024 tokens and has been used recently — you’ll see cached_tokens show up in your API usage data once it kicks in.

This is the single biggest lever most teams haven’t pulled, and it doesn’t require a new tool — just a routing decision in your application logic.

**The core insight:** sending every request to a flagship model when 70–80% of tasks could be handled by a lightweight model is the most common source of overspending. Teams that implement intelligent routing typically cut API costs by **50–70%** without noticeable quality loss.

The price gap makes this obvious: running an identical 1-million-token workload on a flagship model can cost roughly 30x more on output than running it on a budget-tier model — and over 100x more during promotional pricing windows on the cheaper option.

**A practical routing framework:**

You don’t need a sophisticated classifier to start — a rough rule based on prompt length or a keyword check on task type captures most of the win.

None of these three tactics compete — they stack:

**My honest ranking, in order of what to implement first:**

Pricing shifts constantly, but the relative gaps between tiers have held steady through 2026:

The gap between cheapest and most expensive sits around 20–30x per token — which is exactly where model routing earns its keep. *(Always verify current pricing directly with the provider — these numbers shift without notice.)*

Caveman is fun, free, and genuinely effective on the slice of your output it’s designed for. But if you’re optimizing AI spend and can only do one thing, prompt caching beats it by a wide margin for almost no effort. Add model routing, and you’re looking at a realistic 70–85% reduction in total AI API costs — not a marketing number, but something documented teams have actually hit in 2026.

Caveman is the fun bonus on top. Caching and routing are the real story.

*I write about AI agents, agentic automation, and the real-world economics of running AI at scale — experiments, numbers, and what actually works. Follow along if that’s your thing.*

[I Tested the Viral “Caveman” AI Trick. Here’s What It Actually Saves (And What It Doesn’t)](https://pub.towardsai.net/i-tested-the-viral-caveman-ai-trick-heres-what-it-actually-saves-and-what-it-doesn-t-83aa5f0a093c) was originally published in [Towards AI](https://pub.towardsai.net) on Medium, where people are continuing the conversation by highlighting and responding to this story.