{"slug": "ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data", "title": "AI coding getting pricier? I cut my tokens by 82% (with real data)", "summary": "A developer cut token usage by 82% across over 6,000 commands, saving 7.4 million tokens, by trimming the CLAUDE.md rules file, using automatic compression plugins (RTK, claude-mem, codegraph), and tiering models for grunt work versus complex tasks.", "body_md": "📝 Originally on my blog →\n\n[https://kanfu-panda.github.io/blog/2026/06/17/cut-tokens-82.html]\n\nLast time I said: saving tokens isn't about cutting docs, it's about using your tools right. Someone followed up: so how exactly do you use them right?\n\nThis one's hands-on, with real numbers. Here's the headline figure: I checked my local `rtk gain`, a tool that tracks token savings, and across six thousand-plus commands it has saved **7.4 million tokens, 82%**. Not an estimate. It logged them one by one.\n\nSo let me break it down: how that 82% gets saved.\n\nFirst, where the saving happens.\n\nThe bulk of token spend isn't in \"how much work you do\"; it's in how much you stuff into the AI's context each turn. The model reprocesses the entire context every turn; the fatter the context, the more expensive each turn.\n\nSo the core is one sentence: keep what enters the context as small and as lean as possible.\n\nI've got three levers: trim the rules file, use the right plugins, tier your models. They share one thing: **they all save before things hit the context, not by making you do less work**. Let's take them one at a time.\n\nThe most overlooked lever, and the one you should pull first, is trimming your `CLAUDE.md`.\n\nCLAUDE.md (rules file, instruction file, whatever you call it) gets stuffed into the context every single conversation. It's always resident. Every line you write, you re-pay in tokens every turn.\n\nMy own CLAUDE.md was once long-winded: from user level to project level, packed with reminders. 
Looking back, it was full of repeated nagging, stale conventions, and a pile of \"might as well not have written it\" filler. I cut it down by nearly half, keeping only the hard rules I actually use every time. Bottom line: nagging the same point three times won't make the AI more obedient; it just costs more tokens each turn.\n\nThat one move saves every turn. Because the file is resident, you save not once, but every time after.\n\nConversation context is the same: once a window has grown to tens of thousands of tokens, clear it when you should; don't drag the morning's stuff into the evening to be recomputed every turn.\n\nManual trimming only goes so far. I've installed a few plugins that compress the context automatically. The data speaks.\n\n**RTK** (Rust Token Killer): a command proxy. When you have the AI run `git status`, `ps aux`, or tests, those outputs run hundreds or thousands of lines, and stuffing them in whole is brutally expensive. RTK compresses them before they reach the AI. My `rtk gain` output: six thousand-plus commands, 7.4M tokens saved, 82%. The biggest wins are the high-frequency, low-nutrition outputs: `ps aux`'s hundreds of lines of process list, which the AI gains nothing from reading, saved 99%; test logs 88%; even file reads average 20% off.\n\n**claude-mem**: a memory plugin. It compresses cross-session work into structured memory, so you don't re-explain the project background next time. Measured 86% savings this session. Fully automatic; I barely touch it.\n\n**codegraph**: a code graph. It builds an index of the project's functions, types, and call relationships. When the AI needs a function, it queries the index instead of reading a pile of files. In my aitm project it indexed **246 files, 3562 symbols**. \"Query the index\" vs. 
\"read 246 files cover to cover\"—the difference isn't small; the former is like flipping to a book's table of contents, the latter like memorizing the whole book to answer one question.\n\nThese three share: automatic, resident, saving before things hit the context. Install them and you mostly forget they're there—they just keep saving for you.\n\nLast one: model tiering.\n\nGrunt work—exploring, searching, reading files—goes to a cheap small model; only the real thinking, writing code and making judgments, gets the top tier. Especially when dispatching subagents—one task split into several, the grunt-work ones on small models. This is the main battlefield for saving quota.\n\nI wrote this judgment standard straight into CLAUDE.md, so the AI tiers itself each time without me spelling it out.\n\nThis isn't limited to Claude Code either. On any AI platform the logic holds: know each model's capability and price, use the right tier for the job, spend the expensive compute where it counts.\n\nThe three levers above all \"reduce the amount entering the context.\" There's one more, different in kind—prompt caching. It doesn't reduce the amount; it gets the repeated parts billed at a discount.\n\nSystem prompts, unchanging rules files, fixed project background—the stuff that's identical every turn—pays full price the first time, then gets discounted on cache hits. And it's not a linear discount; used well, the savings are noticeable.\n\nThe trick is not to let the cacheable parts keep changing: put the fixed, unchanging stuff at the front of the context and keep it stable, the per-turn variable stuff at the back. 
The more stable the structure, the higher the cache-hit rate, the fuller the discount.\n\nI don't have RTK-style measured numbers for this one (it saves on the billing side, not on token count), but the principle is simple and the cost near zero, so it's worth using as a matter of course.\n\nI've got to state the costs too, or this turns into a promo piece.\n\ncodegraph has to build the index first, which takes time on a big project; claude-mem's memory occasionally recalls things a bit off, so keep an eye out; trimming CLAUDE.md has a limit too: compress away the hard rules you actually need every time, and the AI drifts and reworks, which is penny-wise and pound-foolish.\n\nAnd don't mistake \"saving tokens\" for doing less work. Quite the opposite: it cuts the waste that should've been cut, such as repeated context, reading the whole repo, a cannon for a mosquito. The work that needs doing still gets done.\n\nWhat the savings mean depends on how you're billed (covered last time): a flat monthly plan saves quota headroom; pay-as-you-go saves actual cash. I use both, so these methods are a double saving for me.\n\nBack to that 82%. It's no magic trick; it's piled up from the small things above: trim the rules file, install a few automatic plugins, tier your models. Each looks minor alone; stacked together, it's 7.4 million tokens saved across six thousand-plus commands.\n\nTwo things you can do today, ten minutes to start:\n\nOne, open your CLAUDE.md and delete the repeated, the stale, the might-as-well-not-have-written, and see how many lines you can cut it to.\n\nTwo, install RTK, run it a few days, and look at what its `gain` saved you. That number will probably make you do a double take.\n\nThat's all on saving tokens for now. Next I'm thinking of digging into model tiering: how to judge which model does which job, how to write CLAUDE.md so the AI tiers itself. And the details of context management: when to clear, how to read files precisely. 
If you're interested, let me know in the comments.", "url": "https://wpnews.pro/news/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data", "canonical_source": "https://dev.to/kanfu-panda/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data-2hfi", "published_at": "2026-06-20 08:44:23+00:00", "updated_at": "2026-06-20 09:07:02.961066+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", "ai-tools", "ai-infrastructure"], "entities": ["RTK", "claude-mem", "codegraph", "CLAUDE.md", "Rust Token Killer"], "alternates": {"html": "https://wpnews.pro/news/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data", "markdown": "https://wpnews.pro/news/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data.md", "text": "https://wpnews.pro/news/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data.txt", "jsonld": "https://wpnews.pro/news/ai-coding-getting-pricier-i-cut-my-tokens-by-82-with-real-data.jsonld"}}