{"slug": "9-ways-to-cut-token-consumption-in-claude-code", "title": "9 Ways to Cut Token Consumption in Claude Code", "summary": "A developer has identified nine strategies to reduce token consumption in Claude Code, emphasizing that excessive token use is a workflow problem rather than a pricing issue. The engineer's approach centers on keeping Claude's active context small and clean through techniques such as using filtered command wrappers instead of raw logs, matching models to task complexity (Sonnet for general work, Opus for deep reasoning, Haiku for simple tasks), and employing `/compact` and `/clear` with handoff files to prevent context pollution from old attempts and irrelevant data.", "body_md": "After using Claude Code for serious coding work, the biggest lesson is simple: token consumption is not mainly a pricing issue. It is a workflow problem.\n\nWhen my Claude Code session gets messy, performance drops. It starts rereading irrelevant files, remembering failed attempts, carrying old logs, and wasting context on things that no longer matter.\n\nSo my workflow is built around one principle:\n\nKeep Claude's active context small, clean, and useful.\n\nThese are the 9 best ways to reduce token consumption in Claude Code. I use them every day, and they can probably help you too.\n\nDon't send raw logs or full test results. Use filters and simple command wrappers to show only what matters: the error, the stack trace, the key details.\n\nSonnet handles most coding work fine and costs less. Use Opus for the really hard problems. For helper tasks and exploration, use Haiku.\n\nDon't run expensive models on simple edits, and turn off extended thinking when you don't need deep reasoning.\n\nUse `/compact` for mid-task cleanup while keeping what's important.\n\nUse `/clear` with a handoff file when switching to something completely different. 
This keeps old failed attempts from piling up in context.\n\nOnly put instructions in `CLAUDE.md` that Claude needs most of the time (how to run tests, build commands, folder structure, main rules). Keep everything else in separate skill files and load them only when you actually need them, which saves tokens.\n\nUse Composio MCP to manage 1000+ tools as a single system instead of juggling 30+ active servers. When tasks are simple enough to use shell commands, use the Composio CLI.\n\nSo I avoid this:\n\n```\nnpm test\n```\n\nAnd use filtered commands:\n\n```\nnpm test 2>&1 | grep -A5 -E \"FAIL|ERROR|Error|Expected|Received\" | head -100\n```\n\nIn my ideal setup, I create small wrappers like the ones below and use them when needed:\n\n```\ncc-test\ncc-lint\ncc-typecheck\ncc-log\ncc-ci\n```\n\nBottom line: I let Claude consume summaries, not terminal noise.\n\nThis one change saves the most tokens because it prevents context pollution before it even starts.\n\nMy setup:\n\n```\n/model sonnet   → default for most coding work\n/model opus     → complex architecture, deep reasoning, tricky bugs\n/model haiku    → repetitive tasks, boilerplate, quick lookups\n```\n\nI switch to Opus when I genuinely need deeper reasoning, and switch back as soon as the hard part is done.\n\nFor subagents specifically, I set:\n\n```\nCLAUDE_CODE_SUBAGENT_MODEL=haiku\n```\n\nThis means exploration agents, log inspectors, and doc-lookup agents all run on Haiku. The main thread stays on Sonnet.\n\nExtended thinking is another hidden cost: it burns output tokens for internal reasoning. For simple edits, I disable it:\n\n```\nMAX_THINKING_TOKENS=0      → trivial tasks\nMAX_THINKING_TOKENS=10000  → architectural reasoning\n```\n\nIn fact, this approach is effective enough that I rarely run out of usage before the 5-hour limit resets. In practice, it feels close to unlimited usage.\n\nBottom line: Using Opus for everything is like running every database query on your most expensive production server. 
Match the model to the work.\n\n`/clear` + `/compact` + `handoffs`\n\nAfter enough turns, the context fills with old attempts, wrong theories, stale file reads, and outdated decisions. Even if Claude still remembers everything, much of that memory is now toxic.\n\nSo I use `/clear`, but with a different workflow. I first create a `handoff.md` file that includes all the relevant details, such as the goal, changed files, and decisions made.\n\nHere is how you can do it too.\n\n```\nBefore clearing:\nAsk Claude to write .claude/session-handoff.md\n\nInclude:\n- current goal\n- changed files\n- decisions made\n- failing tests\n- root cause\n- next step\n```\n\nThen I run:\n\n```\n/clear\n```\n\nAnd restart with:\n\n```\nRead .claude/session-handoff.md and continue.\n```\n\nThis gives me the best of both worlds: fresh context without lost progress.\n\nFor mid-task cleanup without a full reset, I use `/compact` instead. But I never run it blindly. Before compacting, I tell Claude exactly what to preserve:\n\n```\n/compact Preserve: optimistic locking for user updates, no schema changes this session.\n```\n\nThen:\n\n```\n/compact\n```\n\nThis shapes what the summary will capture. In short, critical decisions stay in context while the noise is removed.\n\nThe difference (a must-know):\n\n- `/compact`: summarize and continue, for tasks that require a lighter context.\n- `/clear`: full reset with handoff file, for tasks that require switching or starting fresh.\n\n**Bottom line**: Don't treat `/clear` as a reset button. Treat it as context garbage collection. Use `/compact` with instructions for garbage collection mid-flight.\n\n`CLAUDE.md` minimal\n\nMost people overload `CLAUDE.md`. 
They add architecture notes, deployment steps, PR rules, debugging checklists, style guides, testing philosophy, and random project history.\n\nThat feels organized, but it silently burns context every session.\n\nMy rule:\n\nIf Claude does not need it in 80% of sessions, it does not belong in `CLAUDE.md`.\n\nMy `CLAUDE.md` only includes:\n\n```\n- package manager\n- test command\n- build command\n- repo layout\n- core architecture constraints\n- forbidden patterns\n- naming conventions\n```\n\nEverything else goes into `skills` or a separate `docs` folder.\n\nExample:\n\n```\n.claude/skills/db-migration/SKILL.md\n.claude/skills/pr-review/SKILL.md\n.claude/skills/prod-debugging/SKILL.md\n```\n\nBottom line: Keep the base context light. Claude loads the long workflow only when it's needed, keeping context quality intact.\n\nMost context waste happens because Claude jumps straight into implementation without a clear plan.\n\nIt reads files speculatively, tries an approach, backtracks, and reads more files. By the time it reaches a working solution, the session is already polluted with failed attempts.\n\nPlan mode separates thinking from doing by restricting access to the tools that write and modify files.\n\nI press `Shift+Tab` twice before doing anything non-trivial:\n\n```\nShift+Tab → plan mode on\n```\n\nIn plan mode, Claude reads files and reasons through the problem without making any changes. I let it map dependencies and surface unknowns.\n\nThen I press `Ctrl+G` to open and edit the plan directly before Claude writes a single line of code.\n\nBad prompt (no plan):\n\n```\nAdd Google OAuth to the login system.\n```\n\nBetter workflow:\n\n```\n[plan mode on]\nI want to add Google OAuth. What files need to change?\nWhat is the session flow? Create a plan.\n\n[review plan, edit if needed]\n[plan mode off]\nNow implement from the plan.\n```\n\nBottom line: Claude solving the right problem the first time is always cheaper than rework. 
Let it plan, verify, and then execute.\n\nSubagents are useful, but only when tightly controlled, as they tend to wander off.\n\nI don't prompt:\n\n```\nInvestigate the repo.\n```\n\nI prompt:\n\n```\nInspect only src/auth and tests/auth.\nReturn max 15 bullets.\nInclude exact files.\nNo implementation.\nNo broad repo scan.\n```\n\nIn general, I use Claude subagents for noisy work, such as log analysis, test failure inspection, dependency search, doc lookup, and blast radius analysis.\n\nThis multi-agent collaboration yields a cleaner workflow than a single-agent approach.\n\n**Bottom line:** The main Claude session should not carry all the exploration. It should only receive the compressed result from each agent.\n\nI don't ask Claude to \"understand the codebase and explain it to me.\"\n\nThat usually causes broad scanning of the wrong files and unnecessary context usage, not to mention agents creating docs inside the repo itself, which I have seen recently.\n\nInstead, I create a `docs/repo_map.md` with:\n\n```\n- main entrypoints\n- key modules\n- test commands\n- auth/data/payment flows\n- generated folders to avoid\n- files Claude should read first\n```\n\nThen I prompt accordingly. Here is an example of what I mean.\n\nBad prompt:\n\n```\nUnderstand the auth system and fix the bug.\n```\n\nBetter prompt:\n\n```\nFind the login entrypoint. Read only the files needed to explain token validation. Do not scan unrelated directories.\n```\n\nThis saves a lot of tokens because Claude starts from a map instead of wandering through the repo to figure it out.\n\n**Bottom line**: Repo maps lead to fewer incorrect file reads, reducing context usage.\n\nPulling stuff out of `CLAUDE.md` only helps if the place you move it to isn't also loaded every session. That is exactly what skills are for.\n\nA skill is just a folder with a `SKILL.md` (a short name + description, then instructions, plus any reference files or scripts). 
The win is how it loads: at session start Claude only sees the name and description of each [Claude skill](https://composio.dev/content/top-claude-skills), roughly 30 to 100 tokens each.\n\nThe full `SKILL.md` body loads only when Claude decides the skill is relevant, and any bundled reference files or scripts load only if they're actually needed.\n\nSo I can keep deep workflows around without paying for them upfront:\n\n```\n.claude/skills/db-migration/SKILL.md\n.claude/skills/pr-review/SKILL.md\n.claude/skills/prod-debugging/SKILL.md\n```\n\nEight of these sitting in the project might cost a few hundred tokens at startup instead of dumping thousands of lines into context before any work begins.\n\nThe description is doing all the heavy lifting here, so I make it specific. \"Helps with documents\" never triggers. \"Use when filling PDF forms and extracting table data\" does.\n\nPlugins are the next layer up. A plugin bundles skills, slash commands, subagents, hooks, and MCP servers into one installable unit, and you enable or disable the whole bundle on demand:\n\n```\n/plugin marketplace add <marketplace>\n/plugin add <plugin>\n```\n\nThe point is the same as everything else in this post: keep dormant capability out of the active context. Install the plugin when a project needs it, disable it when it doesn't, and you never carry tooling you aren't using.\n\nBad setup:\n\nEverything stuffed into `CLAUDE.md` so it's \"always available.\"\n\nBetter setup:\n\nTiny `CLAUDE.md`. Long workflows as skills with sharp descriptions. Capability sets as plugins you toggle per project.\n\nOne of the top bottlenecks for me with the introduction of MCP was managing 30+ MCP integrations. Use fewer MCP servers, and the work doesn't get done. 
Use more, and they bloat the context.\n\nComposio solved this for me by offering a universal MCP server that plugs into any MCP-supported agent and lets it connect to 1000+ services and tools, with on-demand tool loading, a remote workbench for composing tools, and a bash tool for handling edge cases with scripting.\n\nHere is a brief breakdown of token savings with Composio MCP and without it for tasks that use tools such as [Linear](https://composio.dev/toolkits/linear), [GitHub](https://composio.dev/toolkits/github), [Sentry](https://composio.dev/toolkits/sentry/framework/vscode), [Supabase](https://composio.dev/toolkits/supabase), and [Context7](https://composio.dev/toolkits/context7_mcp).\n\n**With Composio MCP**\n\nIn this comparison, Composio used fewer tokens than the raw approach. It uses only seven meta tools and gets you the most value per token used.\n\nWhen I need even more reduction on top of that, I use the [Composio CLI](https://composio.dev/cli) directly.\n\nLLMs understand and parse shell commands better than tool schemas, shell commands can be chained together (they are composable), and there is less back-and-forth between servers and the LLM. It’s faster.\n\nMy rule: I start with Composio MCP to consolidate integrations, then move to CLI when the task is simple enough for a shell command.\n\nBottom line: Every active MCP server is a context tax that compounds across every chat. Every active tool adds overhead through its schema, description, and results. So use fewer, smarter integrations, not more.\n\nIf I were setting this up for daily development, my stack would look like this (in no particular order):\n\n- `CLAUDE.md`\n- `skills`\n- `repo-map.md` for navigation\n- `cc-test` / `cc-log` / `cc-diff` wrappers\n- `handoff.md` before clearing context\n\nThe real mindset shift for you is this:\n\nClaude Code works best when you treat it like a powerful engineer with limited working memory.\n\nDon't dump everything on it. Don't make it read garbage. 
Don't let old context pile up. Give it the right files, the right errors, the right constraints, and a clean session.\n\nThat is where token savings actually come from. Perform better harness engineering.", "url": "https://wpnews.pro/news/9-ways-to-cut-token-consumption-in-claude-code", "canonical_source": "https://dev.to/developer_harsh/9-ways-to-cut-token-consumption-in-claude-code-3jpd", "published_at": "2026-05-30 11:40:56+00:00", "updated_at": "2026-05-30 12:12:43.362629+00:00", "lang": "en", "topics": ["ai-tools", "large-language-models", "artificial-intelligence", "ai-products", "ai-agents"], "entities": ["Claude Code", "Sonnet", "Opus", "Haiku", "Composio MCP"], "alternates": {"html": "https://wpnews.pro/news/9-ways-to-cut-token-consumption-in-claude-code", "markdown": "https://wpnews.pro/news/9-ways-to-cut-token-consumption-in-claude-code.md", "text": "https://wpnews.pro/news/9-ways-to-cut-token-consumption-in-claude-code.txt", "jsonld": "https://wpnews.pro/news/9-ways-to-cut-token-consumption-in-claude-code.jsonld"}}