{"slug": "87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage", "title": "87% of My Context Was Garbage: How I Optimized Claude Code Token Usage", "summary": "A developer found that 87% of Claude Code's context window was wasted on conversation history, with tool I/O like file reads and bash output consuming 80% of tokens. To solve this, they built Throughline, a tool that categorizes conversation data into three layers and stores finished tool I/O in SQLite, reducing context usage by approximately 90% in a 50-turn session.", "body_md": "My weekly quota for the MAX plan melted in three days.\n\nEven though I should have had a 20x quota, by Wednesday my remaining allowance was already looking thin. I would usually brush that off with \"well, that happens,\" but this time it made me curious. What is actually going on inside the context window?\n\nIn my [previous article](https://dev.to/quolu/a-journey-into-token-optimization-for-my-ai-assistant-4e1i), I wrote about saving tokens for AI secretaries, such as trimming CLAUDE.md or shrinking MCP tool definitions. This time, the subject is not AI secretaries but Claude Code itself. It turns out the tool itself was the big eater.\n\nOn April 14th, a tweet in Spanish caught my eye.\n\n\"Most of Claude Code's token wastage is caused by the user side.\"\n\nI understand the point: CLAUDE.md might be bloated, or the prompts might be redundant. But \"most of it is the user side\"? Is that based on actual measurements?\n\nSo, I decided to measure it myself.\n\nI analyzed the internal transcript of Claude Code (the JSONL that records sessions).\n\n**188,000 tokens per turn. Of that, 164,000 tokens (87%) were conversation history.**\n\nCLAUDE.md was 12,700 tokens. MCP tool definitions were 3,900 tokens. Even combined, they accounted for only 9% of the total. Even cutting them in half would save less than 5%.\n\nThe real culprit was the bloated conversation history. 
I felt a bit embarrassed for having tried so hard to trim CLAUDE.md.\n\nSo, what is inside the history?\n\nI opened it and was shocked. **About 80% of the history was tool I/O.** File read results, Bash command output, grep results: data that the AI used on the spot to make a decision, and that had served its purpose by the time it moved to the next step.\n\nYet that data sits in the context window forever, eating tokens every single turn.\n\nIn a 50-turn session, the results of a grep from the beginning are still in the context, even though you will never look at them again.\n\nYou might think, \"Why not just use /compact?\" I thought the same thing.\n\nBut the mechanism of /compact is to **have the AI read the entire history and summarize it.**\n\nTo save tokens, you consume a massive amount of tokens. Moreover, nuances are lost in the summarization process. Context like \"why this design was chosen at that time\" can be rounded off and disappear.\n\nIf you continue working after summarization, the context bloats again and you compact again, over and over. It's not a fundamental solution.\n\nI changed my perspective here.\n\nMemGPT and LangChain's ConversationSummaryBufferMemory **summarize from the oldest data.** It's time-based compression. But the problem isn't \"age.\"\n\nThe \"reason for this design\" from 10 turns ago is still valuable today. The grep result from a moment ago is useless, even if it was just one turn ago.\n\n**Instead of time, I should categorize by type.**\n\nWith this idea, I created [Throughline](https://github.com/kitepon-rgb/Throughline).\n\nThroughline breaks down the conversation into three layers and saves them in SQLite.\n\n**L1 (Skeleton)**: One-line summaries of old turns. Generated by a lightweight model. About 10 tokens per turn.\n\n**L2 (Body)**: Conversation body of the last 20 turns. User messages and AI responses are kept as is. No compression, lossless.\n\n**L3 (Detail)**: Tool I/O, system messages. 
Evicted to SQLite and never kept in context. When they are needed, the AI fetches them from SQLite itself.\n\nIt’s safe to run /clear. Because the SQLite database persists, the next session inherits the memory of the previous session in a single transaction when it starts. There’s no need to track PIDs or judge by time windows. The handoff is deterministic.\n\nIn terms of numbers, it looks like this:\n\n```\nWithout Throughline (50 turns, no /clear):\n  Context ≈ 125,000 tokens (80% is finished tool I/O)\n\nWith Throughline (50 turns → /clear → resume):\n  Context ≈ 13,000 tokens\n  (Last 20 turns of L2 + 30 turns of L1 summary)\n```\n\n**About a 90% reduction.**\n\nIt wasn't in this form from the beginning.\n\nIn the initial design, I tried to make L2 a \"structured extraction of important decisions.\" I imagined extracting only the important information from the conversation with tags like `[DECISION] Adopt WebSocket`, `[CONSTRAINT] Port 8080 cannot be used`.\n\nIt was beautiful in theory, but I realized something after implementation: **You cannot predict what the AI will need in the future.**\n\nInformation that the classifier deems \"not important\" might be needed 10 turns later. And you wouldn't even notice that it's gone. 80% accuracy means that the remaining 20% becomes invisible.\n\nIn the end, I settled on a method where L2 keeps the full text of the conversation. A subtraction-only design. I just remove the tool I/O from the original Claude Code context. This way, \"quality drop due to Throughline\" is impossible in principle.\n\nInheritance between sessions was also file-based at first, attempting to detect /clear within a 10-second window, but that broke with parallel sessions. Eventually, it settled on a single SQLite `UPDATE`. 
Simpler is more robust.\n\nThe L1 summary is generated using Haiku 4.5, but there's a trick to it.\n\nWhen I analyzed my past 86 sessions, the **median number of turns was 13.** More than half of the sessions end within 20 turns.\n\nThroughline keeps 20 turns of L2, so **the summarization model never runs in short sessions.** Summarization is only needed from the 21st turn onwards. And even then, it runs lazily, one turn at a time.\n\nIn other words, the token consumption of the summarization process itself is almost zero. The contradiction of /compact, where you \"consume a massive amount to save,\" simply doesn't happen.\n\nAs a byproduct of development, I also ended up with a token monitor that handles multiple sessions.\n\n```\n▶ Throughline  2ed5039c  ████░░░░░░░░░░░░░░░░  205.1k / 21%  Remaining 794.9k  claude-opus-4-6\n```\n\nSince it reads the actual usage values reported by the API (`message.usage`) from the transcript's JSONL, it provides accurate values rather than rough estimates like \"character count ÷ 4.\" It also automatically detects 1M context limits.\n\nWhen running multiple sessions, you can see in real time how much each one is consuming. It’s subtly convenient.\n\nThe real reason my quota melted in three days was the conversation history, which occupied 87% of the context window. Most of that was debris from tool I/O.\n\nOptimizing CLAUDE.md or shortening prompts affects only 9% of the total. Those measures are better than nothing, but they were not the main issue.\n\nPerhaps this kind of problem should be solved on the platform side. But I was struggling right now, so I built it myself. Node.js 22.5+, zero dependencies, MIT. 
It works if you have a MAX plan.\n\nIf anyone else is struggling with the same problem, feel free to take a look.", "url": "https://wpnews.pro/news/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage", "canonical_source": "https://dev.to/quolu/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage-534k", "published_at": "2026-06-04 01:06:50+00:00", "updated_at": "2026-06-04 01:12:28.956974+00:00", "lang": "en", "topics": ["large-language-models", "ai-tools", "ai-products", "artificial-intelligence", "ai-agents"], "entities": ["Claude Code", "Anthropic", "Claude"], "alternates": {"html": "https://wpnews.pro/news/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage", "markdown": "https://wpnews.pro/news/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage.md", "text": "https://wpnews.pro/news/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage.txt", "jsonld": "https://wpnews.pro/news/87-of-my-context-was-garbage-how-i-optimized-claude-code-token-usage.jsonld"}}