{"slug": "the-harness-has-a-token-budget", "title": "The Harness Has a Token Budget", "summary": "A developer's project CLAUDE.md crossed 4,000 tokens last quarter, causing an AI agent to miss rules buried in the middle of the file that it had previously respected for months. The engineer concluded that the harness has a token budget that had been exceeded, with each line costing context that the agent could otherwise use for task-specific work. The developer identified four underused strategies—consolidation, compression, scoping down, and removal—to manage the budget and keep load-bearing rules at the top of the agent's attention.", "body_md": "Our project CLAUDE.md crossed 4,000 tokens last quarter, and the agent started missing rules it had been respecting for months. Not the rules at the top. The rules in the middle. The ones buried under three other sections of guidance, the ones the agent could reach if it stayed focused but did not reach reliably when it was deep in a task.\n\nThe story would have ended with \"make the file shorter\" except for the part where most of those rules had earned their place. Each one had an incident behind it. Each one had a reviewer who would defend it. The harness was not bloated; the harness was honest. And the agent was still missing rules.\n\nThe conclusion that took me too long to reach: the harness has a token budget, and we had blown it.\n\nEvery CLAUDE.md line costs context. The agent reads the harness on every session, in every window, before any of your task-specific context arrives. A 4,000-token CLAUDE.md is 4,000 tokens the agent is not using for the file you actually asked it to edit.\n\nThe first cost is straightforward: less window for the work. On a small task it does not matter. On a task that spans three files and a long log, it absolutely matters; the agent runs out of room and starts dropping context, which is exactly when you want every rule to be at the top of attention.\n\nThe second cost is more subtle. 
The agent's attention is not uniform across the input. Rules at the top of the harness fire more reliably than rules buried halfway down. The middle of the file is the worst place for a load-bearing rule, and it is also where rules accumulate by default, because every new rule gets appended to the section where it logically belongs and slowly pushes the older rules further into the middle.\n\nThe third cost is human. The team reads the harness when onboarding, when investigating an agent failure, when proposing a change to a rule. A long file is a file nobody reads top to bottom. The institutional memory of why each rule exists evaporates, and the next maintainer is reading three years of accumulated decisions with no map.\n\nThe way I think about it: each rule trades tokens for prevented mistakes. The exchange rate is the rule's worth.\n\nA rule that prevents one production incident per quarter, and costs 30 tokens to encode, is overwhelmingly worth keeping. A rule that fires on a pattern the agent never produces, and costs 150 tokens because it explains the reasoning in three paragraphs, is a bad trade.\n\nThe hard part is that most rules look fine in isolation. The trade only becomes visible at the budget level. You cannot evaluate a single rule and decide whether to keep it; you have to evaluate it against the total cost of the file it lives in and ask whether you would trade it for one or two rules at the top of attention that you currently do not have.\n\nThe discipline is to think in budget terms rather than line terms. Not \"is this rule useful.\" But \"would I rather have this rule, or the 200 tokens of attention it costs me on every session.\"\n\nOnce you accept the budget, the harness has four moves available, and they are all underused.\n\n**Consolidate.** Two rules that say almost the same thing become one rule, half the length, applied where both used to apply. The consolidation usually surfaces redundancy nobody had noticed. 
The agent reads one rule instead of two and applies it more consistently.\n\n**Compress.** A rule that explains the reasoning in three paragraphs becomes a rule that states the rule in two lines. The reasoning moves to a comment in the codebase, or to the PR that introduced the rule, or to a feature doc loaded only when needed. The agent does not need the reasoning to apply the rule; the agent needs the rule.\n\n**Scope down.** A rule at the project root that only applies to one module moves to that module's CLAUDE.md. The token cost is paid when the agent is touching that module, and zero when the agent is anywhere else. The scoping piece covers this; the token-budget framing is what makes the move feel urgent rather than tidy.\n\n**Delete.** A rule whose origin nobody remembers, whose pattern the agent does not produce, and whose removal does not change behavior, is gone. The rule-lifecycle piece covers the discipline. The token budget is what makes the deletion mandatory rather than optional.\n\nThe four moves are not refactors. They are accounting.\n\nThe harness is not the only place a rule can live. The codebase has tools for encoding constraints that pay no token cost on every session.\n\nA rule about how to format dates can be a lint check. A rule about what file to import from can be enforced by `eslint-plugin-import` or its equivalent. A rule about which directory a new component goes in can be enforced by a generator command. None of these rules need to live in CLAUDE.md, because the constraint is already encoded somewhere the agent will respect.\n\nThe check I run when adding a rule: is there a place in the codebase where this rule could be enforced mechanically. If yes, the rule goes there. The harness only carries rules that have to be linguistic, the ones that depend on judgment or context the linter cannot see.\n\nThe harness pays its token cost in attention. Lint pays its cost in CI minutes. 
Lint is cheaper on a per-rule basis, and the cost does not compound against every other rule.\n\nOnce a quarter, I sit with the harness open and a question in mind: where am I overpaying.\n\nThe rules that bleed the most tokens are the ones with the most explanation. Five lines of context, two lines of rule, three lines of edge cases. The agent does not need the context; the team did, once, when the rule was being negotiated. The context is dead weight now. Compress it.\n\nThe second-worst offenders are the rules that should be scoped. Half the rules in a typical project root CLAUDE.md apply to a single module. Each one pays its token cost on every session, including the ones where the agent is nowhere near that module. Scope them down. The agent's attention on the API task is not paying for the frontend rule.\n\nThe third batch is the simplest: rules whose origin nobody remembers and whose pattern the agent does not produce. They go.\n\nEach quarter the audit returns somewhere between 800 and 1,500 tokens to the budget. The agent gets better. The team can read the file again.\n\nImagine you had to fit the entire harness on a single screen of your editor. No scrolling. What rules would survive.\n\nThe exercise sounds artificial. It is not. The rules that would survive are the rules that are doing the most work. The rules that would get cut are the rules that look useful but are mostly paying for themselves through inertia.\n\nYou will not actually shrink the harness to one screen. The point of the exercise is to find out which rules pass the test and which ones are surviving because nobody has asked them to justify themselves.\n\nThe budgeted harness is not the smallest harness. It is the one where every rule has earned its tokens.\n\nKeystone is the open source tool I built for harness engineering, and the budget framing is wired into how it loads context. 
The design starts from one decision: context is a scarce resource, and the harness has to declare what gets to live in it at each moment.\n\nKeystone splits the harness into three tiers, each with its own budget.\n\n**Always-on guides.** Short rule files that load on every session. The whole guide layer on a fresh 0.7.0 install runs about 28K tokens across 53 files. That is the ambient cost: the rules the agent has to see every time it picks up work.\n\n**On-demand corpus.** The reasoning behind a rule, the long examples, the historical context: none of that lives in the always-on layer. It sits in corpus files the agent loads only when a guide forward-links into one. About 1-3K tokens per file. Most tasks never touch them.\n\n**Transient sensors.** Lint output, test output, audit output. These appear in context only during verification and then drop out. About 1-5K per verify cycle, carrying signal without leaving residue.\n\nThe math works the way the budget framing predicts. A fresh install costs roughly 14% of a 200K window, or 3% of a 1M window. A worst-case full audit, pulling in every corpus file and every sensor at once, lands around 45-55K tokens. The agent still has most of its context free for the work itself.\n\nThe tiering forces the same question the four moves ask: *what tier does this rule belong in*? A rule that fires on every task earns its always-on slot. A rule that explains a decision once a quarter belongs in corpus, behind a link. A check that runs only during verify lives in sensors and leaves no residue.\n\nKeystone lives at [www.tacoda.dev/keystone](https://www.tacoda.dev/keystone/).\n\nThe principle is the one this whole post argues for: every token in the always-on layer is paid every session, so the always-on layer has to be small, deliberate, and audited. The tool is the discipline made mechanical.\n\nOpen your CLAUDE.md. 
Read it top to bottom, the way the agent reads it, on the assumption that attention falls off the further down you get.\n\nFind the rule with the worst exchange rate. Probably it explains itself in five lines, lives in the project root when it should live in a module, and prevents a pattern the agent has not produced in six months. Compress it, move it, or delete it.\n\nDo that for ten rules. Measure the token savings. The savings will surprise you. The agent's behavior will not get worse. In most cases, it will get better, because the rules that survived will fire more reliably than they did when buried.\n\nThe harness is not free. The token budget is real. The discipline is to spend it on what matters.", "url": "https://wpnews.pro/news/the-harness-has-a-token-budget", "canonical_source": "https://dev.to/tacoda/the-harness-has-a-token-budget-gcn", "published_at": "2026-06-03 17:39:11+00:00", "updated_at": "2026-06-03 17:41:53.309828+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "ai-infrastructure", "ai-safety"], "entities": ["CLAUDE.md"], "alternates": {"html": "https://wpnews.pro/news/the-harness-has-a-token-budget", "markdown": "https://wpnews.pro/news/the-harness-has-a-token-budget.md", "text": "https://wpnews.pro/news/the-harness-has-a-token-budget.txt", "jsonld": "https://wpnews.pro/news/the-harness-has-a-token-budget.jsonld"}}