# Token Efficiency in Claude Code

> Source: <https://dev.to/kavyarani7/token-efficiency-in-claude-code-2kpi>
> Published: 2026-06-16 04:31:45+00:00

**SectorFlow Engineering Series** · Part 1 of 3 · Parent article

*Notes on where our context budget was actually going, and what we did about it.*

June 2026 · SectorFlow Engineering

In this series:

- **Part 2: The Skills File Pattern.** Fixing `CLAUDE.md` bloat with imports.
- **Part 3: Picking Models and Tools.** The MCPs we tried, refused, and why.

Claude Code can do a lot. The catch is that all of it runs on context, and you don't get much of that.

When we started SectorFlow we did the obvious thing. Kept a `CLAUDE.md` at the repo root, and every time something went wrong (wrong model string, a cache TTL that didn't match, a chart that came out looking off) we'd write a rule and stick it on the end. The file kept growing. We didn't really clock it as a problem until it was one.

By about week six it was 400 lines. Every session loaded the whole thing. Frontend rules sitting next to deployment runbooks sitting next to database decisions, none of it sorted. And because we'd added the rules one at a time over weeks, some of them flatly disagreed with each other. Claude would follow the new one, or the old one, or try to split the difference. We got something wrong either way.

I want to be clear this isn't a Claude Code problem. It's on us, and it's fixable. But fixing it meant we had to stop treating `CLAUDE.md` like a junk drawer.

The thing that actually hurts isn't the per-token price. It's that every token spent loading context is a token you don't get back for the work. Burn 30,000 on setup and you've got far less room to write code than if you'd burned 5,000. You hit the ceiling partway through a file and whatever you were in the middle of is just gone.

Here's the shift, and it's simple once you see it: anything Claude reads at the start of a session is something it can't use later for code. Most projects pile everything into `CLAUDE.md` on the theory that the model might need it someday. We flipped the question. What does the model need for this task? Load that. Skip the rest.

Two rules came out of it:

Those turned into three actual practices, and each one gets its own article in this series. This one is just the overview — what we measured and why it matters.

Every session loads `CLAUDE.md` plus whatever it imports. Before, that was the one 400-line file, every time, regardless of the task. After we split it into separate skill files, a UI task pulls in `core.md` (the constraints) and `design.md` (the visual stuff) and nothing else. An infra task gets `core.md` and `infrastructure.md`. Startup cost dropped about 60%.
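The selection logic can be sketched in a few lines of Python. This is only a model of the idea, not how Claude Code resolves imports: the file names come from the article, while the task labels, the fallback to `core.md` alone, and the line counts are assumptions made up for illustration.

```python
# Hypothetical mapping from task type to the skill files a session should load.
# File names follow the article; the task labels are illustrative.
SKILL_FILES = {
    "ui": ["core.md", "design.md"],
    "infra": ["core.md", "infrastructure.md"],
}


def files_for_task(task_type: str) -> list[str]:
    """Return the skill files for a task; unknown tasks get core.md only (an assumption)."""
    return SKILL_FILES.get(task_type, ["core.md"])


def lines_loaded(task_type: str, line_counts: dict[str, int]) -> int:
    """Total lines of context loaded for a task, given per-file line counts."""
    return sum(line_counts[name] for name in files_for_task(task_type))


if __name__ == "__main__":
    # Assumed file sizes, chosen only to fall in the 60-120 line range.
    counts = {"core.md": 60, "design.md": 50, "infrastructure.md": 60}
    for task in ("ui", "infra"):
        print(task, files_for_task(task), lines_loaded(task, counts))
```

The point of writing it down is that the per-task load becomes something you can measure and compare against the old single-file number.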

We had the Linear MCP hooked up so Claude could read tickets itself. Nice in theory. But one `list_issues` call runs about 3,500 tokens, and the whole read-it / mark-done / comment loop is around 9,000. So now the engineer just pastes the acceptance criteria. That's maybe 400 tokens. The 8,600 difference doesn't sound like much until you multiply it across 60-plus tickets: that's more than 500,000 tokens, something like 7 or 8 full context windows handed back to the actual work.
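The arithmetic is easy to check. The sketch below uses the per-ticket figures from the paragraph above. The usable window size is not stated in the article, so `usable_window_tokens` is an assumption: 70,000 is picked only because it puts the result in the "7 or 8" range, and a larger window gives a proportionally smaller count.

```python
MCP_LOOP_TOKENS = 9_000   # read / mark-done / comment loop via MCP
PASTE_TOKENS = 400        # pasted acceptance criteria
TICKETS = 60


def tokens_saved(tickets: int, mcp: int = MCP_LOOP_TOKENS, paste: int = PASTE_TOKENS) -> int:
    """Total tokens saved by pasting criteria instead of running the MCP loop."""
    return tickets * (mcp - paste)


def windows_recovered(saved: int, usable_window_tokens: int) -> float:
    """How many context windows the saving is worth, for an assumed usable window size."""
    return saved / usable_window_tokens


if __name__ == "__main__":
    saved = tokens_saved(TICKETS)
    print(f"tokens saved: {saved:,}")
    # 70_000 is an assumed usable window, not a figure from the article.
    print(f"windows recovered: {windows_recovered(saved, 70_000):.1f}")
```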

Left alone, Claude reads files to get its bearings, sometimes three or four of them before it writes a line. So we made a rule: only read files the task names. Need to find a function? `grep` for it, then view just those lines. Don't open a file to soak up "context." If something's actually missing, ask. Saves 2,000–4,000 tokens on a complex task.
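As a rough illustration of "find it, then view just those lines", here is a small Python sketch. The function names and the default radius are hypothetical, and the script still reads the file itself. What matters is that only the matching window would end up in the model's context.

```python
import re
from pathlib import Path


def find_lines(path: str, pattern: str) -> list[int]:
    """Return 1-based line numbers whose text matches the regex, similar to `grep -n`."""
    regex = re.compile(pattern)
    with open(path, encoding="utf-8") as handle:
        return [n for n, line in enumerate(handle, start=1) if regex.search(line)]


def view_window(path: str, center: int, radius: int = 5) -> list[tuple[int, str]]:
    """Return (line number, text) pairs within `radius` lines of `center` (1-based)."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    start = max(center - radius, 1)
    end = min(center + radius, len(lines))
    return [(n, lines[n - 1]) for n in range(start, end + 1)]
```

For example, `find_lines("src/api.js", r"function loadSector")` followed by `view_window("src/api.js", hits[0])` would surface about a dozen lines instead of the whole file. The path and function name here are made up.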

Verifying a change by eye means starting the server, waiting, navigating, screenshotting, evaluating. That's a whole chain of calls. For anything you can't see in a browser, like server logic or data contracts or route handlers, that chain tells you nothing. So we only do the visual check when the change is something a person could actually see in a browser. For syntax we run `node --check`. One Bash call.
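The decision itself can be written as a small function. This is a sketch under assumptions, not the rule as we actually encoded it: the list of "browser-visible" suffixes is a guess you would adjust per repo, and the commands are only built here, not executed.

```python
from pathlib import PurePosixPath

# Assumed convention for what counts as browser-visible; adjust to the repo.
VISUAL_SUFFIXES = {".css", ".html", ".jsx", ".tsx", ".svg"}
# File types that `node --check` can syntax-check directly.
NODE_SUFFIXES = {".js", ".mjs", ".cjs"}


def needs_visual_check(changed_files: list[str]) -> bool:
    """True if any changed file is something a person could see in a browser."""
    return any(PurePosixPath(f).suffix in VISUAL_SUFFIXES for f in changed_files)


def syntax_check_commands(changed_files: list[str]) -> list[list[str]]:
    """Build one `node --check <file>` command per changed JavaScript file."""
    return [
        ["node", "--check", f]
        for f in changed_files
        if PurePosixPath(f).suffix in NODE_SUFFIXES
    ]
```

Each command list can be handed to `subprocess.run(cmd, check=True)`. A change that touches only route handlers never starts the dev server at all.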

| Source of overhead | Before | After | Saving |
|---|---|---|---|
| Session context load | ~400 lines, every session | 60–120 lines, task-specific | ~60% |
| Ticket ingestion (per ticket) | ~9,000 tokens via MCP | ~400 tokens via paste | ~8,600 tokens |
| File reads per task | 3–5 files speculatively | Named files only | 2,000–4,000 tokens |
| Verification overhead | Dev server + screenshot | `node --check` only | 4–6 tool calls |

Each of these on its own is fine, nothing dramatic. Put together they change what fits in a session. Stuff that used to take two or three sessions now usually fits in one. That's the whole point.

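The other two articles each take one piece of this:

- Part 2, The Skills File Pattern: fixing `CLAUDE.md` bloat with imports.
- Part 3, Picking Models and Tools: the MCPs we tried, refused, and why.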

Read this one first for the why. Then either of the others for the how.

Claude Code does its best work when the context is small, accurate, and honest about what's actually known versus what you're hoping for. Vague in, vague out. And a context file that tries to cover everything ends up covering nothing properly.
