Source: dev.to · Topic: large-language-models

Stop using the model as your memory

A developer using Claude Code found that the model's context window is a lossy working memory, not a reliable record. By moving state out of the model and into the work—using a frozen spec and a verified checklist—the developer achieved more consistent results. The approach treats the model as a worker and the repository as the memory, reducing drift and improving reliability.

3 min read · Published Jun 26, 2026

I run Claude Code most of the day. The thing that kept biting me wasn't the model getting dumber. It was the model forgetting what we'd already settled, then confidently redoing it wrong.

You've probably hit it. You write a CLAUDE.md, you keep notes, you tell it "we decided X." A few prompts later it relitigates X, or quietly breaks something it fixed an hour ago. Bigger context windows didn't fix it for me either. A 1M window just means more room for stale instructions to rot in.

Here's the reframe that actually held: stop treating the model as the place the state lives.

A context window is working memory, not a record. It's lossy, it drifts, and every new turn re-derives the world from whatever's in front of it. If "what's done and what's half-broken" only exists in that window, you're trusting the most forgetful part of the system to remember the most important thing.

So I moved the state out of the model and into the work.

Two pieces did most of it:

A frozen spec the agent re-reads. Not a chat message it might compress away. An actual file that says what we're building and what's already decided. When it starts drifting, the spec is the source of truth, not its memory of the conversation.

A checklist it can only tick after something is verified. [ ] becomes [x] when a test passes or I've confirmed the change, never because the model "thinks" it's done. The checklist carries the progress. The model just moves it forward one verified step at a time.
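
To make the verify-then-advance rule concrete, here is a minimal sketch in Python. It is not the tooling behind this post; the "- [ ] item" line format, the function names, and the idea of passing in a verify callback are all assumptions made for illustration. The one property it tries to show is that a box only flips to [x] when a check actually returns success.

```python
import re
import subprocess
from typing import Callable, Optional

# Assumed checklist line format: "- [ ] description" (hypothetical, not from the post).
UNCHECKED = re.compile(r"^(\s*[-*]\s*)\[ \](\s+.*)$")


def run_check(command: list[str]) -> bool:
    """Return True only if the verification command exits with status 0."""
    return subprocess.run(command, capture_output=True).returncode == 0


def tick_next(checklist: str, verify: Callable[[str], bool]) -> tuple[str, Optional[str]]:
    """Tick the first open item, but only if verify(item) returns True.

    Returns the (possibly unchanged) checklist text and the ticked item, or None.
    """
    lines = checklist.splitlines()
    for i, line in enumerate(lines):
        match = UNCHECKED.match(line)
        if match is None:
            continue  # already ticked, or not a checklist line
        item = match.group(2).strip()
        if not verify(item):
            # The first open item failed verification: stop, never skip ahead.
            return checklist, None
        lines[i] = f"{match.group(1)}[x]{match.group(2)}"
        trailing = "\n" if checklist.endswith("\n") else ""
        return "\n".join(lines) + trailing, item
    return checklist, None
```

A call might look like tick_next(text, lambda item: run_check(["pytest", "-q"])), where the test command is whatever your project already uses. The model can propose that a step is done, but only the check's exit code moves the box.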

The difference is subtle but it's the whole game. Before, the work was a side effect of the conversation. After, the conversation is a side effect of the work. The agent can lose the whole thread and reload from the spec plus the checklist and basically pick up where it left off.
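
Reloading from the spec plus the checklist can be sketched in a few lines of Python. The file names SPEC.md and CHECKLIST.md, the prompt wording, and the function names below are hypothetical; the post does not say what the files are called or how the reload is phrased. The sketch only shows that a fresh session can be rebuilt from two files, with no chat history involved.

```python
from pathlib import Path

# Hypothetical file names; the post does not name the actual files.
SPEC_FILE = Path("SPEC.md")
CHECKLIST_FILE = Path("CHECKLIST.md")


def open_items(checklist: str) -> list[str]:
    """Return the text of every unticked "- [ ]" item, in file order."""
    items = []
    for line in checklist.splitlines():
        stripped = line.strip()
        if stripped.startswith("- [ ]"):
            items.append(stripped[len("- [ ]"):].strip())
    return items


def resume_prompt(spec: str, checklist: str) -> str:
    """Build a fresh-session prompt from the two files alone."""
    remaining = open_items(checklist)
    next_step = remaining[0] if remaining else "nothing: every item is verified"
    return (
        "Spec (source of truth, do not reinterpret):\n"
        f"{spec.strip()}\n\n"
        "Checklist (ticked items are verified, do not redo them):\n"
        f"{checklist.strip()}\n\n"
        f"Next step: {next_step}\n"
    )


def load_resume_prompt(root: Path = Path(".")) -> str:
    """Read both files from the repo root and build the prompt."""
    spec = (root / SPEC_FILE).read_text(encoding="utf-8")
    checklist = (root / CHECKLIST_FILE).read_text(encoding="utf-8")
    return resume_prompt(spec, checklist)
```

Because the prompt is derived from the repo each time, losing the thread costs one file read rather than an hour of re-explaining.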

When I actually measured my own sessions, almost none of my tokens were fresh input. The bulk was cache reads and re-reading instructions that hadn't changed. So the "context problem" wasn't that I needed to feed it more. It was that I was making it hold too much at once, and most of what it held was stale.
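
If you want to run the same kind of measurement on your own sessions, a rough Python sketch follows. It assumes you can export one usage record per request as a dictionary, with field names modeled on the usage fields the Anthropic Messages API reports (input_tokens, cache_creation_input_tokens, cache_read_input_tokens). That record shape and the function name are assumptions, not the method used for the numbers in this post, so adjust the keys to whatever your logs actually contain.

```python
from typing import Iterable, Mapping

# Assumed field names, modeled on Anthropic Messages API usage objects.
FIELDS = ("input_tokens", "cache_creation_input_tokens", "cache_read_input_tokens")


def input_token_shares(usages: Iterable[Mapping[str, int]]) -> dict[str, float]:
    """Sum each input category across requests and return its share of the total."""
    totals = {field: 0 for field in FIELDS}
    for usage in usages:
        for field in FIELDS:
            # Missing or null fields count as zero tokens.
            totals[field] += usage.get(field, 0) or 0
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {field: 0.0 for field in FIELDS}
    return {field: count / grand_total for field, count in totals.items()}
```

A small input_tokens share next to a large cache_read_input_tokens share is the pattern described above: most of what the model processes each turn is material it has already seen.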

Shrinking what it has to carry per step did more than any prompt trick. Tighter spec, smaller surface, verify-then-advance.

The same thing breaks production agents, just louder. Local demos pass because the whole run fits in one window. In a real app the agent comes back to a thread it can't fully reload, and it drifts. "It worked on my machine" for agents is usually "it worked in one context."

If you're fighting this, the move isn't a better memory plugin. It's making the work carry the state so the model doesn't have to. I'm still figuring out the edges of this (when a spec gets too big it has its own rot problem). But the core has been stable for months: the model is the worker, the repo is the memory.
