The most expensive AI mistake is not when your coding agent gets something wrong.
It is when it gets the same thing wrong again tomorrow. That is the part that starts to wear you down. Not because the model failed once.
That happens.
The frustrating part is when you already corrected it.
You explained the repo pattern.
You told it why that migration broke.
You pointed out the weird CI issue.
You showed it the dependency that already failed.
The agent fixed the task.
The session ended.
Then two days later, a new session suggests the same bad idea like none of it ever happened.
That is the problem I have been thinking about lately. AI coding agents do not just need bigger context windows.
They need scar tissue.
Scar tissue is remembered failure.
It is not generic documentation.
It is not a massive chat transcript.
It is not another bloated AGENTS.md
file that gets stuffed into every prompt whether it is relevant or not.
Scar tissue is the durable memory of what went wrong, why it went wrong, and what should not be repeated.
Do not use this migration pattern in this repo.
It passes locally but breaks staging because of X.
Do not replace this middleware.
It looks redundant, but it protects the admin route.
Do not use this package again.
We tried it and it failed on Vercel because of native dependencies.
The Stripe webhook handler must preserve the raw body.
Normal JSON parsing breaks signature verification.
This test failure usually means the mock user is missing a role.
Do not rewrite the auth flow first.
That kind of knowledge is incredibly valuable.
But most of the time, it disappears.
It lives in someone’s head.
Or buried in Slack.
Or trapped in yesterday’s AI session.
Or hidden somewhere in a pull request comment nobody will ever read again.
A lot of AI coding workflows still treat context like the solution to everything.
Add more files.
Add more instructions.
Add more docs.
Add more examples.
Add more project history.
Eventually the prompt becomes a junk drawer. The agent has more text, but not necessarily more judgment. That is the distinction I care about. Context tells the agent what is nearby. Scar tissue tells the agent what it learned the hard way.
Those are not the same thing.
This is what a lot of AI coding sessions look like:
Session 1:
Agent suggests bad approach.
Developer corrects it.
Agent fixes the issue.
Session ends.
Session 2:
Agent has no memory of the correction.
Agent suggests the same bad approach.
Developer loses trust.
The model did not technically “forget.”
It never had durable memory in the first place. It only had temporary working space. Once the session ended, the lesson vanished.
This is the pattern I want instead:
Session 1:
Agent suggests bad approach.
Developer corrects it.
The lesson gets stored as a durable project memory.
Session 2:
Agent starts a similar task.
The relevant scar gets retrieved.
Agent avoids the old mistake.
That is a different kind of AI coding workflow.
Not just faster.
Not just cheaper.
Not just fewer tokens.
More experienced.
The better coding agents get, the more this matters. When agents only wrote tiny snippets, forgetting was annoying. Now they can touch real architecture.
They can refactor files.
They can generate migrations.
They can write tests.
They can modify production-adjacent code.
That makes repeated mistakes more expensive.
If an AI agent is going to operate inside a real codebase, it needs more than instructions.
It needs a memory of consequences. It needs to remember the things that hurt.
This is one of the use cases I am exploring with Empirical.
Empirical is a memory layer for AI tools.
Instead of stuffing every lesson, decision, preference, and warning into a giant prompt, Empirical lets an agent retrieve the specific memory it needs when it needs it.
For coding agents, that means the memory layer can hold things like:
Project decisions
Repo conventions
Failed approaches
Bug history
CI/CD quirks
Security gotchas
Dependency warnings
“Never do that again” lessons
That is the stuff that usually gets lost between sessions.
And it is also the stuff that makes a developer more useful over time.
Why should an AI coding agent be any different?
I do not think the next leap in coding agents is only going to come from smarter models.
Some of it will come from better memory.
Not memory as a transcript dump.
Not memory as “load the whole repo into context.”
Memory as accumulated judgment.
Memory as operational history.
Memory as scar tissue.
Because the real win is not just an agent that can write code. The real win is an agent that remembers why the last fix failed.
I wrote more about the idea here: