A couple of weeks ago I published a post with a tidy rule in it. When you add capability to an AI coding agent, reach for the lightest option first: a procedure file before a CLI, a CLI before a heavier integration, and only build the heavy machinery once you've proven you'll reuse it. My whole case rested on context cost. The heavy options load a lot of definitions up front and carry them every turn, so starting light keeps the window clean.
I still think the front half is right. But it isn't the rule I'd write now, because a reader took it apart in the comments and handed it back as something better. This post is about that exchange, because the rewrite was sharper than my original, and pretending I arrived at it alone would be both a lie and the less interesting story.
The first comment didn't argue with the rule. It walked straight to the blind spot. The moment a tool touches anything external or stateful, lightest-first reverses on you: a lightweight call that fails silently halfway through is harder to debug than a heavier tool that surfaces the failure cleanly. Pay the complexity up front.
My first instinct was to defend, and I did, a little. I said we were measuring different things, that I'd optimized for context cost while they were optimizing for failure observability, both real, different axes. I held the line by pointing out you can wrap a lightweight call to fail loudly, so the cheap path stays open.
That was true, and it was beside their point, and they didn't let me hide behind it.
They asked one question that did more work than my entire post: what's your actual trigger for paying the complexity up front, the type of state, or the class of error?
Sitting with that is where my own rule changed under me. The honest answer is state type, and the moment I said it out loud, context cost stopped being what the rule was about. What makes a failure expensive isn't the error. It's whether the operation changed something you can't take back before it died. A silent failure in a read is an annoyance. The same failure in the middle of a write is a half-changed world you now have to reconstruct.
So the real trigger was never the tool, and never even the error. It was reversibility. If a partial failure leaves something committed you can't cleanly roll back, that's where the heavier, structured-failure tool earns its weight, no matter how cheap the light version looked on context. Stateless or idempotent work stays light and is allowed to fail loud. Mutating, non-idempotent work gets the clean failure surface up front.
Then they put the landing better than I had managed in two thousand words: most error classes look recoverable until you ask whether anything has been written, and at that point the taxonomy is a distraction and you're just triaging state. I've been quoting that line to myself ever since. It's theirs, not mine, and it's the cleanest thing in this whole piece.
Here's what caught me off guard. The rule I published was "lightest first, on context cost." The rule I kept was "lightest first, until a partial failure can leave state you can't reverse." Identical on the front, completely different engine underneath.
Context cost is recoverable. Connect a tool, measure, disconnect it later, no harm done. An unreversible write that failed silently at step three is not recoverable like that. You find it after the damage. So the two axes I'd called equal at the start were never equal, and the thread had quietly walked me off the weaker one and onto the stronger one without my noticing it happen.
We added one more layer near the end, and it's the part I think about most now. Even past your reversibility threshold, there's a cost that doesn't show up at error time. The action might be technically reversible, but trust often isn't. Roll the database back all you want; the person who watched the agent get something wrong across a boundary they cared about doesn't restore their confidence on the same schedule. That cost lands later, a couple of sprints on, when someone quietly starts hand-checking everything the agent produces. No rollback refunds it. I didn't have that idea when I hit publish. It exists because someone kept pushing after I thought we were done.
I could have folded all of this into the original and moved on. I didn't, because the corrected rule isn't the interesting artifact. The correction is.
A clean rule reads like it arrived fully formed. Mine didn't. It had a real hole, a reader found it in one comment, and four or five exchanges later it was load-bearing in a way it simply wasn't when I shipped it. None of that shows if I quietly swap the conclusion and tidy up after myself. The seams are the useful part. They mark exactly where a confident-sounding rule was soft, and they show the kind of question that applies enough pressure to harden it. Sanding them off would hide the one thing worth seeing.
It also reset what I think publishing is for. I used to treat a post as something I finish, then defend in the comments. This one worked better as something I started and let someone else finish. The comment section wasn't an audience reacting to a conclusion. It was the second half of the draft, written after publication, by a collaborator I didn't know I had and didn't recruit.
Two things, and only one of them is about tools.
The tool one: lightest first is still my default, but the trigger for going heavy is reversibility, not context, and there's a second filter for trust that no rollback covers. That's a better rule than the one I published, and the better half of it isn't mine.
The other one is harder to admit. The most useful idea in my own post arrived after I'd published it, from someone who owed me nothing, in a thread I could easily have treated as noise to defend against. If I'd shipped the rule airtight, nobody would have found the seam, because there wouldn't have been one to grab. The hole was the invitation. So I've stopped trying to publish things that can't be argued with. A clean rule with a visible flaw, put where strangers can reach it, pulls in angles I don't have and corrections I can't reach alone. Ship it before it's airtight. Leave the seam showing. The version that comes back is usually the one worth keeping, and if you're honest, it's usually not entirely yours.
I build WordPress plugins and write about AI tooling and security at https://raplsworks.com/.