The Spec Was Never the Good Part

wpnews.pro

🦄 It's been a while since I've written anything here—mostly because a topic will cross my mind and bore me before I ever finish it. So this one I handed to Claude as a test to see whether the workflow that carries my code-planning holds up in the writing phase too. Spoiler: if you're reading this, that means it already worked.

Here's the workflow we all agreed was the grown-up one:

Spec Kit scaffolds the whole thing for you, and Kiro builds its entire flow around it—type a prompt, get a spec.md

, get a plan, get code. It's clean and traceable, it looks like the opposite of vibe-coding, and that's exactly why it's so easy to sell.

The problem isn't specs. The problem is batch thinking cosplaying as design.

That kind of thinking treats the model like a vending machine: you punch in a spec, a feature drops out the bottom, and the only thing left to wonder is whether you pressed the right buttons. But that's not where the model is actually good. It's good earlier, back when the idea is still fuzzy and half-formed, and you hand it over so AI can start pulling the idea apart with you. It points out the case you didn't think of, or tells you which of your three plans is going to bite you later.

We took a tool that's genuinely good at reasoning and put it to work typing.

There's a step between "I have a problem" and "build this feature," and it's a conversation. A conversation in real time with something that pushes back against your original thought. That's where you actually find the problems: the null input, the "wait, what happens if two of these fire at once" that you rarely think to ask when it's still cheap enough to fix. Skip that step and you've skipped the part that mattered most.

A notification setting sounds simple until quiet hours, account-level defaults, per-project overrides, and "send me critical alerts anyway" all disagree—and the generated spec quietly crowns one of them king.

That's exactly what generating a spec does when you treat it as the thinking instead of the record of thought. It writes everything down in one shot—before you've hit any of the hard parts—and then everyone treats the thinking as done because there's a shiny new file that says it is. Planning in chat forces you to argue. A generated spec just takes dictation.

And sure—a human arguing with you is better. But an AI that pushes back beats a generated document that just nods happily in Markdown.

A model doesn't decide fifty points independently. It commits to a path up front and then writes everything after it to match. So by the time that spec doc gets to you for review, one wrong assumption near the top has already worked its way through everything under it, and you're not really reviewing fifty decisions—you're reviewing one decision, fifty times over.

In chat, you hit the forks one at a time, in real time. The model picks a direction early, you watch it head somewhere dumb, and you redirect right there—before the next ten decisions get built on top of the wrong one. It's the same call you'd eventually catch in a review, except now it's the only thing in front of you instead of point three of fifty you skimmed past.

A batch spec bakes the error in. A conversation corrects it at the branch.

I almost left this part out, because it's already baked into every one of my chats—so much a default for me that it didn't occur to me anyone needed it spelled out. But that instinct isn't free. Left to its defaults, every model is a yes-machine. It'll happily validate your worst idea and build a beautiful spec around it, because agreeing is easier than arguing. And you don't get an opponent by accident. You get one on purpose.

I figured my setup already had it covered. My CLAUDE.md

has a line I've been quietly proud of:

Push back when wrong. Collaborator, not yes-machine.

Then I actually went back and read it, and every rule in that file was reactive—push back when wrong, be loud when you know you're right. All of it only fires once there's already a wrong answer sitting there to argue against. Nothing in there told the model to fight me while the design was still up in the air, while the plan hadn't failed yet because it hadn't even been built. The habit lived in how I actually work, not in anything I'd written down—I'd been taking credit for a default I never put in the file.

So I wrote it—one bounded adversarial rule:

## Adversarial Thinking

- Role during planning: opponent, not stenographer. Challenge undecided designs before implementation, not after failure.
- Raise one objection at a time. Select the highest-risk assumption—the one that invalidates the most if wrong. Never enumerate objections.
- TRIGGER on: edge cases, irreversible or high-cost choices, hidden coupling, ambiguous or underspecified requirements. SKIP: trivia, low-cost reversible choices, settled matters of taste.
- After raising an objection, wait for the user's response before raising the next. Treat each answer as input that updates the plan.
- STOP when the user makes a decision and names the tradeoff. Do not reopen a settled decision.
- EXCEPTION to STOP: if a new decision reverts or contradicts an earlier settled one, flag the conflict explicitly before continuing.
- Do not produce objections to signal rigor. Do not bikeshed. Do not default to disagreement.

The bound is the part that keeps it usable, because without a stop condition "adversarial" just turns into "exhausting," and a model that re-litigates every settled call is about as useless as one that agrees with everything. Same problem, worse mood.

Specs are contracts. They just aren't always the right place to do the thinking.

This is a single-player argument. One developer who owns the design, holding the whole problem in their own head and a single conversation. It's not a twelve-service migration or a four-team handoff, where the doc exists because no single brain can hold the whole thing and people need one place to hash it out.

For the work that does fit inside a conversation, though, the fix isn't to ban specs or write more of them—it's to put the argument back in front:

The discipline a generated spec.md

is supposed to buy you? You get it for free just by refusing to let the model agree with you too early.

Which leaves the one question I don't actually have an answer for, so I'll hand it to you instead of faking one: how does this scale past a single person? At team size the spec isn't just a build target—it's the thing everybody who wasn't in the chat still has to agree on. I know how to make the model fight me. I haven't figured out how to make a conversation do the job of a contract. If you've cracked that part, I want to hear about it.

Because the good part was never in a tidy document. It was the conversation we had along the way.

This post got pressure-tested by the exact setup it argues for—an AI I told, in writing, to stop agreeing with me, and then watched actually do it. It caught two weak points, argued over the shape, and then had the nerve to help write the disclaimer. Rude, but useful. Most of the words are AI, but the opinions are mine.

source & further reading

dev.to — original article GLM 5.2 Has a 1M Token Context Window. Here's What That Does to Your API Bill. Try This: Use Your AI Agent to Activate Your "Weak Ties" The Way You Direct Your Agent Defines Your Professional Interface in the Agent Network

The Spec Was Never the Good Part

Run your AI side-project on zahid.host