# The Agent Reviewed Its Own Code and Passed Itself. It Was Wrong.

> Source: <https://dev.to/stkremen/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong-4b94>
> Published: 2026-06-15 17:00:00+00:00

I'm a self-taught solo dev. I vibe-code every day, and for a long time the same

quiet worry followed me around: the agent hands me code, it looks clean, the tests

pass — and I have no idea what I just let into my project.

It took me a while to see why that worry never went away. Most developers learn to

review code from someone — a senior leaning over their shoulder saying "this'll

bite you in a month." As a self-taught solo dev, I never had that. I learned to

*write* code. Nobody taught me what to *check*.

So I did the only thing I could: I asked. I put the question to people with more

miles on the clock than me, and the answers reshaped how I think about verifying AI

code. This is what they taught me — and the experiment it led to, which went wrong

in the most instructive way possible.

The first thing that landed: an agent tests what it *assumed*, not what reality

throws at it. The code runs fine on the happy path, the tests are green, and then

someone feeds it an empty string, a malformed page, an unexpected null — and it

falls apart. The agent skips the defensive thinking that only comes from actually

debugging things in production.

That reframed the whole problem. Green tests don't mean the code is correct. They

mean the code does what the agent thought it should do. Those are very different

claims, and the gap between them is where I kept getting burned.

The second thing was harder to swallow. The more code I let the agent write, the

less I understood what came out — and the worse I got at checking it. Verification

and comprehension are the same muscle. If I don't grasp what the code does, reading

every line is just theater.

One reviewer put the fix simply: before judging the lines, judge the *shape*. Have

the agent produce a systems-level overview — what calls what, where the boundaries

are — and verify that the structure makes sense before you ever look at

implementation. I'd been generating these as diagrams and still getting lost; the

real unlock was keeping the map as text, so I could hand it to a second model and

ask whether it actually matched the code.

Which points at the deepest problem of all.

Here's the thing nobody says out loud: when the same agent writes the code *and*

writes the tests, it's not verifying anything. It's confirming its own assumptions.

The tests pass because they test the same things the agent already believed. The

blind spot in the code and the blind spot in the test are the same blind spot.

The reviewers' answer was consistent: the check has to come from outside the

author. A fresh model, a separate pass, *something* that doesn't share the original's

assumptions. Don't let the thing that wrote the code be the only thing that grades it.

So I built exactly that. A red-team step for my orchestrator — a command that takes

the work the agent just finished and forces it into a different role: not the proud

author, but an independent reviewer whose job is to break it.

The prompt is the whole trick, and it's blunt on purpose:

Switch into the role of an independent reviewer who did NOT write this code. Your

job is not to confirm it works — it's to find what breaks it. Assume there's a bug.

Then four concrete passes, straight from what the seniors drilled into me:

And one rule that matters more than the rest: don't tell me "looks good." If you

genuinely find nothing, list *specifically* what you checked and how. A confident

"looks solid" is the exact thing I'm trying to escape.

Here's where it gets good, and a little absurd.

The agent built the red-team command. Then I pointed the command at the code that

had just produced it. It immediately tore into its own work — listed what it had

done wrong, what it had to fix. The agent admitted the problems and fixed them.

Win, right? Self-correcting AI. Except I ran it a second time.

The second pass found more. Things the first "fix everything" round had missed —

because the agent, even while critiquing itself, was still working from the same

assumptions that put the bugs there in the first place. One adversarial pass didn't

purge the blind spots. It just surfaced the ones the agent could already see.

That's the lesson, and it's sharper than anything I could have written on purpose:

**giving an agent a tool to criticize itself doesn't remove its blind spots — it
only reveals the ones it was already capable of seeing.** Verification isn't a step

I almost made adversarial review a global rule — run it on everything, always. I'm

glad I didn't.

All of this costs tokens. More scrutiny means more output, more passes, more model

time — and that runs directly against the thing I care about, which is keeping the

context I send lean. Verification has a price, and the price is real. If I red-teamed

every trivial change, I'd be drowning in self-reflection over things that don't

warrant it.

So the step is opt-in. You spend the scrutiny where the stakes justify it — input

handling, parsing, anything touching the outside world — and you skip it where it's

noise. That trade-off, honestly, is the part I'm least done thinking about. Verifying

costs context, and context isn't free. I haven't solved that tension. I've just

decided to pay it on purpose, where it counts, instead of everywhere by reflex.

The fix was never about writing better code, or even better prompts. It was about

accepting that the agent and I have different jobs. It writes. I decide what's true.

And deciding what's true means refusing to take the agent's word for it — especially

when the agent is grading itself.

I still don't have a senior leaning over my shoulder. But I've learned to build the

shoulder out of separate passes, fresh models, and a stubborn refusal to trust a

green checkmark. If you're self-taught and that quiet worry follows you around too —

that's not a gap in your skill. It's the start of the right instinct.
