{"slug": "the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong", "title": "The Agent Reviewed Its Own Code and Passed Itself. It Was Wrong.", "summary": "A solo developer built an AI agent that writes and tests its own code, but discovered the agent's self-review is flawed because it confirms its own assumptions. The developer implemented a red-team step where a separate model reviews the code, which initially found bugs but missed others on a second pass, highlighting the need for external verification.", "body_md": "I'm a self-taught solo dev. I vibe-code every day, and for a long time the same quiet worry followed me around: the agent hands me code, it looks clean, the tests pass — and I have no idea what I just let into my project.\n\nIt took me a while to see why that worry never went away. Most developers learn to review code from someone — a senior leaning over their shoulder saying \"this'll bite you in a month.\" As a self-taught solo dev, I never had that. I learned to *write* code. Nobody taught me what to *check*.\n\nSo I did the only thing I could: I asked. I put the question to people with more miles on the clock than me, and the answers reshaped how I think about verifying AI code. This is what they taught me — and the experiment it led to, which went wrong in the most instructive way possible.\n\nThe first thing that landed: an agent tests what it *assumed*, not what reality throws at it. The code runs fine on the happy path, the tests are green, and then someone feeds it an empty string, a malformed page, an unexpected null — and it falls apart. The agent skips the defensive thinking that only comes from actually debugging things in production.\n\nThat reframed the whole problem. Green tests don't mean the code is correct. They mean the code does what the agent thought it should do. 
Those are very different claims, and the gap between them is where I kept getting burned.\n\nThe second thing was harder to swallow. The more code I let the agent write, the less I understood what came out — and the worse I got at checking it. Verification and comprehension are the same muscle. If I don't grasp what the code does, reading every line is just theater.\n\nOne reviewer put the fix simply: before judging the lines, judge the *shape*. Have the agent produce a systems-level overview — what calls what, where the boundaries are — and verify that the structure makes sense before you ever look at implementation. I'd been generating these as diagrams and still getting lost; the real unlock was keeping the map as text, so I could hand it to a second model and ask whether it actually matched the code.\n\nWhich points at the deepest problem of all.\n\nHere's the thing nobody says out loud: when the same agent writes the code *and* writes the tests, it's not verifying anything. It's confirming its own assumptions. The tests pass because they test the same things the agent already believed. The blind spot in the code and the blind spot in the test are the same blind spot.\n\nThe reviewers' answer was consistent: the check has to come from outside the author. A fresh model, a separate pass, *something* that doesn't share the original's assumptions. Don't let the thing that wrote the code be the only thing that grades it.\n\nSo I built exactly that. A red-team step for my orchestrator — a command that takes the work the agent just finished and forces it into a different role: not the proud author, but an independent reviewer whose job is to break it.\n\nThe prompt is the whole trick, and it's blunt on purpose:\n\nSwitch into the role of an independent reviewer who did NOT write this code. Your job is not to confirm it works — it's to find what breaks it. 
Assume there's a bug.\n\nThen four concrete passes, straight from what the seniors drilled into me:\n\nAnd one rule that matters more than the rest: don't tell me \"looks good.\" If you genuinely find nothing, list *specifically* what you checked and how. A confident \"looks solid\" is the exact thing I'm trying to escape.\n\nHere's where it gets good, and a little absurd.\n\nThe agent built the red-team command. Then I pointed the command at the code that had just produced it. It immediately tore into its own work — listed what it had done wrong, what it had to fix. The agent admitted the problems and fixed them.\n\nWin, right? Self-correcting AI. Except I ran it a second time.\n\nThe second pass found more. Things the first \"fix everything\" round had missed — because the agent, even while critiquing itself, was still working from the same assumptions that put the bugs there in the first place. One adversarial pass didn't purge the blind spots. It just surfaced the ones the agent could already see.\n\nThat's the lesson, and it's sharper than anything I could have written on purpose: **giving an agent a tool to criticize itself doesn't remove its blind spots — it only reveals the ones it was already capable of seeing.** Verification isn't a step\n\nI almost made adversarial review a global rule — run it on everything, always. I'm glad I didn't.\n\nAll of this costs tokens. More scrutiny means more output, more passes, more model time — and that runs directly against the thing I care about, which is keeping the context I send lean. Verification has a price, and the price is real. If I red-teamed every trivial change, I'd be drowning in self-reflection over things that don't warrant it.\n\nSo the step is opt-in. You spend the scrutiny where the stakes justify it — input handling, parsing, anything touching the outside world — and you skip it where it's noise. 
That trade-off, honestly, is the part I'm least done thinking about. Verifying costs context, and context isn't free. I haven't solved that tension. I've just decided to pay it on purpose, where it counts, instead of everywhere by reflex.\n\nThe fix was never about writing better code, or even better prompts. It was about accepting that the agent and I have different jobs. It writes. I decide what's true. And deciding what's true means refusing to take the agent's word for it — especially when the agent is grading itself.\n\nI still don't have a senior leaning over my shoulder. But I've learned to build the shoulder out of separate passes, fresh models, and a stubborn refusal to trust a green checkmark. If you're self-taught and that quiet worry follows you around too — that's not a gap in your skill. It's the start of the right instinct.", "url": "https://wpnews.pro/news/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong", "canonical_source": "https://dev.to/stkremen/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong-4b94", "published_at": "2026-06-15 17:00:00+00:00", "updated_at": "2026-06-15 17:06:55.067616+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "developer-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong", "markdown": "https://wpnews.pro/news/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong.md", "text": "https://wpnews.pro/news/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong.txt", "jsonld": "https://wpnews.pro/news/the-agent-reviewed-its-own-code-and-passed-itself-it-was-wrong.jsonld"}}