{"slug": "no-agent-grades-its-own-homework", "title": "No Agent Grades Its Own Homework", "summary": "A developer demonstrates that AI code review quality degrades when the reviewer is the same model that wrote the code, due to a documented self-preference bias. The solution involves using a separate reviewer from a different model family in a clean context, requiring receipts for findings, and employing a panel of skeptics to refute critical issues. This architecture prevents self-validation and ensures objective code review.", "body_md": "You ask Claude to review your code. It says \"looks good, clean, well-factored\". Of course it does. It wrote that code five minutes ago. You just asked the author to grade his own paper, and he gave himself an A.\n\nHaving an AI review code works. But not by asking the one who just wrote it. Quality doesn't come from a smarter model; it comes from an architecture where no role checks itself.\n\nThis isn't a hunch; it's measured. A model evaluating its own output rates it higher than other authors' output of equal quality. This is *self-preference bias*, documented by Panickssery and co-authors in 2024, whose controlled experiments point to a causal link rather than a mere correlation: the model recognizes its own style and prefers it.\n\nIn practice, that means the naive loop \"write, then review what you just wrote\" is broken by construction. You don't get a review; you get a justification. The agent already decided its code was good the moment it produced it; asking again only confirms that decision.\n\nSo the first rule: the reviewer is never the author. In my config, the review agents run in a **clean context**. They don't see the implementation prompt or know what constraints the author set; they meet the diff like a colleague on Monday morning. And when the author is a known model, the reviewer comes from a **different model family**, to break style recognition.\n\nOne detail matters as much as the rest: the author's name never enters the reviewer's prompt. 
No \"this was written by a senior\", no \"review this model's work\". The author's identity is exactly the information that triggers the bias, so we take it off the table.\n\nThe second trap is the opposite of the first. An AI reviewer, especially in a clean context, tends to over-flag: it invents problems to look useful and flags \"vulnerabilities\" that aren't real. A review that cries wolf on every line is no better than a complacent one: either way, you stop listening.\n\nHence the receipt rule. Every finding must cite a `file:line` *and* pass a check before it's surfaced: a grep proving the occurrence, a sandbox run, a failing test, a data-flow trace. A finding nobody can prove is dropped silently, with no debate.\n\n```\nFinding: \"non-parameterized SQL call, injection risk\"\n  → receipt required: grep the user-input → query flow\n  → if the value is a code constant: dropped\n  → if it comes from the HTTP request: kept, with the line\n```\n\nProof comes before judgment. The reviewer isn't allowed to bother you over a hunch.\n\nFor critical findings, the ones that would block a merge, I add a last layer: a panel of independent skeptics whose instruction isn't to confirm but to **refute**. Each one gets the finding and tries to tear it down: \"here's why this isn't a bug\". Because keeping the finding takes a majority of the panel, plausible false alarms don't survive. The ones that remain took a demolition attempt and held.\n\nIt's the exact opposite of the naive loop. Instead of one model trying to be right, you get several trying to contradict each other. The truth that comes out has been attacked, not self-proclaimed.\n\nPut end to end, this gives a team of agents whose roles never overlap. The one who writes the code isn't the one who writes the tests, which are written from the spec only, not from the code. The one who reviews didn't write. 
And before a human or an LLM even gives an opinion, an objective gate (build, lint, tests) has to be green: the model's judgment comes only after the machine, never instead of it.\n\nThis isn't gratuitous distrust of AI. It's the same principle that governs a newsroom, an accounting team, a court: you let no one sign off on their own work, because no one is a good judge of themselves. LLMs, with a documented self-preference bias, are even poorer judges of their own work than the rest of us.\n\nThe temptation, with a model that codes well, is to hand it the whole cycle: write, test, review, sign off. That's exactly what you mustn't do, because each of those steps corrects the previous one, and a corrector that corrects itself corrects nothing. The quality of an AI review isn't measured by the model's intelligence. It's measured by how many times you stop it from grading itself.", "url": "https://wpnews.pro/news/no-agent-grades-its-own-homework", "canonical_source": "https://dev.to/ohugonnot/no-agent-grades-its-own-homework-8lb", "published_at": "2026-06-28 12:38:30+00:00", "updated_at": "2026-06-28 13:04:27.716127+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "ai-research", "developer-tools"], "entities": ["Claude", "Panickssery"], "alternates": {"html": "https://wpnews.pro/news/no-agent-grades-its-own-homework", "markdown": "https://wpnews.pro/news/no-agent-grades-its-own-homework.md", "text": "https://wpnews.pro/news/no-agent-grades-its-own-homework.txt", "jsonld": "https://wpnews.pro/news/no-agent-grades-its-own-homework.jsonld"}}