Your teammate used Claude to generate a new API endpoint. The code looks great β clean formatting, proper error handling, even comments. You skim through it, see it follows conventions, CI is green. You approve.
Two weeks later, the endpoint silently drops a decimal place on currency conversions. A financial report is wrong for three days before anyone notices.
This scenario is playing out in hundreds of teams right now. Not because AI generates "bad code" β but because AI-generated code fails in ways human code doesn't, and your existing review process wasn't designed for it.
AI doesn't flag uncertainty. It presents everything with equal confidence. A human developer might write // not sure about the caching here
β that nervous comment tells you exactly where to look. AI never writes that comment. It writes // Transform the input to match the expected schema
with full confidence, even when the transformation is wrong.
After reviewing hundreds of AI-generated PRs over the past year, I found a pattern. The bugs aren't in formatting. They're in the places a quick glance won't reach:
Generic "review this code" prompts won't catch these. You need a system.
I built a review protocol specifically for AI-generated code. Four rounds, each targeting a different failure mode. Total time: ~15 minutes for a typical PR, up to 35 minutes for a large one.
| Round | Name | What You're Catching |
|---|---|---|
| 1 | Surface Scan | Logic errors, off-by-one, wrong assumptions |
| 2 | Security Deep-Dive | Injection, auth gaps, data leaks |
| 3 | Architecture Smell Check | Wrong patterns, tech debt, doesn't fit the system |
| 4 | Comparison Pass | Does this match what we actually asked for? |
The key insight: each round uses a separate AI prompt that forces a different lens on the same code. You're not asking the AI to "review this code" four times β you're asking four different, targeted questions.
Let me show you the two rounds that catch the most issues.
This is the high-probability round. Most AI bugs live here β logic errors, wrong assumptions, off-by-one bugs. The code looks correct. It's subtly wrong in exactly the ways a quick glance won't catch.
Here's the prompt I use:
Review this code for logic errors only. Do NOT suggest style improvements,
documentation, or refactoring. I want you to find:
1. Off-by-one errors, wrong comparisons, or inverted logic
2. Wrong default values or assumptions about data shape
3. Missing edge case handling (null, empty, zero, max values)
4. Race conditions or non-atomic operations on shared state
For each issue found, state:
- The exact line number
- Why it's wrong
- What the correct behavior should be
If you find zero issues, explain why each edge case IS handled,
not just say "looks good."
[Paste the PR description or requirements if available]
The critical instruction is that last line: "If you find zero issues, explain why each edge case IS handled." Without this, the AI will happily say "looks good" and move on. Forcing it to justify the all-clear catches things a simple yes/no never will.
The trap: AI-generated tests will pass. AI knows what the code does, so it writes tests that confirm the code's behavior β including its bugs. Perfect test coverage means nothing if the tests are testing the wrong thing.
This is the scary one. AI models are trained on massive amounts of public code, including code with security vulnerabilities. They don't understand security β they understand patterns. If the most common Stack Overflow solution uses eval()
or concatenates SQL strings, the AI will reproduce that pattern with full confidence.
The most common AI security failures: SQL injection, insecure deserialization (pickle, Marshal, YAML.load
), BOLA/IDOR (authenticated but accessing someone else's resource), mass assignment, and SSRF.
Here's the prompt that catches what your brain won't think to check:
You are a malicious actor with valid API credentials who want to exploit
this code. Walk through every possible thing you could try:
- Can you access data you shouldn't be able to?
- Can you escalate privileges?
- Can you cause the system to leak internal information?
- Can you trigger unexpected behavior with edge inputs?
- Can you cause the system to consume excessive resources?
Think step by step. List at least 5 distinct attack vectors. If you can't
find 5, you're not thinking creatively enough.
After listing individual vectors, describe at least 2 attack chains where
you combine multiple steps to achieve something none of the individual
vectors accomplish alone.
Pro tip: use a different AI model for this round than the one that generated the code. If Claude wrote the code, use GPT-4 to review it. Different training data means different blind spots. This single change catches vulnerabilities that using the same model consistently misses.
The trap: Auth checks that look right but aren't. AI will write if current_user.present?
β the user is authenticated, but the code doesn't check if they're authorized for that specific resource. The check looks secure but isn't.
1. Clear context between rounds. Don't run all 4 rounds in the same chat thread. Start a fresh conversation for each round. If you run Round 2 in the same context as Round 1, the AI already "knows" what it told you in Round 1 and will unconsciously align its analysis. Fresh context forces independent analysis. Costs 30 seconds, worth every one of them.
2. Run all reviews first, fix once. The naive approach is to fix issues one at a time β fix the logic bug, review, fix the security hole, review. This creates whack-a-mole: fixing the architecture can introduce a new security hole. Instead: run all 4 rounds, collect every issue, send the complete list to the AI in one shot, then re-run all rounds on the result.
Not every PR needs all 4 rounds:
| Scenario | Rounds to Run |
|---|---|
| Comment or docs change | None |
| Variable rename | None |
| Typo fix | 1 & 4 |
| New API endpoint | 1, 2 & 4 (skip 3 if it follows existing patterns) |
| New feature, new patterns | All 4 |
| Auth or payment change | |
| All 4 β extra time on round 2 | |
| AI-generated bugfix | |
| All 4 β the fix might work but introduce new bugs |
General principle: if AI generated the code, lean toward running more rounds. That's the whole point.
Round 4 β the Comparison Pass β is the most commonly skipped and the most dangerous to skip. It asks one question: does this code actually solve the problem we asked for?
AI is excellent at solving the problem you typed, not the problem you meant. It takes your words literally. The most common failure: AI solves the first 80% of a ticket perfectly and quietly ignores the last 20% because it "didn't seem important." The code is perfect β for the wrong thing.
If you have a ticket or issue, paste it in and make the AI verify each acceptance criterion against the code. You'll be surprised what's missing.
Here's how to adopt this without overwhelming yourself:
The best review process is the one that evolves. When you find a pattern this protocol doesn't catch, add your own round. When a prompt stops finding bugs, retire it.
I've been using versions of this protocol for over a year. It has saved me from shipping bugs that I would have approved on a first pass. Not every time, but often enough that running it is automatic now.
I wrote the full protocol β all 4 rounds, 12 copy-paste prompts, the "traps to watch for" in each round, a printable checklist, and the review loop workflow β into a guide. It's called The AI Code Review Protocol and it's on Gumroad for $19 (launch price of $12).
If you want the complete version with Rounds 3 and 4, the architecture smell checklist, the PII audit prompt, and the automation approaches β that's there. If this post was useful, the guide goes deeper.
If you've built your own review process for AI code, I'd genuinely like to hear what works for you. I'm @raithlin on X, or drop a comment below.