I stopped trusting Claude's code reviews, so I built a skill that puts my code on trial


Source: dev.to · Published: 2026-06-13 · Topic: ai-agents


A developer built Tribunal, an adversarial Claude skill that reviews code diffs by pitting biased agents against each other to produce more honest feedback. The system uses per-file 'haters' that attack the diff on technical merits and a cross-module bug hunter, with a judge that validates each accusation against actual code intent. The skill is open-source, portable across Claude Code and Claude Cowork, and supports multiple programming languages.


Every time I asked Claude to review my branch, I got one of two answers: a cheerful "Looks good! 👍" or a vague list where I couldn't tell a real bug from a matter of taste. The model wants to please you. That's exactly the problem.

So I built Tribunal — a Claude skill that reviews your diff adversarially, in stages, where the honest signal comes from agents fighting each other instead of one polite model.

A single model told to "be critical" still hedges — it's trained to be agreeable. So instead of one balanced reviewer, Tribunal runs one-sided roles that collide:

The hater: one agent per file, deliberately biased. It tears the diff apart as if a clueless amateur wrote it, looking only at what changed, but strictly on the merits: correctness, races, leaks, edge cases, security. No style nitpicks.

Per-file haters are blind to cross-module bugs. A separate agent hunts exactly those: a changed function signature whose caller still calls the old way, a return shape a consumer no longer matches, invariants out of sync across files.
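To make the first of those cases concrete, here is a minimal Python sketch of a signature drift that a per-file reviewer would miss. The function name, the caller arguments, and the `call_matches_signature` helper are all hypothetical, and the `inspect`-based check is only an illustration of the bug class, not how Tribunal's agent detects it.

```python
import inspect

# Hypothetical "module A" after the diff: `currency` became a required parameter.
def format_price(amount: float, currency: str) -> str:
    return f"{amount:.2f} {currency}"

# Hypothetical "module B", untouched by the diff: it still passes one argument.
OLD_CALL_ARGS = (19.99,)

def call_matches_signature(func, args, kwargs=None) -> bool:
    """Return True if (args, kwargs) can be bound to func's current signature."""
    try:
        inspect.signature(func).bind(*args, **(kwargs or {}))
        return True
    except TypeError:
        # bind() raises TypeError when arguments are missing or unexpected.
        return False

stale = not call_matches_signature(format_price, OLD_CALL_ARGS)
print("stale caller detected:", stale)
```

Each file looks fine in isolation here: the new signature is valid and the old call is syntactically valid. The bug only appears when the two are read together.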

The judge: for each accusation, the judge digs into the actual code and decides honestly: was this deliberate and justified, or genuinely weak? It's allowed to use docs and comments as evidence of intent, the opposite of the hater, who ignores them as excuses. The report keeps only the spots the judge couldn't defend, or conceded are weak even while defending the choice. Everything else drops to a full transcript.
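The keep-or-drop rule can be sketched as a small filter in Python. This is an assumption-laden illustration: the `Verdict` values, the `Accusation` and `Ruling` types, and the sample data are hypothetical names invented here, and the real skill does this with Claude sub-agents rather than Python objects.

```python
from dataclasses import dataclass
from enum import Enum

class Verdict(Enum):
    JUSTIFIED = "justified"          # judge defended the choice outright
    CONCEDED_WEAK = "conceded_weak"  # judge defended it but admitted weakness
    UNDEFENDED = "undefended"        # judge could not defend it

@dataclass(frozen=True)
class Accusation:
    file: str
    claim: str

@dataclass(frozen=True)
class Ruling:
    accusation: Accusation
    verdict: Verdict

def split_rulings(rulings):
    """Separate rulings that reach the report from transcript-only ones."""
    keep = {Verdict.UNDEFENDED, Verdict.CONCEDED_WEAK}
    report = [r for r in rulings if r.verdict in keep]
    transcript_only = [r for r in rulings if r.verdict not in keep]
    return report, transcript_only

# Hypothetical rulings for illustration only.
rulings = [
    Ruling(Accusation("cache.py", "unbounded dict grows forever"), Verdict.UNDEFENDED),
    Ruling(Accusation("api.py", "retry has no backoff"), Verdict.CONCEDED_WEAK),
    Ruling(Accusation("db.py", "no index on lookup"), Verdict.JUSTIFIED),
]
report, transcript_only = split_rulings(rulings)
print([r.accusation.file for r in report])           # ['cache.py', 'api.py']
print([r.accusation.file for r in transcript_only])  # ['db.py']
```

An empty list of rulings simply yields an empty report, so nothing in this shape forces a finding to exist.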

The balance doesn't live inside any single agent — it comes from the clash between them. A hater that can only attack, meeting a judge that only looks for justification, produces a sharper, more honest signal than one model trying to be "balanced" on its own.

And the hater is allowed to return nothing. On a clean diff it's not forced to invent problems — empty is a valid, honest result.

You get a ranked report written to docs/reviews/, plus a short chat summary: what to actually fix, by severity (critical → major → minor), with a concrete fix for each.
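As a rough sketch of that ordering, the ranking could look like the Python below. The dictionary keys and the sample findings are hypothetical; the actual report format is defined by the skill, not by this snippet.

```python
# Lower number = more urgent; mirrors critical → major → minor.
SEVERITY_ORDER = {"critical": 0, "major": 1, "minor": 2}

def rank_findings(findings):
    """Sort findings by severity; sorted() is stable, so ties keep input order."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f["severity"]])

# Hypothetical findings for illustration only.
findings = [
    {"severity": "minor", "issue": "unused retry counter", "fix": "remove it"},
    {"severity": "critical", "issue": "token logged in plaintext", "fix": "redact before logging"},
    {"severity": "major", "issue": "lock not released on error", "fix": "use a context manager"},
]

ranked = rank_findings(findings)
for f in ranked:
    print(f"[{f['severity']}] {f['issue']} -> {f['fix']}")
```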

It's portable: pure Claude sub-agents (the Agent tool), no external runtime, no dependencies. It works in Claude Code and Claude Cowork, in any language (Python, JS/TS, Go, Rust, Java… one config line to add yours).

It's MIT-licensed and free: https://github.com/hekman316/claude-skill-tribunal

Install is one paste: ask Claude to fetch the SKILL.md from the repo and drop it in ~/.claude/skills/. Then in any repo just say /tribunal.

I'm genuinely curious what people think of the adversarial-roles approach. Does forcing the model into one-sided roles actually beat just asking it to be harsh? Would love feedback — or attempts to break it.


Read original on dev.to → dev.to/hekman316/i-stopped-trusting-claudes-code…
