AI wrote the PR. How do you know it actually works?



A developer has released Swarm Audit, an open-source command-line tool that detects when AI-generated pull requests cheat by deleting tests, weakening assertions, or swallowing errors in empty catch blocks. The tool runs 11 checks on PR diffs and catches approximately 85% of known cheat patterns, as measured against 300 real merged PRs. It also identifies the AI agent that wrote the code, generates compliance records for the EU AI Act and CISA guidance, and can enforce hard merge rules that block any diff stripping a test.

Published Jun 3, 2026 · 3 min read

AI agents open a lot of pull requests now. Most are fine. Some quietly cheat to make the checks go green: they delete the failing test, weaken an assertion, wrap the broken call in an empty catch so the error disappears. The diff looks done. A reviewer skimming forty agent PRs a day will not catch that by eye.

swarm audit is a command-line tool that does. I maintain it. It runs three jobs on AI-written code, all offline, no API key.

Eleven checks read a pull-request diff and flag the shortcut patterns: a deleted test with no matching code change, a function renamed while its callers still use the old name, an error swallowed by an empty catch, a mock of a package that exists in no manifest, a type-checker suppression dropped over a changed line, and more.
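To make the first pattern concrete, here is a minimal sketch of a "deleted test with no matching code change" check. It is not the tool's implementation: the FileChange shape, the foo.test.ts to foo.ts naming convention, and the function names are all assumptions made for illustration.

```typescript
// Hypothetical sketch: shapes, names and heuristics are illustrative only.
interface FileChange {
  path: string;
  status: "added" | "modified" | "deleted";
  removedLines: string[]; // lines the diff removes from this file
}

const TEST_FILE = /\.(test|spec)(\.[jt]sx?)$/;
const TEST_CASE = /^\s*(it|test|describe)\s*\(/;

// Assumed convention: src/parser.test.ts exercises src/parser.ts.
function subjectOf(testPath: string): string {
  return testPath.replace(TEST_FILE, "$2");
}

// Flags test files that lose test cases (or vanish entirely) while the
// module they exercise is untouched by the same diff.
function findUnmatchedTestDeletions(changes: FileChange[]): string[] {
  const changedPaths = new Set(changes.map((c) => c.path));
  return changes
    .filter((c) => TEST_FILE.test(c.path))
    .filter(
      (c) =>
        c.status === "deleted" ||
        c.removedLines.some((line) => TEST_CASE.test(line)),
    )
    .filter((c) => !changedPaths.has(subjectOf(c.path)))
    .map((c) => c.path);
}
```

A diff that removes an it(...) block from src/parser.test.ts while leaving src/parser.ts alone is flagged; the same removal alongside a change to src/parser.ts is not. A real check would need a better notion of "matching" than a file-name convention.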

The detection is measured, not asserted. Hide one known cheat in each of 300 real merged PRs, run the auditor, count the catches: 254, about 85%, reproducible with one command.
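The arithmetic behind that number is simple enough to sketch. The SeededResult shape and function names below are assumptions for illustration, not the benchmark's actual code; only the 254 and 300 come from the measurement above.

```typescript
// Hypothetical sketch of scoring a seeded-cheat benchmark.
interface SeededResult {
  pr: string;              // identifier of the merged PR
  seededCheat: string;     // the one cheat hidden in it
  flaggedCheats: string[]; // what the auditor reported
}

// A case counts as caught when the auditor reports the seeded cheat.
function countCatches(results: SeededResult[]): number {
  return results.filter((r) => r.flaggedCheats.includes(r.seededCheat)).length;
}

function detectionRate(caught: number, total: number): number {
  if (total <= 0) throw new RangeError("total must be positive");
  return caught / total;
}

const rate = detectionRate(254, 300);
console.log(`${(rate * 100).toFixed(1)}%`); // prints "84.7%"
```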

The catch that matters most is on real code. On two merged Cloudflare PRs it flagged a rename that left two callers pointing at a dead function, and an empty catch {} that throws every error away. Semgrep (210 rules) and ESLint's security rules flagged neither, because they hunt for dangerous code like an injection or a leaked secret, and a deleted test is not dangerous code, it is missing code. The auditor also names the agent that wrote the PR: on a live fetch it tagged the author as Devin. Findings ship advisory, so it reports and never blocks your merge unless you ask it to.
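An empty-catch check over a diff can be approximated in a few lines. This is a rough sketch under stated assumptions, not the auditor's detector: it only looks at added lines of a unified diff, only matches a catch whose braces open and close on the same line, and the Finding shape and function name are made up for the example.

```typescript
// Hypothetical sketch: a single-line empty-catch scan over added diff lines.
interface Finding {
  file: string;
  line: number; // line number in the new version of the file
  text: string;
}

// Matches `catch {}` and `catch (e) {}`, allowing whitespace inside the braces.
const EMPTY_CATCH = /\bcatch\s*(\([^)]*\))?\s*\{\s*\}/;

function findEmptyCatches(unifiedDiff: string): Finding[] {
  const findings: Finding[] = [];
  let file = "";
  let newLine = 0;
  for (const raw of unifiedDiff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      file = raw.slice(4).replace(/^b\//, ""); // new-file header
    } else if (raw.startsWith("@@")) {
      const m = /^@@ -\d+(?:,\d+)? \+(\d+)/.exec(raw);
      if (m) newLine = Number(m[1]); // hunk start in the new file
    } else if (raw.startsWith("+")) {
      const text = raw.slice(1);
      if (EMPTY_CATCH.test(text)) {
        findings.push({ file, line: newLine, text: text.trim() });
      }
      newLine++;
    } else if (raw.startsWith(" ")) {
      newLine++; // context lines advance the new-file counter
    }
    // removed ("-") lines do not exist in the new file, so they are skipped
  }
  return findings;
}
```

A production check would parse the code rather than pattern-match it, so that a catch block spread over several lines, or one containing only a comment, is also caught.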

The second mode runs before a change is accepted. You hand it a plain goal. It compiles that goal into a contract of machine-checkable obligations: build passes, tests pass, coverage holds, a named function has the right signature, a property holds, performance does not regress. Candidate patches get generated, and one is admitted only if it satisfies every obligation. Adversarial falsifiers actively try to break a patch before it counts.
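The admission rule can be sketched as data plus one predicate. Everything here (the Patch and Obligation shapes, the admit function, the 0.8 coverage floor in the usage note) is a hypothetical illustration of "admit only if every obligation holds", not the tool's contract format; the adversarial falsifiers are not modeled.

```typescript
// Hypothetical sketch: a contract is a list of machine-checkable obligations.
interface Patch {
  id: string;
  buildPasses: boolean;
  testsPass: boolean;
  coverage: number; // fraction between 0 and 1
}

interface Obligation {
  id: string;
  check: (patch: Patch) => boolean;
}

// Lists the obligations a patch fails; an empty list means it is admissible.
function failedObligations(contract: Obligation[], patch: Patch): string[] {
  return contract.filter((o) => !o.check(patch)).map((o) => o.id);
}

// Admits the first candidate that satisfies every obligation, if any.
function admit(contract: Obligation[], candidates: Patch[]): Patch | undefined {
  return candidates.find((p) => failedObligations(contract, p).length === 0);
}
```

With a contract of "build passes", "tests pass" and "coverage is at least 0.8", a candidate that builds but fails its tests is rejected, and the next candidate that meets all three is the one admitted.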

In a fresh project it compiled a goal into two obligations, verified both, confirmed nothing regressed after the merge, and spent zero tokens doing it. Turn on gate mode and it becomes a hard merge rule: a diff that strips a test exits non-zero and never lands.
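The difference between advisory and gate mode comes down to the process exit code, which is what a CI system acts on. A minimal sketch, with a made-up Mode type and function name rather than the CLI's real internals:

```typescript
// Hypothetical sketch: how findings might map to an exit code.
type Mode = "advisory" | "gate";

// Advisory mode always exits 0; gate mode exits non-zero on any finding.
function exitCodeFor(findingCount: number, mode: Mode): number {
  return mode === "gate" && findingCount > 0 ? 1 : 0;
}
```

In a Node CLI the result would typically be assigned to process.exitCode, and a CI job that requires the step to succeed then refuses the merge.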

If you ship or buy AI-written code under the EU AI Act or CISA's SBOM-for-AI guidance, someone will ask for a record of the AI involvement. The tool emits one: a CycloneDX 1.6 ML bill-of-materials and an SPDX 3.0 AI-Profile, both valid against their specs, plus a hash-chained evidence ledger where altering any entry breaks the chain. It ships with the mappings to EU AI Act Annex IV and the CISA minimum elements.
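The tamper-evidence property of a hash-chained ledger is easy to demonstrate. The sketch below assumes SHA-256 and a made-up entry layout; the tool's actual ledger format and hash choice are not stated here, so treat every name as illustrative.

```typescript
import { createHash } from "node:crypto";

// Hypothetical entry layout: each entry commits to the hash of the one before it.
interface LedgerEntry {
  index: number;
  payload: string;
  prevHash: string;
  hash: string;
}

const GENESIS = "0".repeat(64); // stand-in "previous hash" for the first entry

function hashEntry(index: number, payload: string, prevHash: string): string {
  return createHash("sha256")
    .update(`${index}\n${prevHash}\n${payload}`)
    .digest("hex");
}

// Returns a new ledger with the payload appended and linked to the last entry.
function append(ledger: LedgerEntry[], payload: string): LedgerEntry[] {
  const last = ledger[ledger.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const index = ledger.length;
  const hash = hashEntry(index, payload, prevHash);
  return [...ledger, { index, payload, prevHash, hash }];
}

// Recomputes every hash from the start; any edited entry breaks the chain.
function verify(ledger: LedgerEntry[]): boolean {
  let prev = GENESIS;
  return ledger.every((entry, i) => {
    const ok =
      entry.index === i &&
      entry.prevHash === prev &&
      entry.hash === hashEntry(i, entry.payload, prev);
    prev = entry.hash;
    return ok;
  });
}
```

Changing the payload of any entry makes its recomputed hash differ from the stored one, and rewriting that stored hash would in turn invalidate the prevHash of the next entry, so an edit cannot be hidden without rewriting everything after it.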

git clone https://github.com/moonrunnerkc/swarm-orchestrator
cd swarm-orchestrator && npm install && npm run build

npm run benchmarks:oracle   # the ~85% number
node dist/src/cli.js audit --diff-file benchmarks/real-prs/diffs/cloudflare-workers-sdk/14132.diff --detectors all
swarm init && swarm run --goal "verify this project builds and tests pass"

Open Source Repo: https://github.com/moonrunnerkc/swarm-orchestrator


Read original on dev.to → dev.to/moonrunnerkc/ai-wrote-the-pr-how-do-you-k…
