AI agents open a lot of pull requests now. Most are fine. Some quietly cheat to make the checks go green: they delete the failing test, weaken an assertion, wrap the broken call in an empty catch so the error disappears. The diff looks done. A reviewer skimming forty agent PRs a day will not catch that by eye.
swarm audit is a command-line tool that does. I maintain it. It runs three jobs on AI-written code, all offline, no API key.
Eleven checks read a pull-request diff and flag the shortcut patterns: a deleted test with no matching code change, a function renamed while its callers still use the old name, an error swallowed by an empty catch, a mock of a package that exists in no manifest, a type-checker suppression dropped over a changed line, and more.
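To make the first pattern concrete, here is a minimal sketch of a "deleted test with no matching code change" check. It is not the tool's detector: the names (findDeletedTests, TEST_FILE, TEST_DECL) are hypothetical, and it assumes a git-style unified diff with `diff --git` headers and Jest-style test declarations.

```typescript
// Hypothetical sketch, not swarm-orchestrator's implementation.
interface DeletedTest {
  file: string;
  line: string;
}

// Assumption: test files are recognisable by name, tests by it/test/describe calls.
const TEST_FILE = /(\.test\.|\.spec\.|__tests__\/)/;
const TEST_DECL = /^\s*(it|test|describe)\s*\(/;

function findDeletedTests(diff: string): DeletedTest[] {
  let currentFile = "";
  let inHunk = false;
  let sourceChanged = false;
  const removed: DeletedTest[] = [];
  const readded = new Set<string>();

  for (const raw of diff.split("\n")) {
    if (raw.startsWith("diff --git ")) {
      inHunk = false; // a new file section starts
      continue;
    }
    if (!inHunk && (raw.startsWith("--- ") || raw.startsWith("+++ "))) {
      const path = raw.slice(4).replace(/^[ab]\//, "");
      if (path !== "/dev/null") currentFile = path;
      continue;
    }
    if (raw.startsWith("@@")) {
      inHunk = true;
      continue;
    }
    if (!inHunk) continue;

    const sign = raw.charAt(0);
    if (sign !== "+" && sign !== "-") continue; // context line
    const text = raw.slice(1).trim();

    if (!TEST_FILE.test(currentFile)) {
      sourceChanged = true; // any edit outside a test file counts as a code change
      continue;
    }
    if (!TEST_DECL.test(text)) continue;
    if (sign === "-") removed.push({ file: currentFile, line: text });
    else readded.add(`${currentFile}::${text}`);
  }

  // A test removed alongside a source change may be legitimate; stay quiet.
  if (sourceChanged) return [];
  // Ignore declarations that were only moved or re-added unchanged.
  return removed.filter((r) => !readded.has(`${r.file}::${r.line}`));
}
```

The sketch treats any source edit as a "matching" change, which is far cruder than a real check would need to be. It only shows the shape of the idea: the signal is what the diff is missing, not what it contains.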
The detection is measured, not asserted. Hide one known cheat in each of 300 real merged PRs, run the auditor, count the catches: 254, about 85%, reproducible with one command.
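The measurement loop is simple enough to sketch. The version below is an illustration, not the repository's benchmark code: SeededCase, Auditor, and measureCatchRate are hypothetical names, and it assumes a case counts as caught only when the auditor reports the same kind of cheat that was planted.

```typescript
// Hypothetical sketch of a seeded-cheat benchmark, not the repo's harness.
interface SeededCase {
  seededCheat: string; // kind of cheat planted, e.g. "empty-catch"
  diff: string;        // the real PR diff with that one cheat injected
}

// Assumption: the auditor returns the kinds of findings it reported for a diff.
type Auditor = (diff: string) => string[];

function measureCatchRate(cases: SeededCase[], audit: Auditor) {
  const caught = cases.filter((c) => audit(c.diff).includes(c.seededCheat)).length;
  const total = cases.length;
  return { caught, total, rate: total === 0 ? 0 : caught / total };
}

// The reported figures: 254 catches out of 300 seeded PRs.
console.log(`${((254 / 300) * 100).toFixed(1)}%`); // 84.7%
```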
The catch that matters most is on real code. On two merged Cloudflare PRs it flagged a rename that left two callers pointing at a dead function, and an empty catch {} that throws every error away. Semgrep (210 rules) and ESLint's security rules flagged neither, because they hunt for dangerous code like an injection or a leaked secret, and a deleted test is not dangerous code; it is missing code. The auditor also names the agent that wrote the PR: on a live fetch it tagged the author as Devin. Findings ship advisory, so it reports and never blocks your merge unless you ask it to.
The second mode runs before a change is accepted. You hand it a plain goal. It compiles that goal into a contract of machine-checkable obligations: build passes, tests pass, coverage holds, a named function has the right signature, a property holds, performance does not regress. Candidate patches get generated, and one is admitted only if it satisfies every obligation. Adversarial falsifiers actively try to break a patch before it counts.
In a fresh project it compiled a goal into two obligations, verified both, confirmed nothing regressed after the merge, and spent zero tokens doing it. Turn on gate mode and it becomes a hard merge rule: a diff that strips a test exits non-zero and never lands.
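The admission rule described above, where a patch is admitted only if it satisfies every obligation, can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's contract format: Obligation, Patch, evaluate, admitFirst, and gateExitCode are hypothetical names, and the patch carries precomputed results where a real run would execute the build and test suite.

```typescript
// Hypothetical sketch of contract-based admission, not swarm-orchestrator's API.
interface Patch {
  id: string;
  buildPasses: boolean; // assumption: results are precomputed for the sketch
  testsPass: boolean;
  coverage: number;     // fraction between 0 and 1
}

interface Obligation {
  name: string;
  check: (patch: Patch) => boolean;
}

interface Verdict {
  patchId: string;
  admitted: boolean;
  failed: string[]; // names of the obligations the patch did not meet
}

function evaluate(patch: Patch, contract: Obligation[]): Verdict {
  const failed = contract.filter((o) => !o.check(patch)).map((o) => o.name);
  return { patchId: patch.id, admitted: failed.length === 0, failed };
}

// Return the first candidate that satisfies every obligation, if any does.
function admitFirst(candidates: Patch[], contract: Obligation[]): Patch | undefined {
  return candidates.find((p) => evaluate(p, contract).admitted);
}

// Gate mode as an exit code: zero only when the patch is admitted.
function gateExitCode(verdict: Verdict): number {
  return verdict.admitted ? 0 : 1;
}

const baselineCoverage = 0.8; // hypothetical baseline
const contract: Obligation[] = [
  { name: "build passes", check: (p) => p.buildPasses },
  { name: "tests pass", check: (p) => p.testsPass },
  { name: "coverage holds", check: (p) => p.coverage >= baselineCoverage },
];
```

Keeping each obligation a named predicate means a rejected patch comes back with the list of obligations it failed, which is what makes the verdict reviewable.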
If you ship or buy AI-written code under the EU AI Act or CISA's SBOM-for-AI guidance, someone will ask for a record of the AI involvement. The tool emits one: a CycloneDX 1.6 ML bill-of-materials and an SPDX 3.0 AI-Profile, both valid against their specs, plus a hash-chained evidence ledger where altering any entry breaks the chain. It ships with the mappings to EU AI Act Annex IV and the CISA minimum elements.
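For readers who have not met a hash-chained ledger, the sketch below shows the general technique: each entry's hash covers its payload and the previous entry's hash, so changing any entry invalidates it or the link that follows. The entry layout and the names (LedgerEntry, appendEntry, firstBrokenIndex) are assumptions for illustration; the tool's actual ledger format may differ.

```typescript
// Hypothetical sketch of a hash-chained ledger, not the tool's on-disk format.
import { createHash } from "node:crypto";

interface LedgerEntry {
  index: number;
  payload: string;  // e.g. a serialized audit finding
  prevHash: string; // hash of the previous entry, or GENESIS for the first
  hash: string;
}

const GENESIS = "0".repeat(64);

function entryHash(index: number, payload: string, prevHash: string): string {
  return createHash("sha256").update(`${index}\n${prevHash}\n${payload}`).digest("hex");
}

// Returns a new ledger with the payload appended; the input is not mutated.
function appendEntry(ledger: LedgerEntry[], payload: string): LedgerEntry[] {
  const last = ledger[ledger.length - 1];
  const prevHash = last ? last.hash : GENESIS;
  const index = ledger.length;
  return [...ledger, { index, payload, prevHash, hash: entryHash(index, payload, prevHash) }];
}

// Returns the index of the first entry that fails verification, or -1 if intact.
function firstBrokenIndex(ledger: LedgerEntry[]): number {
  let expectedPrev = GENESIS;
  return ledger.findIndex((e, i) => {
    const ok =
      e.index === i &&
      e.prevHash === expectedPrev &&
      e.hash === entryHash(i, e.payload, e.prevHash);
    expectedPrev = e.hash;
    return !ok;
  });
}
```

In this sketch, an edit to the final entry with a recomputed hash would go unnoticed, because nothing follows it in the chain. Catching that case requires recording the head hash somewhere outside the ledger.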
git clone https://github.com/moonrunnerkc/swarm-orchestrator
cd swarm-orchestrator && npm install && npm run build
npm run benchmarks:oracle # the ~85% number
node dist/src/cli.js audit --diff-file benchmarks/real-prs/diffs/cloudflare-workers-sdk/14132.diff --detectors all
swarm init && swarm run --goal "verify this project builds and tests pass"
Open Source Repo: https://github.com/moonrunnerkc/swarm-orchestrator