AI doesn't write bad code. It writes plausible code — so I tried to break my own AI-built app

A developer at a company building create-microservices-app deliberately broke their own AI-built booking app to test whether automated contract checks catch plausible-but-wrong code. The experiment showed that a machine-readable boundary—not a smarter model—can prevent AI-generated code from silently dropping critical business logic like slot-conflict guards. The developer advocates for embedding executable contracts and check gates into the agent workflow to catch such errors before production.

Disclosure: I work on one of the tools in this post, create-microservices-app. But the experiment, commands, and outputs below are real, and the pattern at the end works no matter what stack you're on. That's the part I actually want you to take.

If you ship with Claude Code, Cursor, or Codex, you know the feeling. The agent gets you 70% of the way in minutes. It compiles. The diff looks reasonable. You merge it.

And then there's the quiet doubt: did it actually get the hard 30% right? Auth boundaries, payments, tenant isolation, the booking logic that stops two people taking the same slot?

Because AI doesn't usually write obviously bad code. It writes plausible code. And plausible-but-wrong is the expensive kind: it passes review and breaks in production on day three.

The data backs the doubt: 84% of devs use AI tools, only 29% trust the output, and 45% of AI-generated apps ship an exploitable vulnerability (Veracode, 2025).

So I ran an experiment: build a real app with an agent, then deliberately make the mistake an agent makes every day, and see what, if anything, catches it.

```bash
npm create microservices-app@latest booking-demo -- --template booking-sveltekit
```

A full Cloudflare SvelteKit booking app: public flow, admin, D1, auth. The detail that matters for this experiment: it ships its own contract into the repo, as `README.agent.md`, `docs/api-boundary.md`, and an executable spec, `microservices.check.mjs`. The layering rule is one line: routes are thin adapters; domain logic lives in verified modules, not in your handlers.

Baseline:

```bash
$ microservices check
Template checks: pass
```

The request an agent gets constantly: "simplify the bookings endpoint."
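
For context on what that request puts at risk: the template's `createBooking` module is not reproduced in this post, so the following is only a minimal sketch of what a slot-conflict guard inside a use case could look like. Every name in it (`Booking`, `BookingRepository`, `findBySlot`, `inMemoryRepository`, the result shape) is an assumption made for illustration, not the template's actual API.

```typescript
// Hypothetical sketch: these names and shapes are assumptions,
// not the code that create-microservices-app generates.
interface Booking {
  serviceId: string;
  startsAt: string;
  customerId: string;
}

interface BookingRepository {
  findBySlot(serviceId: string, startsAt: string): Promise<Booking | null>;
  insert(booking: Booking): Promise<void>;
}

type CreateBookingResult =
  | { ok: true }
  | { ok: false; reason: "slot_taken" };

// The use case owns the business rule: one booking per service per start time.
async function createBooking(
  repo: BookingRepository,
  input: Booking
): Promise<CreateBookingResult> {
  const existing = await repo.findBySlot(input.serviceId, input.startsAt);
  if (existing) {
    return { ok: false, reason: "slot_taken" };
  }
  await repo.insert(input);
  return { ok: true };
}

// In-memory repository, only for trying the sketch out locally.
function inMemoryRepository(): BookingRepository {
  const rows: Booking[] = [];
  return {
    async findBySlot(serviceId, startsAt) {
      const match = rows.find(
        (r) => r.serviceId === serviceId && r.startsAt === startsAt
      );
      return match ?? null;
    },
    async insert(booking) {
      rows.push(booking);
    },
  };
}
```

A route that calls a function like this keeps the guard; a route that calls `insert` directly does not, even though both compile. Note that check-then-insert is not atomic on its own, so a production version would presumably also lean on a database-level uniqueness constraint.
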
So I did the eager-agent thing: inlined the write straight to the DB and dropped the module:

```ts
// src/routes/api/bookings/+server.ts: the "simplified" version
import { json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';

export const POST: RequestHandler = async ({ request, locals }) => {
  const body = await request.json();
  // Writes straight to the repository, bypassing the createBooking use case
  await locals.bookingRepository.insert({
    serviceId: body.serviceId,
    startsAt: body.startsAt,
    customerId: body.customerId
  });
  return json({ ok: true });
};
```

It type-checks. It runs. It would pass review. And it silently drops the slot-conflict guard that the verified `createBooking` use case enforced: a double-booking waiting to happen. Classic plausible-but-wrong.

Then I ran the check:

```bash
$ microservices check
Error: One or more generated app checks failed.

$ microservices check --json
FAIL: spec:src/routes/api/bookings/+server.ts — Booking API route stays a thin adapter over createBooking and injected repositories.
```

It named the exact file and the exact contract I broke. Not a vague lint warning, but "you bypassed the verified booking use case."

Restore the delegation to the module, and:

```bash
$ microservices check
Template checks: pass
```

Green. The slot-conflict protection is back where it belongs.

Forget my tool for a second. The transferable idea is this:

The fix for plausible-but-wrong isn't a smarter model. It's a boundary your agent can't cross without a named, machine-readable failure.

Three moves you can apply on any stack:

You can roll this yourself with a test file and a grep. I happen to ship it as a contract + check for Cloudflare apps, but the move is the move.

I ran the scaffold → contract → check → break → fix loop above for real.
The parts that need your own machine (`npm install`, `npm run dev`, a deploy) are yours to run; I'm not going to claim outputs I didn't produce:

```bash
npm create microservices-app@latest booking-demo -- --template booking-sveltekit
cd booking-demo && npm install
npm run microservices -- check   # the gate: wire it into your agent loop
npm run dev
```

If you ship apps for clients on Cloudflare, the same gate is what lets you hand the result to a security review without the 2am call, but that's a different post.

Repo + the rest of the modules: https://microservices.sh

Genuinely curious: how are you keeping your agent from quietly rewriting the dangerous 30%? Contract tests, review checklists, just vibes? What's caught a plausible-but-wrong change for you, and what slipped through?
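
A postscript on the "test file and a grep" route mentioned above: here is one minimal way it could be sketched. This is not how `microservices check` works internally (the post does not show that); the rule shape, the `checkBoundary` function, and the regular expressions are all assumptions chosen to mirror the booking example.

```typescript
// Hypothetical roll-your-own boundary check. The rule format and patterns
// are illustrative assumptions, not the create-microservices-app spec format.
interface BoundaryRule {
  name: string;
  mustMatch: RegExp[];    // the route has to delegate to these
  mustNotMatch: RegExp[]; // the route must not do this itself
}

interface BoundaryFailure {
  file: string;
  rule: string;
  detail: string;
}

// Returns one named failure per violated pattern; an empty array means pass.
function checkBoundary(
  file: string,
  source: string,
  rule: BoundaryRule
): BoundaryFailure[] {
  const failures: BoundaryFailure[] = [];
  for (const pattern of rule.mustMatch) {
    if (!pattern.test(source)) {
      failures.push({ file, rule: rule.name, detail: `missing ${pattern}` });
    }
  }
  for (const pattern of rule.mustNotMatch) {
    if (pattern.test(source)) {
      failures.push({ file, rule: rule.name, detail: `forbidden ${pattern}` });
    }
  }
  return failures;
}

// Example rule for the booking route: delegate to createBooking,
// never call the repository's insert directly from the handler.
const bookingRouteRule: BoundaryRule = {
  name: "Booking API route stays a thin adapter over createBooking",
  mustMatch: [/\bcreateBooking\s*\(/],
  mustNotMatch: [/bookingRepository\s*\.\s*insert\s*\(/],
};
```

In practice you would read the route file from disk, pass its text to `checkBoundary`, print any failures as JSON, and exit non-zero so the agent loop or CI stops. Regex matching is crude and easy to fool with renames or indirection; an AST-based check would be sturdier, but even this version turns a silent bypass into a named failure.
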