Disclosure: I work on one of the tools in this post (create-microservices-app). But the experiment, commands, and outputs below are real, and the pattern at the end works no matter what stack you're on. That's the part I actually want you to take.
If you ship with Claude Code, Cursor, or Codex, you know the feeling. The agent gets you 70% of the way in minutes. It compiles. The diff looks reasonable. You merge it.
And then there's the quiet doubt: did it actually get the hard 30% right — auth boundaries, payments, tenant isolation, the booking logic that stops two people taking the same slot? Because AI doesn't usually write obviously bad code. It writes plausible code. And plausible-but-wrong is the expensive kind — it passes review and breaks in production on day three.
(The data backs the doubt: 84% of devs use AI tools, only 29% trust the output, and 45% of AI-generated code samples introduced a security vulnerability in Veracode's 2025 testing.)
So I ran an experiment: build a real app with an agent, then deliberately make the mistake an agent makes every day, and see what — if anything — catches it.
npm create microservices-app@latest booking-demo -- --template booking-sveltekit
A full Cloudflare SvelteKit booking app: public flow, admin, D1, auth. The detail that matters for this experiment is that it ships its own contract into the repo: README.agent.md, docs/api-boundary.md, and an executable spec, microservices.check.mjs. The layering rule is one line: routes are thin adapters; domain logic lives in verified modules, not in your handlers.
Baseline:
$ microservices check
Template checks: pass
The request an agent gets constantly: "simplify the bookings endpoint." So I did the eager-agent thing — inlined the write straight to the DB and dropped the module:
// src/routes/api/bookings/+server.ts (the "simplified" version)
import { json } from '@sveltejs/kit';
import type { RequestHandler } from './$types';

export const POST: RequestHandler = async ({ request, locals }) => {
  const body = await request.json();
  // Writes straight to the repository: no slot-conflict check runs here.
  await locals.bookingRepository.insert({
    serviceId: body.serviceId,
    startsAt: body.startsAt,
    customerId: body.customerId
  });
  return json({ ok: true });
};
It type-checks. It runs. It would pass review. And it silently drops the slot-conflict guard the verified createBooking use case enforced: a double-booking waiting to happen. Classic plausible-but-wrong.
Then I ran the check:
$ microservices check
Error: One or more generated app checks failed.
$ microservices check --json
FAIL: spec:src/routes/api/bookings/+server.ts
— Booking API route stays a thin adapter over createBooking and injected repositories.
It named the exact file and the exact contract I broke — not a vague lint warning, but "you bypassed the verified booking use case." Restore the delegation to the module, and:
$ microservices check
Template checks: pass
Green. The slot-conflict protection is back where it belongs.
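For readers who want to see the shape of that delegation, here is a minimal sketch, not the template's actual code. Every name in it apart from createBooking (Booking, BookingRepository, findBySlot, handlePost, inMemoryRepository) is a hypothetical stand-in, and the repository is an in-memory array so the example runs on its own.

```typescript
// Hypothetical shapes: the template's real types and module names may differ.
interface Booking {
  serviceId: string;
  startsAt: string; // ISO timestamp
  customerId: string;
}

interface BookingRepository {
  findBySlot(serviceId: string, startsAt: string): Promise<Booking | undefined>;
  insert(booking: Booking): Promise<void>;
}

type CreateBookingResult = { ok: true } | { ok: false; reason: 'slot_taken' };

// Domain logic: the slot-conflict guard lives here, not in the route.
async function createBooking(
  repo: BookingRepository,
  input: Booking
): Promise<CreateBookingResult> {
  const existing = await repo.findBySlot(input.serviceId, input.startsAt);
  if (existing) return { ok: false, reason: 'slot_taken' };
  await repo.insert(input);
  return { ok: true };
}

// Thin adapter: take the parsed body, delegate, map the result to a status.
async function handlePost(
  repo: BookingRepository,
  body: Booking
): Promise<{ status: number; body: CreateBookingResult }> {
  const result = await createBooking(repo, body);
  return { status: result.ok ? 201 : 409, body: result };
}

// In-memory repository, for illustration only.
function inMemoryRepository(): BookingRepository {
  const rows: Booking[] = [];
  return {
    async findBySlot(serviceId, startsAt) {
      return rows.find((r) => r.serviceId === serviceId && r.startsAt === startsAt);
    },
    async insert(booking) {
      rows.push(booking);
    }
  };
}
```

With this shape, a second request for the same service and start time comes back as a 409 instead of a silent second row. One caveat on the sketch: check-then-insert is not atomic, so a real implementation would likely also want a uniqueness constraint at the database level.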
Forget my tool for a second — the transferable idea is this:
The fix for plausible-but-wrong isn't a smarter model. It's a boundary your agent can't cross without a named, machine-readable failure.
Three moves you can apply on any stack:
You can roll this yourself with a test file and a grep. I happen to ship it as a contract + check for Cloudflare apps, but the move is the move.
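As a rough idea of what the roll-your-own version could look like, here is a small sketch of a grep-style boundary check. It is not how microservices.check.mjs works internally; the rule shape, the regexes, and the names CheckFailure, BoundaryRule, bookingRouteRule, and checkBoundary are all assumptions made up for this example.

```typescript
// A named, machine-readable failure: which rule, which file, and why.
interface CheckFailure {
  id: string;
  file: string;
  message: string;
}

interface BoundaryRule {
  id: string;
  file: string;
  message: string;
  mustMatch: RegExp; // the route has to delegate to the use case
  mustNotMatch: RegExp; // and must not write to the repository itself
}

// Hypothetical rule, modelled on the booking route from the experiment.
const bookingRouteRule: BoundaryRule = {
  id: 'spec:src/routes/api/bookings/+server.ts',
  file: 'src/routes/api/bookings/+server.ts',
  message: 'Booking API route stays a thin adapter over createBooking.',
  mustMatch: /\bcreateBooking\s*\(/,
  mustNotMatch: /\bbookingRepository\s*\.\s*(insert|update|delete)\s*\(/
};

// Pure function over the file's source text, so it is easy to unit-test.
function checkBoundary(rule: BoundaryRule, source: string): CheckFailure[] {
  const failures: CheckFailure[] = [];
  if (!rule.mustMatch.test(source)) {
    failures.push({
      id: rule.id,
      file: rule.file,
      message: `${rule.message} (missing delegation)`
    });
  }
  if (rule.mustNotMatch.test(source)) {
    failures.push({
      id: rule.id,
      file: rule.file,
      message: `${rule.message} (direct repository write)`
    });
  }
  return failures;
}
```

In a real repo you would read the route file from disk, print the failures as JSON, and exit non-zero when the list is non-empty, so the agent loop or CI gets a signal it can act on. A regex check is crude next to a proper spec, but it already turns "the diff looks reasonable" into a named failure.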
I ran the scaffold → contract → check → break → fix loop above for real. The parts that need your own machine (npm install, npm run dev, a deploy) are yours to run; I'm not going to claim outputs I didn't produce:
npm create microservices-app@latest booking-demo -- --template booking-sveltekit
cd booking-demo && npm install
npm run microservices -- check # the gate — wire it into your agent loop
npm run dev
(If you ship apps for clients on Cloudflare, the same gate is what lets you hand the result to a security review without the 2am call — but that's a different post.)
Repo + the rest of the modules: https://microservices.sh
Genuinely curious: how are you keeping your agent from quietly rewriting the dangerous 30%? Contract tests, review checklists, just vibes? What's caught a plausible-but-wrong change for you — and what slipped through?