# I Let AI Write My Backend Code for a Week — Here's What Actually Broke

> Source: <https://dev.to/kollittle/i-let-ai-write-my-backend-code-for-a-week-heres-what-actually-broke-1d36>
> Published: 2026-06-14 14:02:24+00:00

I told myself it would be fine. I had been using AI coding assistants for suggestions and autocomplete for months — and it worked great. So when a new project came up with a tight deadline, I thought: why not let AI handle the whole backend?

I set up a Cursor workspace, wrote a detailed spec, and hit generate. What followed was 5 days of "it compiles, but..." debugging that taught me more about software engineering than any tutorial ever did.

The boilerplate was genuinely impressive. In about 2 hours, I had:

The code looked clean. Tests passed. I was feeling like a 10x developer.

The AI generated this validation:

``` js
const userSchema = z.object({
  age: z.number(),
});
```

Looks fine, right? Except the API received ages as strings from the frontend. Zod parsed them fine in development (coercion worked). But in production with stricter mode? `NaN` everywhere. Users were getting 400 errors on signup.

**Fix:** `z.coerce.number().int().positive()`, but I had to find all 23 instances manually.
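To make the coercion behavior concrete, here is a minimal dependency-free sketch. It assumes the frontend sends `age` as a string such as `"25"`, and that coercion boils down to `Number(input)` followed by the integer and positive checks. `parseAge` is a hypothetical helper written for illustration, not a Zod API.

```javascript
// Hypothetical helper that mirrors the intent of
// z.coerce.number().int().positive() in plain JavaScript.
function parseAge(input) {
  // Number() is the coercion step: "25" -> 25, "abc" -> NaN, "" -> 0
  const value = Number(input);

  // Number.isInteger(NaN) is false, so NaN is rejected here as well.
  if (!Number.isInteger(value) || value <= 0) {
    return { success: false, error: `invalid age: ${JSON.stringify(input)}` };
  }
  return { success: true, data: value };
}

console.log(parseAge("25"));   // accepted, data is the number 25
console.log(parseAge("abc"));  // rejected, Number("abc") is NaN
console.log(parseAge(""));     // rejected, Number("") is 0
console.log(parseAge("25.5")); // rejected, not an integer
```

The empty-string case is the one worth remembering: plain coercion turns `""` into `0`, so without the `.positive()` check an empty form field would pass as a valid number.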

For a dashboard endpoint that listed users with their orders and order items, the AI generated:

``` js
const users = await prisma.user.findMany();
for (const user of users) {
  user.orders = await prisma.order.findMany({ where: { userId: user.id } });
}
```

Classic N+1. The Prisma docs literally have a page titled "How to avoid N+1 queries." With 500 users, this endpoint made 501 database queries and took 8 seconds.

**Fix:** `include` with nested relations: one `findMany` call instead of 501 queries, 120ms.
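The query-count difference is easy to reproduce without a database. The sketch below uses a hypothetical in-memory client (`createFakeDb` and its methods are made up for this example and are not Prisma APIs) that only counts round trips, then compares the per-user loop with a batched load.

```javascript
// Hypothetical in-memory stand-in for a database client. It exists only to
// count round trips; none of these method names are Prisma APIs.
function createFakeDb(userCount) {
  const users = Array.from({ length: userCount }, (_, i) => ({ id: i + 1 }));
  const orders = users.map((u) => ({ id: 1000 + u.id, userId: u.id }));
  let queries = 0;
  return {
    queryCount: () => queries,
    async findUsers() {
      queries += 1;
      return users.map((u) => ({ ...u }));
    },
    async findOrdersByUserId(userId) {
      queries += 1;
      return orders.filter((o) => o.userId === userId);
    },
    async findOrdersByUserIds(userIds) {
      queries += 1;
      const wanted = new Set(userIds);
      return orders.filter((o) => wanted.has(o.userId));
    },
  };
}

// N+1: one query for the users, then one more per user.
async function loadUsersNPlusOne(db) {
  const users = await db.findUsers();
  for (const user of users) {
    user.orders = await db.findOrdersByUserId(user.id);
  }
  return users;
}

// Batched: one query for the users, one for all of their orders,
// then group the orders in memory.
async function loadUsersBatched(db) {
  const users = await db.findUsers();
  const orders = await db.findOrdersByUserIds(users.map((u) => u.id));
  const byUser = new Map();
  for (const order of orders) {
    if (!byUser.has(order.userId)) byUser.set(order.userId, []);
    byUser.get(order.userId).push(order);
  }
  for (const user of users) {
    user.orders = byUser.get(user.id) ?? [];
  }
  return users;
}

async function main() {
  const slow = createFakeDb(500);
  await loadUsersNPlusOne(slow);
  console.log("N+1 queries:", slow.queryCount()); // 501

  const fast = createFakeDb(500);
  await loadUsersBatched(fast);
  console.log("batched queries:", fast.queryCount()); // 2
}

main();
```

With Prisma, the equivalent call would look something like `prisma.user.findMany({ include: { orders: { include: { items: true } } } })`, assuming the relation fields are named `orders` and `items`. Depending on the Prisma version and configuration, that may still translate into a small fixed number of SQL statements rather than exactly one, but the count no longer grows with the number of users.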

The AI wrote a token refresh flow that looked perfect in isolation. But under load, concurrent refresh requests would invalidate each other's tokens. The AI's solution? "Add a retry mechanism." My solution? "Use a refresh token rotation pattern that handles concurrency properly."
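One way to make rotation tolerate concurrent refreshes is a short grace window: the first request rotates the token, and any request that presents the same old token within the window gets the same successor instead of invalidating it. The sketch below is an assumption about what such a pattern can look like, not the code from this project. `createRefreshTokenStore`, `graceMs`, and the in-memory `Map` are all hypothetical, and a real deployment would need a shared store with an atomic update plus a cryptographically secure token generator.

```javascript
// Sketch of refresh token rotation with a grace window for concurrent requests.
// All names here are hypothetical; state lives in memory for illustration only.
function createRefreshTokenStore({ graceMs = 10000, now = () => Date.now(), newToken } = {}) {
  let counter = 0;
  // Placeholder generator so the sketch runs without imports. NOT secure.
  const mint = newToken ?? (() => `rt_${++counter}`);
  const records = new Map(); // token -> { userId, replacedBy, rotatedAt }

  function issue(userId) {
    const token = mint();
    records.set(token, { userId, replacedBy: null, rotatedAt: null });
    return token;
  }

  function rotate(presented) {
    const record = records.get(presented);
    if (!record) return { ok: false, reason: "unknown_token" };

    if (record.replacedBy === null) {
      // First use: mint the successor and remember when rotation happened.
      record.replacedBy = issue(record.userId);
      record.rotatedAt = now();
      return { ok: true, refreshToken: record.replacedBy };
    }

    if (now() - record.rotatedAt <= graceMs) {
      // A concurrent request lost the race: hand back the same successor
      // rather than invalidating the token the winner just received.
      return { ok: true, refreshToken: record.replacedBy };
    }

    // Reuse long after rotation looks like a replayed token. A fuller
    // version might also revoke the successors; this sketch only rejects.
    return { ok: false, reason: "reuse_detected" };
  }

  return { issue, rotate };
}

// Usage with a fake clock so the timing is deterministic.
let clock = 0;
const store = createRefreshTokenStore({ graceMs: 5000, now: () => clock });
const first = store.issue("user-1");
const a = store.rotate(first); // first request wins and mints a successor
const b = store.rotate(first); // concurrent request receives the same successor
console.log(a.refreshToken === b.refreshToken); // true

clock += 60000; // well past the grace window
console.log(store.rotate(first).reason); // reuse_detected
```

The trade-off is that the old token stays usable for `graceMs`, so the window should be just long enough to cover in-flight requests.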

``` js
try {
  // ...handler logic that may throw...
} catch (error) {
  console.log("Error:", error);
  res.status(500).json({ error: "Something went wrong" });
}
```

`console.log` doesn't serialize Error objects properly. Every production error was just `{}` in the logs. We ran like this for 3 days before anyone noticed.

**Fix:** `console.error` with proper error serialization and a proper logging library (we went with Pino).
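The empty `{}` is what you get whenever an `Error` passes through `JSON.stringify`, because `message` and `stack` are non-enumerable properties. Here is a minimal sketch of explicit serialization, assuming one JSON line per log event. `serializeError` and `logError` are hypothetical helpers for illustration; a library such as Pino handles this for you.

```javascript
// Error's message and stack are non-enumerable, so JSON drops them.
console.log(JSON.stringify(new Error("db timeout"))); // {}

// Hypothetical helper: copy the useful fields into a plain object.
function serializeError(error) {
  if (!(error instanceof Error)) return { message: String(error) };
  const out = { name: error.name, message: error.message, stack: error.stack };
  if (error.code !== undefined) out.code = error.code;
  if (error.cause !== undefined) out.cause = serializeError(error.cause);
  return out;
}

// Hypothetical helper: emit one JSON line per error event on stderr.
function logError(message, error, context = {}) {
  const line = JSON.stringify({
    level: "error",
    time: new Date().toISOString(),
    msg: message,
    ...context,
    err: serializeError(error),
  });
  console.error(line);
  return line;
}

logError("signup failed", new Error("db timeout"), { route: "/signup" });
```

Returning the line from `logError` is only there to make the helper easy to check in a test; a real logger would not need it.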

Here's what I learned: **AI generates code that's correct in isolation but fragile in context.**

It doesn't know:

The generated code passes tests because tests are narrow. It compiles because the syntax is valid. But production is where context matters.

**AI writes the first draft, humans write the final version.** I'm not going back to writing everything from scratch, but every PR now requires a manual review of control flow, error handling, and data access patterns.

**Architecture decisions stay human.** Schema design, caching strategy, and error handling patterns are too context-dependent to outsource.

**Add integration tests that AI can't fake.** Unit tests pass. Integration tests reveal the gaps. We added a test suite that runs the full API against a real Postgres instance.

**Observability from day one.** Structured logging, request tracing, and error tracking are now part of the project template, not an afterthought.

AI didn't break my project. My assumption that "generated code equals production-ready code" did.

AI is an incredible force multiplier when used as a pair programmer. It's a liability when treated as a replacement for engineering judgment.

The week cost me 3 extra days of debugging, but I shipped a more robust system than I would have built alone — because the AI's mistakes taught me where my own blind spots were.

Use AI. But keep your hands on the wheel.

*Have you had similar experiences with AI-generated code? I'd love to hear your war stories in the comments.*
