We've been generating production features with AI for a while now — auth flows, billing hooks, notification handlers. And we've hit a pattern we don't have a good answer to yet.
The first feature the AI generates looks great. It reads the codebase, picks up the patterns, and the output looks like something a senior dev wrote.
The tenth feature? Less so. Small inconsistencies creep in.
None of it is wrong. All of it is subtly inconsistent.
What we've tried so far:
AGENTS.md / CLAUDE.md: helps, but goes stale and doesn't scale with the codebase
Code review: catches the drift, but gives back some of the speed advantage
Linting + formatting: catches the easy stuff, but misses semantic drift
What we haven't solved: giving the AI a "living" representation of our conventions that stays current as the codebase evolves.
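To make "living" a bit more concrete, here is a minimal sketch of one possible direction. It is not something we have built or validated. It assumes a Python codebase and regenerates a short conventions summary from the code itself on every run, instead of relying on a hand-maintained file. The names `summarize_conventions` and `render_conventions`, the two tracked topics, and the sample sources are all hypothetical and only serve the illustration.

```python
import ast
import re
from collections import Counter
from typing import Dict, Mapping

# Hypothetical style detectors; a real project would track whatever it cares about.
SNAKE = re.compile(r"^_*[a-z][a-z0-9]*(_[a-z0-9]+)*$")
CAMEL = re.compile(r"^_*[a-z][a-z0-9]*([A-Z][a-z0-9]*)+$")


def naming_style(name: str) -> str:
    """Classify an identifier as snake_case, camelCase, or other."""
    if SNAKE.match(name):
        return "snake_case"
    if CAMEL.match(name):
        return "camelCase"
    return "other"


def summarize_conventions(sources: Mapping[str, str]) -> Dict[str, Counter]:
    """Count observed conventions across source files (path -> source text)."""
    naming: Counter = Counter()
    failure: Counter = Counter()
    for path, text in sources.items():
        tree = ast.parse(text, filename=path)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                naming[naming_style(node.name)] += 1
            elif isinstance(node, ast.Raise):
                failure["raise"] += 1
            elif (
                isinstance(node, ast.Return)
                and isinstance(node.value, ast.Constant)
                and node.value.value is None
            ):
                # Only an explicit `return None` is counted, not a bare `return`.
                failure["return None"] += 1
    return {"function naming": naming, "failure signalling": failure}


def render_conventions(summary: Mapping[str, Counter]) -> str:
    """Render the dominant convention per topic as text for an agent instructions file."""
    lines = ["Conventions observed in the codebase (generated, do not edit):"]
    for topic, counts in summary.items():
        if not counts:
            continue
        top, n = counts.most_common(1)[0]
        total = sum(counts.values())
        lines.append(f"- {topic}: prefer {top} ({n} of {total} occurrences)")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical stand-ins for real files; in practice these would be read from disk.
    sample = {
        "billing.py": "def charge_card(c):\n    if not c:\n        raise ValueError('no card')\n    return c\n",
        "notify.py": "def sendEmail(to):\n    if not to:\n        return None\n    return to\n",
    }
    print(render_conventions(summarize_conventions(sample)))
```

Run in CI or a pre-commit hook, the generated text could be written into AGENTS.md or CLAUDE.md so it never lags behind the code. The obvious limit is that it only measures what can be counted syntactically, so it reports the majority pattern without knowing whether the majority is right, and it still leaves most semantic drift uncaught.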
We're building Kumiko — an opinionated SaaS framework — partly as an answer to this. If the framework constrains what's possible, drift has less surface area. But I'm not convinced that fully solves it either.
What's your approach?