[ARTICLE · art-22356] src=dev.to pub= topic=ai-agents verified=true sentiment=neutral

Coding is solved. The factory isn't.

A developer building a multi-repo personal code factory argues that while AI coding is "solved" through a combination of capable models, action harnesses, deterministic constraints, and project-specific skills, the architecture and infrastructure around that code generation remain unsolved. The developer is dogfooding the system daily on their local machine rather than specifying it upfront, claiming that architectural decisions cannot yet be made blindly by models and that the only viable path is iterative use and repair. The personal, local nature of the setup enables fast experimentation without organizational bottlenecks, but the developer acknowledges it is a proof-of-concept that would need to be promoted to company infrastructure to scale.

5 min read · published Jun 5, 2026

Highly opinionated, based on my personal experience. Not a prescription, just notes from what I keep figuring out while dogfooding my own setup. I'm scratching the surface, with a lot left to learn.

I'm building a multi-repo personal code factory. I don't spec it up front: I dogfood it day by day, using it and asking for improvements or fixes when something breaks. The architectural decisions still can't be made blindly by the models, so daily use is how the system finds its shape. Two qualifiers about scope, then four claims.

Personal means local, for better and worse. I call this a personal code factory because it has no business running anywhere but on my laptop. There's no auth layer, no audit log, no sandbox between the agent and my git remotes. It has my GitHub tokens, my GitLab tokens, my Slack credentials, my pass store.

The other side of that coin is that local makes it stealth. I can use it without bothering anyone in the company. I don't have to ask DevOps for anything, there's no IT-security review to clear, no team-practice committee to harmonize with, no central infra to wait on. It just automates things I would otherwise do by hand. That removes every external bottleneck, and it's what lets me experiment fast.

The downside is the same thing said the other way: because it's mine, it doesn't help anyone else, and it's nowhere near as efficient as something that would run on GitLab or Slack directly. This is a POC. If it turns out to work, the right move is to promote it to actual company infrastructure.

Multi-repo means a particular kind of hard. I run it on multi-repo because that's what my work looks like. If your code lives in a single repo, a lot of what I describe in this series either disappears or shows up differently. I'm not claiming the multi-repo case is the interesting one, just that it's the one I have.

Coding is solved. That's Cherny's phrase, and I think he's right. It took four things, and they're not the same thing.

The model: capable enough to write the code.

The harness: what lets the model act instead of just emitting text (read the repo, run the tests, iterate, fix). Mine is Claude Code, but the principle isn't tied to it.

A layer of deterministic constraints: checks that keep the output converging toward quality instead of tech debt. I work in Python, so for me that's ruff, ty, and tach run through prek, plus gitleaks and a stack of project-specific hooks. Different language, different tools; the constraint is the point, not the toolchain.

And skills: written guidance that gives the model the business and project knowledge to make the right call in this codebase, not a generic one.
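Of the four, the deterministic layer is the easiest to sketch in code. What follows is a minimal illustration, not the author's setup: the names `run_checks`, `gate`, and `CheckResult` are hypothetical, and the commands are supplied by the caller rather than hard-coded. The idea it shows is only that every check must exit cleanly before a change is accepted, and that a missing tool counts as a failure rather than a silent skip.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    output: str


def run_checks(checks: dict[str, list[str]]) -> list[CheckResult]:
    """Run each check command and record whether it exited with status 0."""
    results = []
    for name, command in checks.items():
        try:
            proc = subprocess.run(command, capture_output=True, text=True)
            passed = proc.returncode == 0
            output = proc.stdout + proc.stderr
        except OSError as exc:
            # A tool that cannot be launched is treated as a failed check.
            passed, output = False, f"could not run {command[0]}: {exc}"
        results.append(CheckResult(name, passed, output))
    return results


def gate(results: list[CheckResult]) -> bool:
    """Accept the change only if every single check passed."""
    return all(result.passed for result in results)
```

A caller would pass something like `{"lint": ["ruff", "check", "."]}` and refuse the agent's change when `gate(...)` returns `False`. The exact commands depend on the toolchain and are an assumption here.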

Take any one of the four away and it stops working. What none of them guarantees is that the architecture is right, and that is the next claim.

The factory around it isn't solved. I don't think you can specify it up front.

There are two ways to get a system that builds and ships software for you. One: write the spec (every edge case, every failure mode, every integration), hand it to an agent, let it build. Two: use it every day and fix what breaks.

I don't believe in the first, or at least I wouldn't try it. A spec for a system that builds, reviews, and ships software ends up being more or less the system itself: you don't find out which edges bite until they bite. And the architectural calls inside it still can't be made blindly by the models, so the spec would have to make all of them in advance. That's the part I don't see working yet.

That leaves the second way.

That leaves dogfooding: using the thing every day, fixing what breaks, keeping it running tomorrow.

Dogfooding fuses three things into one loop that no spec can: verifying the system works, improving it where it's wrong, and keeping it running long enough to do both. The first two are the same act: you verify by trying to use it, and the parts that don't work are the parts you fix.

Making that verification less manual splits into two halves. The proactive half is a test suite that checks whether the agent behaves as intended (did it reach for the right tool, did it avoid the wrong one), so a behavior regression shows up as a red test instead of going unnoticed for days. I'm only starting on these: a handful of behavioral scenarios plus the deterministic checks around them, noisy enough that I don't lean on them yet. The reactive half is a runtime hook that catches a bad action as it happens and refuses it, the backstop for when the agent misbehaves anyway. I lean on those far more today. But every backstop I need is something the proactive half didn't catch in time. If the evals and the agent were good enough, the gates would be dead weight. They aren't yet, so I keep both.
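The two halves can be sketched side by side. This is an illustration under loud assumptions: `ToolCall`, `guard`, `Scenario`, and `evaluate` are hypothetical names, the tool names and forbidden fragments are made up, and nothing here reflects how any particular harness exposes its hooks. The reactive half inspects one action before it runs; the proactive half checks a recorded trace of actions against what a scenario expects.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCall:
    tool: str      # hypothetical tool name, e.g. "shell" or "run_tests"
    argument: str  # the command or path the agent passed to the tool


# Reactive half: consulted before each action is executed.
FORBIDDEN_SHELL_FRAGMENTS = ("git push --force", "rm -rf /")  # illustrative only


def guard(call: ToolCall) -> tuple[bool, str]:
    """Return (allowed, reason); refuse shell commands with a forbidden fragment."""
    if call.tool == "shell":
        for fragment in FORBIDDEN_SHELL_FRAGMENTS:
            if fragment in call.argument:
                return False, f"refused: contains {fragment!r}"
    return True, "allowed"


# Proactive half: a behavioral scenario checked against a recorded trace.
@dataclass(frozen=True)
class Scenario:
    name: str
    must_use: frozenset[str]      # tools the agent is expected to reach for
    must_not_use: frozenset[str]  # tools the agent is expected to avoid


def evaluate(scenario: Scenario, trace: list[ToolCall]) -> list[str]:
    """Return the list of failures; an empty list means the scenario passed."""
    used = {call.tool for call in trace}
    failures = [f"missing tool: {t}" for t in sorted(scenario.must_use - used)]
    failures += [f"forbidden tool: {t}" for t in sorted(scenario.must_not_use & used)]
    return failures
```

Substring matching is deliberately crude. It is enough to show the shape of a backstop, but a real hook would need to parse the command rather than scan it.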

The third thing in the loop is the precondition. Self-improvement and resilience are two sides of the same coin. A system that shuts down can't keep improving itself. If I had to pick which matters more, it's resilience: improvement stops the moment the loop stops. You don't get either by specifying them. You get both by running the thing every day and refusing to let it stay broken.

So who orchestrates the loop? That's the last claim.

Orchestration looks like the part that stays human: holding the big picture, deciding what gets attention first, noticing when two threads are about the same thing, deciding what to keep and what to drop.

In teatree most of it already runs without me. One orchestrator with the big picture, not a swarm: it arbitrates and hands the actual work to sub-agents. What still needs me is basically troubleshooting and steering, and I assume that the loop can't be fully closed as long as the behavioral evals are missing.
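To make "one orchestrator, not a swarm" concrete, here is a rough sketch. It is not teatree's design: the `Orchestrator` class, its methods, and the idea of identifying a thread by an exact `topic` key are all assumptions made for illustration. A single coordinator owns the queue, decides what gets attention first, drops a task whose topic it has already seen, and delegates the work itself to sub-agents, modelled here as plain callables.

```python
import heapq
from typing import Callable

SubAgent = Callable[[str], str]  # hypothetical: takes a payload, returns a result


class Orchestrator:
    """Single coordinator: holds the queue, decides order, delegates the work."""

    def __init__(self, agents: dict[str, SubAgent]) -> None:
        self.agents = agents
        self._queue: list[tuple[int, int, str, str, str]] = []
        self._seen: set[str] = set()
        self._counter = 0  # tie-breaker so equal priorities stay first-in, first-out

    def submit(self, topic: str, kind: str, payload: str, priority: int) -> bool:
        """Queue a task; return False if this topic was already submitted."""
        if topic in self._seen:
            return False
        self._seen.add(topic)
        heapq.heappush(self._queue, (priority, self._counter, topic, kind, payload))
        self._counter += 1
        return True

    def run(self) -> list[tuple[str, str]]:
        """Drain the queue, lowest priority number first, delegating each task."""
        results = []
        while self._queue:
            _, _, topic, kind, payload = heapq.heappop(self._queue)
            results.append((topic, self.agents[kind](payload)))
        return results
```

Noticing that two threads are about the same thing is a judgment call in practice. The exact-key match above only stands in for it.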

I'll try to publish roughly one post a week. Each one is the thing I keep getting wrong and trying to get less wrong.

I'll change my mind about some of this between now and the last post. That's the point.
