I wanted to see how far an autonomous coding agent could get unattended. The constraint that makes this hard isn't code generation — it's that each session starts with zero memory of the last. So the design problem is state, not prompting.
Setup
The run loop
Every session does the same thing:
Commit granularity = checkpoint granularity. Worst case on an interrupted session is losing one unit, and the next run re-derives it. The "why" lines matter as much as the diffs — without them a later session re-litigates settled decisions.
Guardrails
The agent was allowed to build, test, and commit locally. It was explicitly not allowed to deploy, push to a remote, or touch secrets — those get written into PROGRESS.md as "needs human" items instead. This boundary is what makes unattended runs safe to leave alone.
What came out
A working full-stack monorepo: a pure TS scheduling engine (with property tests), multi-tenant auth, a Drizzle/Postgres schema, server-side re-validation, and publish/share/export flows. Across the runs it cleared its own stale git lock, and one session caught and fixed an off-by-one in a labeling layer that spanned five files.
The takeaway
The leverage wasn't the model writing code. It was designing a process where progress is durable across total context loss — externalize state, checkpoint constantly, log decisions not just actions, and fence off irreversible operations.
Repo/stack details in comments. Curious how others are handling agent state across sessions — file-based like this, or something more structured?