{"slug": "fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised", "title": "Fire-and-forget AI engineering: letting agents ship a production app unsupervised", "summary": "An AI agent autonomously built a production landing page with GDPR audit logs and encryption, requiring no human supervision. Developer Piotr Karwatka demonstrated a repeatable workflow on Open Mercato, an AI-Engineering Foundation Framework for CRM/ERP, where agents implement features on isolated branches and open structured PRs for review. The workflow uses hierarchical task decomposition and adversarial spec reviews to ensure architecture compliance and avoid context burnout.", "body_md": "\"An AI agent just built a production landing page, with GDPR audit logs and encryption baked in. I wasn't even at my desk.\"\n\nThat is not a lucky one-shot. It is a repeatable workflow. [Piotr Karwatka](https://www.linkedin.com/in/piotrkarwatka/) recorded a full tutorial showing how to go from idea to a production-ready app on [Open Mercato](https://github.com/open-mercato/open-mercato) - the AI-Engineering Foundation Framework for CRM/ERP - with no babysitting and no ping-pong prompting.\n\nThis is the technical version: what the loop actually looks like, why it doesn't fall apart, and which patterns you can lift into your own stack.\n\nThe default AI coding loop is single-threaded and human-bound:\n\n```\nprompt -> generate -> you spot a bug -> correct -> re-prompt -> repeat\n```\n\nIt holds for snippets. It collapses the moment the task touches real architecture - multi-tenancy, RBAC, event flow, encryption, audit logging. Corrections pile up in the context window, the agent loses the thread, and you are back to typing. 
You are the bottleneck, sitting in the inner loop.\n\nThe workflow in the tutorial moves you to the **outer loop**: you review a finished, tested PR instead of every keystroke.\n\n```\ngoal -> agent: branch + implement + test + open PR -> you: review PR\n```\n\nThe reason this is possible on Open Mercato is that the hard architectural decisions are already encoded as conventions, specs and agent-readable skills (`AGENTS.md`, task routing, spec skills). The agent is not inventing how RBAC or GDPR logging should work - it reads the foundation and follows it.\n\nThe execution agent owns the full unit of work:\n\n```\n1. git checkout -b feat/lead-capture-landing\n2. implement against framework conventions\n3. run the test suite (Playwright integration tests included)\n4. open a structured PR: what changed, why, how it was verified\n```\n\nYou are no longer correcting tokens. The deliverable is a reviewable artifact. In the tutorial the output is concrete: a live site capturing leads straight into the Open Mercato CRM, with **GDPR audit logs and encryption on by default** - not bolted on after a compliance pass.\n\nThis is the part most people get wrong. One agent is trivial. Naively running N agents in parallel usually means file collisions and a corrupted main branch.\n\nThe fix is isolation by design - each agent works on its own branch/worktree and never writes to `main` directly:\n\n```\nmain\n |-- agent-a -> feat/landing-page      (worktree A)\n |-- agent-b -> feat/crm-webhook       (worktree B)\n +-- agent-c -> feat/consent-logging   (worktree C)\n```\n\nParallelism is only useful if it is safe. Safety here is structural (separate branches/worktrees), not \"hope the agents stay out of each other's way.\" This is what turns autonomous coding from a single-threaded demo into something that scales like a team.\n\nThe highest-leverage step happens **before any code is written**. 
Autonomous output is only as good as the spec, so the workflow generates the spec in two passes.\n\n**Phase 1 - architecture-compliant draft.** A spec-writing skill produces a spec that already respects framework conventions instead of fighting them.\n\n```\nspec-skill -> SPEC.md  (modules, data model, routes, events, RBAC scope)\n```\n\n**Phase 2 - adversarial / \"philosophical\" review.** A second pass deliberately hunts for gaps the first draft missed, before a single line of code is committed.\n\n```\nreview pass -> checks: routing, caching, edge cases, failure modes, consent flow\n```\n\nModel pairing matters here: **Claude** and **Codex** are used across the phases so the spec is both convention-compliant and stress-tested. The cost of a wrong assumption is highest at the start, so that is where the scrutiny goes. By the time code is written, the thinking is done.\n\nAgents run autonomously for hours, which exposes the real enemy of long agent sessions: **context burnout**. A single agent grinding through a long task fills its context window with history and loses coherence.\n\nThe fix is hierarchical orchestration:\n\n```\n            +---------------------+\n            |  Coordinator agent  |  holds the plan, delegates, keeps context lean\n            +----------+----------+\n        +--------------+--------------+\n        v              v              v\n   exec agent A   exec agent B   exec agent C\n   (fresh ctx)    (fresh ctx)    (fresh ctx)\n```\n\nThe coordinator owns the map; the workers own the tasks and run with fresh, scoped context. That separation is what makes unsupervised multi-hour runs possible without the quality collapse that usually follows.\n\nStrip away the demo and three engineering principles remain:\n\nThe detail that is easy to skip: compliance was not a phase, it was a property of the foundation. For anyone shipping CRM/ERP in regulated markets, that is the whole game.\n\nWhat is the longest you have ever let an AI agent run unsupervised? 
Drop it in the comments.", "url": "https://wpnews.pro/news/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised", "canonical_source": "https://dev.to/tkarwatka/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised-4hk2", "published_at": "2026-06-17 06:24:48+00:00", "updated_at": "2026-06-17 06:51:44.286386+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-infrastructure", "ai-products", "generative-ai"], "entities": ["Piotr Karwatka", "Open Mercato", "Claude", "Codex", "GDPR", "Playwright", "CRM", "ERP"], "alternates": {"html": "https://wpnews.pro/news/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised", "markdown": "https://wpnews.pro/news/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised.md", "text": "https://wpnews.pro/news/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised.txt", "jsonld": "https://wpnews.pro/news/fire-and-forget-ai-engineering-letting-agents-ship-a-production-app-unsupervised.jsonld"}}