Fire-and-forget AI engineering: letting agents ship a production app unsupervised

Source: dev.to · Published: 2026-06-17T06:24Z · Topic: ai-agents

An AI agent autonomously built a production landing page with GDPR audit logs and encryption, requiring no human supervision. Developer Piotr Karwatka demonstrated a repeatable workflow on Open Mercato, an AI-Engineering Foundation Framework for CRM/ERP, where agents implement features on isolated branches and open structured PRs for review. The workflow uses hierarchical task decomposition and adversarial spec reviews to ensure architecture compliance and avoid context burnout.

"An AI agent just built a production landing page, with GDPR audit logs and encryption baked in. I wasn't even at my desk."

That is not a lucky one-shot. It is a repeatable workflow. Piotr Karwatka recorded a full tutorial showing how to go from idea to a production-ready app on Open Mercato - the AI-Engineering Foundation Framework for CRM/ERP - with no babysitting and no ping-pong prompting.

This is the technical version: what the loop actually looks like, why it doesn't fall apart, and which patterns you can lift into your own stack.

The default AI coding loop is single-threaded and human-bound:

prompt -> generate -> you spot a bug -> correct -> re-prompt -> repeat

It holds for snippets. It collapses the moment the task touches real architecture - multi-tenancy, RBAC, event flow, encryption, audit logging. Corrections pile up in the context window, the agent loses the thread, and you are back to typing. You are the bottleneck, sitting in the inner loop.

The workflow in the tutorial moves you to the outer loop: you review a finished, tested PR instead of every keystroke.

goal -> agent: branch + implement + test + open PR -> you: review PR

The reason this is even possible on Open Mercato is that the hard architectural decisions are already encoded as conventions, specs and agent-readable skills (AGENTS.md, task routing, spec skills). The agent is not inventing how RBAC or GDPR logging should work - it reads the foundation and follows it.

The execution agent owns the full unit of work:

1. git checkout -b feat/lead-capture-landing
2. implement against framework conventions
3. run the test suite (Playwright integration tests included)
4. open a structured PR: what changed, why, how it was verified
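As a rough illustration of step 4, the "structured PR" can be thought of as a small record with three mandatory parts. This is a minimal sketch, not Open Mercato's actual PR format: the `PullRequest` class, its field names and the `is_reviewable` rule are all hypothetical, chosen only to mirror "what changed, why, how it was verified".

```python
from dataclasses import dataclass, field


@dataclass
class PullRequest:
    """Hypothetical container for the three things the PR must state."""
    branch: str
    what_changed: list[str]
    why: str
    verification: list[str] = field(default_factory=list)

    def is_reviewable(self) -> bool:
        # Assumption: a PR with no recorded verification is not a finished unit of work.
        return bool(self.what_changed and self.why.strip() and self.verification)

    def render(self) -> str:
        # Plain-text PR body with one section per required part.
        lines = [f"Branch: {self.branch}", "", "What changed:"]
        lines += [f"- {item}" for item in self.what_changed]
        lines += ["", "Why:", self.why, "", "How it was verified:"]
        lines += [f"- {item}" for item in self.verification]
        return "\n".join(lines)


pr = PullRequest(
    branch="feat/lead-capture-landing",
    what_changed=["Add landing page with lead capture form"],
    why="Capture leads directly into the CRM.",
    verification=["Test suite passed, including Playwright integration tests"],
)
print(pr.render())
```

The point of a check like `is_reviewable` is that the human in the outer loop only ever sees work that arrives with its own evidence attached.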

You are no longer correcting tokens. The deliverable is a reviewable artifact. In the tutorial the output is concrete: a live site capturing leads straight into the Open Mercato CRM, with GDPR audit logs and encryption on by default - not bolted on after a compliance pass.

This is the part most people get wrong. One agent is trivial. N agents in parallel usually means file collisions and a corrupted main branch.

The fix is isolation by design - each agent on its own branch/worktree, never writing to main directly:

main
 |-- agent-a -> feat/landing-page      (worktree A)
 |-- agent-b -> feat/crm-webhook       (worktree B)
 +-- agent-c -> feat/consent-logging   (worktree C)

Parallelism is only useful if it is safe. Safety here is structural (separate branches/worktrees), not "hope the agents stay out of each other's way." This is what turns autonomous coding from a single-threaded demo into something that scales like a team.
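One way to get that structural isolation is `git worktree add -b <branch> <path> main`, which creates a new branch and a separate working directory in one step. The sketch below only builds the commands and does not run them. The agent names and branch names come from the diagram above; the `worktree_commands` helper and the `../worktrees` directory are assumptions made for illustration, not something the tutorial specifies.

```python
from pathlib import PurePosixPath

# Agent -> branch mapping taken from the diagram; the helper itself is hypothetical.
TASKS = {
    "agent-a": "feat/landing-page",
    "agent-b": "feat/crm-webhook",
    "agent-c": "feat/consent-logging",
}


def worktree_commands(tasks: dict[str, str], root: str = "../worktrees") -> list[list[str]]:
    """Build one `git worktree add` command per agent, each on a new branch off main."""
    branches = list(tasks.values())
    if len(set(branches)) != len(branches):
        raise ValueError("two agents must never share a branch")
    commands = []
    for agent, branch in tasks.items():
        path = str(PurePosixPath(root) / agent)
        # git worktree add -b <new-branch> <path> <start-point>
        commands.append(["git", "worktree", "add", "-b", branch, path, "main"])
    return commands


for cmd in worktree_commands(TASKS):
    print(" ".join(cmd))
```

Each command list could be passed to `subprocess.run` from inside the repository. Because every agent gets its own directory and its own branch, two agents cannot overwrite each other's files, and nothing lands on main without a merge.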

The highest-leverage step happens before any code is written. Autonomous output is only as good as the spec, so the workflow generates the spec in two passes.

Phase 1 - architecture-compliant draft. A spec-writing skill produces a spec that already respects framework conventions instead of fighting them.

spec-skill -> SPEC.md  (modules, data model, routes, events, RBAC scope)

Phase 2 - adversarial / "philosophical" review. A second pass deliberately hunts for hidden gaps the first draft missed before a line of code is committed.

review pass -> checks: routing, caching, edge cases, failure modes, consent flow

Model pairing matters here: Claude and Codex are used across the phases so the spec is both convention-compliant and stress-tested. The cost of a wrong assumption is highest at the start, so that is where the scrutiny goes. By the time code is written, the thinking is done.
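The two passes can be read as two gates that a spec must clear before implementation starts. The following is a minimal sketch under loud assumptions: the section names and review checks are copied from the two lines above, but the functions, the substring matching and the "written answer per check" rule are hypothetical stand-ins for what the real skills do with a language model.

```python
REQUIRED_SECTIONS = ["modules", "data model", "routes", "events", "rbac scope"]
REVIEW_CHECKS = ["routing", "caching", "edge cases", "failure modes", "consent flow"]


def missing_sections(spec_text: str) -> list[str]:
    """Phase 1 gate: required sections the draft does not mention at all."""
    lowered = spec_text.lower()
    return [s for s in REQUIRED_SECTIONS if s not in lowered]


def open_review_items(answers: dict[str, str]) -> list[str]:
    """Phase 2 gate: review checks that still have no written answer."""
    return [c for c in REVIEW_CHECKS if not answers.get(c, "").strip()]


def ready_to_implement(spec_text: str, answers: dict[str, str]) -> bool:
    # Code is only written once both gates are clear.
    return not missing_sections(spec_text) and not open_review_items(answers)


draft = "Modules: landing\nData model: Lead\nRoutes: /signup\nEvents: lead.created\n"
print(missing_sections(draft))          # ['rbac scope']
print(ready_to_implement(draft, {}))    # False
```

The design choice worth copying is the ordering: both gates run before the first commit, when fixing a wrong assumption costs a paragraph rather than a refactor.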

Agents run autonomously for hours, which exposes the real enemy of long agent sessions: context burnout. A single agent grinding a long task fills its window with history and loses coherence.

The fix is hierarchical orchestration:

            +---------------------+
            |  Coordinator agent  |  holds the plan, delegates, keeps context lean
            +----------+----------+
        +--------------+--------------+
        v              v              v
   exec agent A   exec agent B   exec agent C
   (fresh ctx)    (fresh ctx)    (fresh ctx)

The coordinator owns the map; the workers own the tasks and run with fresh, scoped context. That separation is what makes unsupervised multi-hour runs possible without the quality collapse that usually follows.
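The context-hygiene idea can be shown without any model calls. In this sketch, which assumes a hypothetical `Coordinator` class and a fake worker, each worker receives only its own task and returns a short summary plus a long transcript; the coordinator keeps the summary and drops the transcript, so its own state grows with the number of tasks rather than with the work done inside them.

```python
from dataclasses import dataclass, field
from typing import Callable

# A worker takes one task and returns (short summary, full transcript).
Worker = Callable[[str], tuple[str, list[str]]]


@dataclass
class Coordinator:
    """Hypothetical coordinator: holds the plan and one summary per finished task."""
    plan: list[str]
    summaries: dict[str, str] = field(default_factory=dict)

    def run(self, worker: Worker) -> None:
        for task in self.plan:
            # Fresh, scoped context: the worker sees its task and nothing else.
            summary, _transcript = worker(task)
            # Only the summary is kept; the transcript is discarded here.
            self.summaries[task] = summary


def fake_worker(task: str) -> tuple[str, list[str]]:
    # Stand-in for an execution agent that produces a long working history.
    transcript = [f"step {i} of {task}" for i in range(1000)]
    return f"{task}: done", transcript


coordinator = Coordinator(plan=["landing page", "crm webhook", "consent logging"])
coordinator.run(fake_worker)
print(coordinator.summaries)
```

A real setup would run the workers in parallel and pass back richer results (for example a PR link), but the separation is the same: long histories live and die inside the workers.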

Strip away the demo and three engineering principles remain:

The detail that is easy to skip: compliance was not a phase, it was a property of the foundation. For anyone shipping CRM/ERP in regulated markets, that is the whole game.

What is the longest you have ever let an AI agent run unsupervised? Drop it in the comments.
