{"slug": "show-hn-loopflow-loop-engineering-for-claude-code", "title": "Show HN: LoopFlow – loop engineering for Claude Code", "summary": "LoopFlow, a new open-source tool, turns Claude Code into a self-running system by allowing developers to define goals, agent pipelines, and verification gates in a YAML file. It addresses key issues like self-grading agents, runaway costs, and memory loss by using separate gate agents, hard budget limits, and persistent memory files. The tool runs entirely in the terminal without API keys or cloud dependencies.", "body_md": "**Stop prompting your coding agent. Design the loop that prompts it.**\n\nLoopFlow turns Claude Code into a system that runs itself: you declare a goal, a pipeline of agents, and a verification gate in one YAML file — LoopFlow iterates until the gate passes, the budget runs out, or the attempt limit is hit. One agent writes, a *different* agent checks, and a memory file makes every run smarter than the last.\n\n``` bash\n$ loopflow run test-and-fix\n\nIteration 1/3\n  ▸ fix …\n    done · $0.31 · resume: claude --resume 072f1abb…\n  ▸ review (gate) …\n    gate FAIL · $0.12\n    │ The date parser fix only handles ISO strings; the failing test\n    │ also feeds epoch millis. Root cause not addressed.\n\nIteration 2/3\n  ▸ fix …\n    done · $0.28\n  ▸ review (gate) …\n    done · $0.11\n\n✓ success · 2 iteration(s) · $0.82\n```\n\n*LoopFlow demo — release-check loop, 2 iterations*\n\nFor two years the workflow was: write a prompt, read the output, write the next prompt. You held the tool the whole time.\n\nThat's changing. As Boris Cherny (creator of Claude Code) put it: *\"I don't prompt Claude anymore. I have loops running that prompt Claude.\"*\n\nA loop is a **recursive goal**: you define what \"done\" looks like, and the agent iterates until it gets there. 
But doing this raw has three sharp edges:\n\n- **Agents grade their own homework.** The model that wrote the fix will happily declare it works.\n- **Unattended loops burn money.** A loop running itself is also a loop making mistakes — and spending tokens — unattended.\n- **The agent forgets everything between runs.** Every run re-derives what the last run already learned.\n\nLoopFlow is a small, sharp tool built around exactly those three problems:\n\n| Problem | LoopFlow answer |\n|---|---|\n| Self-grading | Gates — a separate agent, with a separate persona, must output `VERDICT: PASS` before the loop ends |\n| Runaway cost | Budgets — a hard USD ceiling enforced twice: by the runner and by Claude Code's own `--max-budget-usd` on every step |\n| Amnesia | Memory — a plain Markdown file per loop, appended after every run, injected into every prompt. The agent forgets; the repo doesn't |\n| Collisions | Worktrees — opt-in git worktree isolation, so loops never fight you (or each other) for the working tree |\n| Auditability | Every step logs a session id — `claude --resume <id>` drops you into the full transcript of any step, any time |\n\nNo API keys, no daemon, no cloud. 
If `claude` works in your terminal, `loopflow` works.\n\n```\nnpm install -g @loopflow/cli   # or: npx @loopflow/cli\n\ncd your-project\nloopflow init             # scaffolds .loopflow/ with three starter loops\nloopflow run test-and-fix --dry-run   # see exactly what each agent will be told\nloopflow run test-and-fix             # run it for real\n```\n\nRequirements: Node 18+, [Claude Code](https://claude.com/claude-code) installed and authenticated.\n\n```\n# .loopflow/loops/test-and-fix.yaml\nname: test-and-fix\ndescription: Run the test suite, fix failures, verify the fix.\n\nbudget:\n  max_usd: 2.00        # hard ceiling for the whole run, all iterations included\n  max_iterations: 3    # how many attempts the gate may reject\n\nworktree: false        # set true to run in an isolated git worktree\n\ndefaults:\n  permission_mode: acceptEdits\n\nsteps:\n  - id: fix\n    role: >            # persona — appended to Claude's system prompt\n      You are a careful maintainer. You make the smallest change that fixes\n      the problem, and you never weaken a test to make it pass.\n    prompt: |\n      Run this project's test suite. Diagnose and fix the root cause of any\n      failure. Re-run to confirm. Summarize what you changed and why.\n\n  - id: review\n    gate: true         # ← the loop cannot succeed until this step says PASS\n    role: >\n      You are a skeptical senior engineer reviewing a change you did not\n      write. You trust nothing without evidence.\n    prompt: |\n      A previous agent claims to have fixed failing tests. 
Inspect the diff,\n      re-run the suite yourself, and check no test was weakened or deleted.\n```\n\n```\n            ┌──────────────────────────────────────────────┐\n            │              iteration (≤ max)               │\n            │                                              │\n memory ──▶ │  step: fix ──▶ step: review (gate) ──┐       │\n   ▲        │      ▲                               │       │\n   │        │      └── reviewer feedback ◀── FAIL ─┤       │\n   │        └──────────────────────────────────────┼───────┘\n   │                                               │ PASS\n   └──────────────── run record ◀──────────────────┘\n```\n\n- Each step is one headless Claude Code run (`claude -p`). Steps see the loop's **memory**, the **outputs of earlier steps** in the iteration, and — on retries — the **gate's feedback**.\n- A gate must end with `VERDICT: PASS` or `VERDICT: FAIL`. No verdict counts as FAIL: *an unverified pass is not a pass.*\n- On FAIL, the loop starts over with the reviewer's feedback injected into every prompt.\n- Every run appends a record to `.loopflow/memory/<loop>.md` — outcome, cost, and the final summary — which the next run reads.\n\n### Here's what that looks like in a real run — a release-check loop catching a debug artifact the fix step missed:\n\n`loopflow init` gives you three loops designed to be stolen from:\n\n- `test-and-fix` — fixer + skeptical reviewer gate. The canonical write/verify pair.\n- `debt-audit` — a discovery loop. Maintains `.loopflow/reports/debt-audit.md` and uses memory to track what got fixed, what's new, and what keeps being ignored.\n- `docs-sync` — finds documentation that drifted from the code, fixes it in an isolated worktree, and a gate verifies every claim against the source.\n\nGot a loop of your own? 
[Contribute it to the cookbook](/faisalishfaq2005/loopflow/blob/main/CONTRIBUTING.md) — community loops live in [loops/](/faisalishfaq2005/loopflow/blob/main/loops).\n\nLoopFlow deliberately ships no daemon. Use the scheduler you already have:\n\n```\n# cron (Linux/macOS) — audit debt every Monday at 9am\n0 9 * * 1  cd /path/to/project && loopflow run debt-audit\n\n# Windows Task Scheduler\nschtasks /create /tn \"debt-audit\" /sc weekly /d MON /st 09:00 ^\n  /tr \"cmd /c cd /d C:\\path\\to\\project && loopflow run debt-audit\"\n```\n\nCI works too — a GitHub Action that runs `loopflow run docs-sync` weekly and opens a PR from the kept worktree branch is ~20 lines.\n\nA loop changes the work — it doesn't delete you from it. LoopFlow's design assumes three things stay true:\n\n- **Verification is still on you.** Gates catch the obvious failures, but `--verbose` and `claude --resume <session-id>` exist so you can read what the loop actually did. Read it.\n- **Comprehension debt is real.** The faster a loop ships code you didn't write, the faster the gap grows between what exists and what you understand. Memory files and kept worktrees are designed to be *read by humans*, not just machines.\n- **The comfortable posture is the dangerous one.** When the loop runs itself, it's tempting to stop having an opinion. Design the loop with judgment — then keep judging the output.\n\nBuild the loop. 
But build it like someone who intends to stay the engineer, not just the person who presses go.\n\n| Command | What it does |\n|---|---|\n| `loopflow init [--force]` | Scaffold `.loopflow/` with starter loops |\n| `loopflow list` | List loops with steps, gates, and budgets |\n| `loopflow validate [name]` | Validate loop definitions (all by default) |\n| `loopflow run <name>` | Run a loop |\n| ` --dry-run` | Print every composed prompt; invoke nothing |\n| ` -i, --iterations <n>` | Override `budget.max_iterations` |\n| ` -b, --budget <usd>` | Override `budget.max_usd` |\n| ` -v, --verbose` | Print full step outputs |\n\nExit codes: `0` success · `1` loop failed (gate exhausted, budget, error) · `2` configuration error. Cron- and CI-friendly.\n\nEverything the CLI does is exported:\n\n``` js\nimport { loadLoop, runLoop } from \"@loopflow/cli\";\n\nconst loop = loadLoop(process.cwd(), \"test-and-fix\");\nconst result = await runLoop(loop, { root: process.cwd() });\nconsole.log(result.outcome, result.costUsd);\n```\n\n- `loopflow daemon` — built-in scheduler with cron expressions in `loop.yaml`\n- Parallel steps (fan-out across worktrees)\n- Structured gate verdicts via `--json-schema`\n- Loop run history & `loopflow logs`\n- Adapters for other headless agents (Codex CLI, …)\n\nThe most valuable contribution is a **loop that solved a real problem for you** — see [CONTRIBUTING.md](/faisalishfaq2005/loopflow/blob/main/CONTRIBUTING.md). 
Code contributions: the engine is ~600 lines of typed, tested TypeScript; `npm test` runs in under a second.", "url": "https://wpnews.pro/news/show-hn-loopflow-loop-engineering-for-claude-code", "canonical_source": "https://github.com/faisalishfaq2005/loopflow", "published_at": "2026-06-20 21:01:33+00:00", "updated_at": "2026-06-20 21:40:17.068556+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "large-language-models", "ai-tools"], "entities": ["LoopFlow", "Claude Code", "Boris Cherny", "Node"], "alternates": {"html": "https://wpnews.pro/news/show-hn-loopflow-loop-engineering-for-claude-code", "markdown": "https://wpnews.pro/news/show-hn-loopflow-loop-engineering-for-claude-code.md", "text": "https://wpnews.pro/news/show-hn-loopflow-loop-engineering-for-claude-code.txt", "jsonld": "https://wpnews.pro/news/show-hn-loopflow-loop-engineering-for-claude-code.jsonld"}}