Stop prompting your coding agent. Design the loop that prompts it.
LoopFlow turns Claude Code into a system that runs itself: you declare a goal, a pipeline of agents, and a verification gate in one YAML file — LoopFlow iterates until the gate passes, the budget runs out, or the attempt limit is hit. One agent writes, a different agent checks, and a memory file makes every run smarter than the last.
$ loopflow run test-and-fix
Iteration 1/3
▸ fix …
done · $0.31 · resume: claude --resume 072f1abb…
▸ review (gate) …
gate FAIL · $0.12
│ The date parser fix only handles ISO strings; the failing test
│ also feeds epoch millis. Root cause not addressed.
Iteration 2/3
▸ fix …
done · $0.28
▸ review (gate) …
done · $0.11
✓ success · 2 iteration(s) · $0.82
[Image: LoopFlow demo, release-check loop, 2 iterations]
For two years the workflow was: write a prompt, read the output, write the next prompt. You held the tool the whole time.
That's changing. As Boris Cherny (creator of Claude Code) put it: "I don't prompt Claude anymore. I have loops running that prompt Claude."
A loop is a recursive goal: you define what "done" looks like, and the agent iterates until it gets there. But doing this raw has three sharp edges:
- **Agents grade their own homework.** The model that wrote the fix will happily declare it works.
- **Unattended loops burn money.** A loop running itself is also a loop making mistakes, and spending tokens, unattended.
- **The agent forgets everything between runs.** Every run re-derives what the last run already learned.
LoopFlow is a small, sharp tool built around exactly those three problems, plus two that follow from running loops unattended:
| Problem | LoopFlow answer |
|---|---|
| Self-grading | Gates — a separate agent, with a separate persona, must output VERDICT: PASS before the loop ends |
| Runaway cost | Budgets — a hard USD ceiling enforced twice: by the runner and by Claude Code's own --max-budget-usd on every step |
| Amnesia | Memory — a plain Markdown file per loop, appended after every run, injected into every prompt. The agent forgets; the repo doesn't |
| Collisions | Worktrees — opt-in git worktree isolation, so loops never fight you (or each other) for the working tree |
| Auditability | Every step logs a session id — claude --resume <id> drops you into the full transcript of any step, any time |
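The budget row says the ceiling is enforced twice. A minimal sketch of that two-layer idea, assuming the runner hands each step whatever is left of the budget; `BudgetState`, `remainingUsd`, `stepArgs`, and `recordCost` are hypothetical names for illustration, not LoopFlow's API, and the argument order is a guess:

```typescript
// Hypothetical helpers: only the "runner check + per-step cap" idea comes from the table.
interface BudgetState {
  maxUsd: number;   // hard ceiling for the whole run
  spentUsd: number; // total cost of all finished steps
}

/** Layer 1 (runner): how much may still be spent? Zero means stop. */
function remainingUsd(b: BudgetState): number {
  return Math.max(0, b.maxUsd - b.spentUsd);
}

/** Layer 2 (per step): cap a single headless run at what is left. */
function stepArgs(prompt: string, b: BudgetState): string[] {
  return ["-p", prompt, "--max-budget-usd", remainingUsd(b).toFixed(2)];
}

/** Record a finished step's cost; returns whether the loop may continue. */
function recordCost(b: BudgetState, costUsd: number): boolean {
  b.spentUsd += costUsd;
  return remainingUsd(b) > 0;
}

const budget: BudgetState = { maxUsd: 2.0, spentUsd: 0 };
recordCost(budget, 0.31); // the fix step finished
console.log(stepArgs("review the fix", budget));
```

The point of the second layer is that a single runaway step cannot overshoot the ceiling before the runner gets a chance to check it.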
No API keys, no daemon, no cloud. If `claude` works in your terminal, `loopflow` works.
npm install -g @loopflow/cli # or: npx @loopflow/cli
cd your-project
loopflow init # scaffolds .loopflow/ with three starter loops
loopflow run test-and-fix --dry-run # see exactly what each agent will be told
loopflow run test-and-fix # run it for real
Requirements: Node 18+, Claude Code installed and authenticated.
name: test-and-fix
description: Run the test suite, fix failures, verify the fix.
budget:
  max_usd: 2.00        # hard ceiling for the whole run, all iterations included
  max_iterations: 3    # how many attempts the gate may reject
worktree: false        # set true to run in an isolated git worktree
defaults:
  permission_mode: acceptEdits
steps:
  - id: fix
    role: >            # persona, appended to Claude's system prompt
      You are a careful maintainer. You make the smallest change that fixes
      the problem, and you never weaken a test to make it pass.
    prompt: |
      Run this project's test suite. Diagnose and fix the root cause of any
      failure. Re-run to confirm. Summarize what you changed and why.
  - id: review
    gate: true         # ← the loop cannot succeed until this step says PASS
    role: >
      You are a skeptical senior engineer reviewing a change you did not
      write. You trust nothing without evidence.
    prompt: |
      A previous agent claims to have fixed failing tests. Inspect the diff,
      re-run the suite yourself, and check no test was weakened or deleted.
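For readers who think in types, the loop file above can be sketched as a TypeScript shape. The field names mirror the YAML; which fields are optional and the validation rules in `validateLoop` are assumptions made for illustration, not the checks `loopflow validate` actually performs:

```typescript
// Field names come from the YAML above; optionality and rules are assumptions.
interface Step {
  id: string;
  role?: string;  // persona text
  prompt: string;
  gate?: boolean; // true marks a verification gate
}

interface LoopDefinition {
  name: string;
  description?: string;
  budget: { max_usd: number; max_iterations: number };
  worktree?: boolean;
  defaults?: { permission_mode?: string };
  steps: Step[];
}

/** Return human-readable problems; an empty list means the loop looks valid. */
function validateLoop(loop: LoopDefinition): string[] {
  const problems: string[] = [];
  if (!(loop.budget.max_usd > 0)) {
    problems.push("budget.max_usd must be positive");
  }
  if (!Number.isInteger(loop.budget.max_iterations) || loop.budget.max_iterations < 1) {
    problems.push("budget.max_iterations must be an integer >= 1");
  }
  if (loop.steps.length === 0) {
    problems.push("at least one step is required");
  }
  const seen = new Set<string>();
  for (const step of loop.steps) {
    if (seen.has(step.id)) problems.push(`duplicate step id: ${step.id}`);
    seen.add(step.id);
  }
  return problems;
}

const testAndFix: LoopDefinition = {
  name: "test-and-fix",
  budget: { max_usd: 2.0, max_iterations: 3 },
  worktree: false,
  defaults: { permission_mode: "acceptEdits" },
  steps: [
    { id: "fix", prompt: "Run this project's test suite." },
    { id: "review", gate: true, prompt: "Inspect the diff." },
  ],
};
console.log(validateLoop(testAndFix));
```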
            ┌──────────────────────────────────────────────┐
            │              iteration (≤ max)               │
            │                                              │
 memory ──▶ │  step: fix ──▶ step: review (gate) ──┐       │
    ▲       │      ▲                               │       │
    │       │      └── reviewer feedback ◀── FAIL ─┤       │
    │       └──────────────────────────────────────┼───────┘
    │                                              │ PASS
    └──────────────── run record ◀─────────────────┘
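The control flow in the diagram can be sketched in a few lines. This is a simplified, synchronous illustration and not LoopFlow's engine: the step runner is injected so the sketch runs without Claude Code, all type and function names are hypothetical, and treating a loop with no gate as successful after one pass is an assumption.

```typescript
// Hypothetical shapes; a real runner would spawn one headless agent run per step.
interface StepSpec { id: string; gate?: boolean }
interface StepResult { output: string; costUsd: number; gatePassed?: boolean }
type RunStep = (step: StepSpec, feedback: string | null) => StepResult;

type LoopOutcome = "success" | "gate_exhausted" | "budget_exceeded";
interface LoopResult { outcome: LoopOutcome; iterations: number; costUsd: number }

function runIterations(
  steps: StepSpec[],
  maxIterations: number,
  maxUsd: number,
  runStep: RunStep,
): LoopResult {
  let costUsd = 0;
  let feedback: string | null = null; // reviewer output from the last FAIL

  for (let i = 1; i <= maxIterations; i++) {
    let rejected = false;
    for (const step of steps) {
      const result = runStep(step, feedback);
      costUsd += result.costUsd;
      if (costUsd > maxUsd) {
        return { outcome: "budget_exceeded", iterations: i, costUsd };
      }
      if (step.gate && result.gatePassed !== true) {
        // A missing verdict is treated like a FAIL; start the iteration over.
        rejected = true;
        feedback = result.output;
        break;
      }
    }
    if (!rejected) return { outcome: "success", iterations: i, costUsd };
  }
  return { outcome: "gate_exhausted", iterations: maxIterations, costUsd };
}

// Fake runner: the gate rejects the first attempt and accepts the second.
let gateCalls = 0;
const fakeRunner: RunStep = (step) => {
  if (!step.gate) return { output: "fixed", costUsd: 0.3 };
  gateCalls += 1;
  return { output: "root cause not addressed", costUsd: 0.1, gatePassed: gateCalls > 1 };
};
const demo = runIterations([{ id: "fix" }, { id: "review", gate: true }], 3, 2.0, fakeRunner);
console.log(demo.outcome, demo.iterations, demo.costUsd.toFixed(2));
```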
- Each step is one headless Claude Code run (`claude -p`). Steps see the loop's **memory**, the **outputs of earlier steps** in the iteration, and, on retries, the **gate's feedback**.
- A gate must end with `VERDICT: PASS` or `VERDICT: FAIL`. No verdict counts as FAIL: an unverified pass is not a pass.
- On FAIL, the loop starts over with the reviewer's feedback injected into every prompt.
- Every run appends a record to `.loopflow/memory/<loop>.md` (outcome, cost, and the final summary), which the next run reads.
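Two of the rules above are easy to pin down in code: the fail-closed verdict and the run record. A minimal sketch, assuming the verdict must be the last non-empty line of the gate's output (one reading of "must end with") and a made-up Markdown layout for the record; `parseVerdict`, `RunRecord`, and `formatRunRecord` are hypothetical names:

```typescript
type Verdict = "PASS" | "FAIL";

/** Read the verdict from a gate's output; anything unclear is a FAIL. */
function parseVerdict(output: string): Verdict {
  const lines = output
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0);
  const last = lines.pop() ?? "";
  const match = /^VERDICT:\s*(PASS|FAIL)$/.exec(last);
  // No verdict counts as FAIL: an unverified pass is not a pass.
  return match !== null && match[1] === "PASS" ? "PASS" : "FAIL";
}

// Hypothetical record shape: the three fields named in the list, plus a date.
interface RunRecord {
  date: string;
  outcome: string;
  costUsd: number;
  summary: string;
}

/** Format one record as a Markdown section, ready to append to the memory file. */
function formatRunRecord(r: RunRecord): string {
  return [
    `## ${r.date}: ${r.outcome}`,
    `- cost: $${r.costUsd.toFixed(2)}`,
    "",
    r.summary.trim(),
    "",
  ].join("\n");
}

console.log(parseVerdict("Suite is green, no tests weakened.\nVERDICT: PASS"));
console.log(parseVerdict("Looks fine to me."));
console.log(
  formatRunRecord({
    date: "2025-01-06",
    outcome: "success",
    costUsd: 0.82,
    summary: "Date parser now handles epoch millis as well as ISO strings.",
  }),
);
```

Failing closed matters here because a reviewer that rambles, crashes, or gets cut off by the budget should never be mistaken for a reviewer that approved.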
Here's what that looks like in a real run — a release-check loop catching a debug artifact the fix step missed:
`loopflow init` gives you three loops designed to be stolen from:
- `test-and-fix`: fixer + skeptical reviewer gate. The canonical write/verify pair.
- `debt-audit`: a discovery loop. Maintains `.loopflow/reports/debt-audit.md` and uses memory to track what got fixed, what's new, and what keeps being ignored.
- `docs-sync`: finds documentation that drifted from the code, fixes it in an isolated worktree, and a gate verifies every claim against the source.
Got a loop of your own? Contribute it to the cookbook — community loops live in loops/.
LoopFlow deliberately ships no daemon. Use the scheduler you already have:
0 9 * * 1 cd /path/to/project && loopflow run debt-audit
schtasks /create /tn "debt-audit" /sc weekly /d MON /st 09:00 ^
/tr "cmd /c cd /d C:\path\to\project && loopflow run debt-audit"
CI works too: a GitHub Action that runs `loopflow run docs-sync` weekly and opens a PR from the kept worktree branch is ~20 lines.
A loop changes the work — it doesn't delete you from it. LoopFlow's design assumes three things stay true:
- **Verification is still on you.** Gates catch the obvious failures, but `--verbose` and `claude --resume <session-id>` exist so you can read what the loop actually did. Read it.
- **Comprehension debt is real.** The faster a loop ships code you didn't write, the faster the gap grows between what exists and what you understand. Memory files and kept worktrees are designed to be read by humans, not just machines.
- **The comfortable posture is the dangerous one.** When the loop runs itself, it's tempting to stop having an opinion. Design the loop with judgment, then keep judging the output.
Build the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.
| Command | What it does |
|---|---|
| `loopflow init [--force]` | Scaffold `.loopflow/` with starter loops |
| `loopflow list` | List loops with steps, gates, and budgets |
| `loopflow validate [name]` | Validate loop definitions (all by default) |
| `loopflow run <name>` | Run a loop |
| `--dry-run` | Print every composed prompt; invoke nothing |
| `-i, --iterations <n>` | Override `budget.max_iterations` |
| `-b, --budget <usd>` | Override `budget.max_usd` |
| `-v, --verbose` | Print full step outputs |
Exit codes: `0` success · `1` loop failed (gate exhausted, budget, error) · `2` configuration error. Cron- and CI-friendly.
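If you wrap LoopFlow in your own tooling, the exit-code contract above is small enough to encode as a type. A sketch under assumptions: the outcome names in `ExitOutcome` are invented for illustration (the source only lists the three codes and what they cover), and `exitCodeFor` is a hypothetical helper:

```typescript
// Hypothetical outcome names; only the 0 / 1 / 2 meanings come from the line above.
type ExitOutcome =
  | "success"
  | "gate_exhausted"
  | "budget_exceeded"
  | "error"
  | "config_error";

/** Map a run outcome to the documented exit code. */
function exitCodeFor(outcome: ExitOutcome): 0 | 1 | 2 {
  switch (outcome) {
    case "success":
      return 0; // the gate passed
    case "config_error":
      return 2; // the loop definition itself is wrong; retrying won't help
    default:
      return 1; // the loop ran but failed: gate exhausted, budget, or error
  }
}

console.log(exitCodeFor("gate_exhausted"));
```

Keeping configuration errors on their own code lets a scheduler tell "the loop needs fixing" apart from "the loop tried and did not get there".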
Everything the CLI does is exported:
import { loadLoop, runLoop } from "@loopflow/cli";
const loop = loadLoop(process.cwd(), "test-and-fix");
const result = await runLoop(loop, { root: process.cwd() });
console.log(result.outcome, result.costUsd);
- `loopflow daemon`: built-in scheduler with cron expressions in `loop.yaml`
- Parallel steps (fan-out across worktrees)
- Structured gate verdicts via `--json-schema`
- Loop run history & `loopflow logs`
- Adapters for other headless agents (Codex CLI, …)
The most valuable contribution is a loop that solved a real problem for you; see CONTRIBUTING.md. Code contributions: the engine is ~600 lines of typed, tested TypeScript; `npm test` runs in under a second.