# Agentic Loops: Why the Best AI Coding Workflows Are Loops, Not Prompts

Most people still use AI to code the way they'd use a very fast intern with no memory: write a prompt, get a blob of code, eyeball it, paste it, hope. It works until it doesn't, which happens when the change is too big for one context window, too risky for one diff, or too easy to get plausibly wrong.
The teams shipping real work with agents have quietly moved to a different shape. Not a prompt. A loop.
An agentic loop is simple to state and hard to get right:
Act → check against a hard gate → repeat, until a convergence signal says stop.
That's it. The agent makes one small change, an automated gate decides whether it counts, and the loop runs again — collecting wins, reverting losses, and stopping itself when there's nothing left worth doing. The interesting part isn't the agent. It's the harness around it that makes thousands of small autonomous steps safe.
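Here is a minimal sketch of that shape in Python. It assumes the agent, the gate, and the stop condition are plain callables. `run_loop`, `act`, `gate`, and `converged` are hypothetical names for illustration, not the pack's driver API:

```python
from dataclasses import dataclass
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")  # whatever the loop edits: a repo snapshot, a config, a dataset


@dataclass
class LoopReport:
    iterations: int = 0
    accepted: int = 0  # wins collected
    reverted: int = 0  # losses discarded


def run_loop(
    state: S,
    act: Callable[[S], S],                    # hypothetical: agent proposes ONE small change
    gate: Callable[[S], bool],                # deterministic pass/fail, no arguing
    converged: Callable[[LoopReport], bool],  # the loop's own stop signal
    max_iterations: int = 1000,               # safety cap in case convergence never fires
) -> Tuple[S, LoopReport]:
    report = LoopReport()
    while report.iterations < max_iterations and not converged(report):
        report.iterations += 1
        candidate = act(state)  # act returns a new state; the accepted one is untouched
        if gate(candidate):
            state = candidate     # green: the change lands
            report.accepted += 1
        else:
            report.reverted += 1  # red: the change never happened
    return state, report


# Toy usage: the "state" is an integer and the gate rejects anything negative.
proposals = iter([1, -5, 2, 3])
final, report = run_loop(
    0,
    act=lambda s: s + next(proposals),
    gate=lambda s: s >= 0,
    converged=lambda r: r.iterations >= 4,
)
```

In this toy run, the `-5` proposal fails the gate and is dropped. `final` ends at 6, with three accepted changes and one revert. The sketch gets revert for free because `act` returns a new state instead of mutating the old one. A real harness gets the same property from version control.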
We just shipped an Agentic Loops skill pack — eight battle-tested loop patterns for coding agents. This post is the why behind it.
# One-shot prompting has a ceiling

Ask an agent to "fix all the failing tests" or "migrate this codebase to the new API" in one shot, and you hit the same wall every time:
- **It overflows.** A 200-file refactor doesn't fit in context, so the agent guesses at the parts it can't see.
- **It can't be reviewed.** A 600-line diff from a single prompt is a diff no human reads honestly. You skim it and merge on faith.
- **It rewards plausible-but-wrong.** With no gate, the agent's *confidence* becomes the acceptance criterion. Confidence is not correctness.
- **It has no off switch.** "Find bugs" runs once and stops, whether it found everything or nothing.

A loop fixes all four — not by making the agent smarter, but by changing the unit of work from "one big leap" to "many small, checked steps."
# The three invariants

Every good agentic loop enforces the same three rules, whatever the task. Skip any one and the loop degrades into expensive busywork or, worse, confident damage.
# 1. A hard automated gate, every iteration
The gate is the heart of the whole thing. It's a deterministic check the agent cannot talk its way past: a test suite's exit code, `tsc --noEmit` returning zero, an eval score that didn't regress, a per-batch row-count reconciliation.
The rule: a change that doesn't pass the gate didn't happen. Reverted, not merged. The gate is what makes it safe to let ten agents edit forty files in parallel — because nothing lands unless it's green.
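One way to make "didn't happen" literal is sketched below in Python. It assumes the working tree is a git checkout and that the gate is any command whose exit code means pass or fail. `gate_passes` and `land_or_revert` are illustrative names:

```python
import subprocess
import sys
from typing import List, Optional


def gate_passes(cmd: List[str], cwd: Optional[str] = None) -> bool:
    """The gate is an exit code: 0 is green, anything else is red."""
    result = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
    return result.returncode == 0


def land_or_revert(cmd: List[str], repo: str, message: str) -> bool:
    """Commit the agent's edit if the gate is green; otherwise throw it away.

    Assumes the agent actually changed something (git commit fails on an empty change).
    """
    if gate_passes(cmd, cwd=repo):
        subprocess.run(["git", "add", "-A"], cwd=repo, check=True)
        subprocess.run(["git", "commit", "-m", message], cwd=repo, check=True)
        return True
    # Discard staged and unstaged edits to tracked files, then delete untracked
    # files the agent created, so the tree matches the last green commit.
    subprocess.run(["git", "reset", "--hard", "HEAD"], cwd=repo, check=True)
    subprocess.run(["git", "clean", "-fd"], cwd=repo, check=True)
    return False


# Smoke test: the current interpreter stands in for a real gate command.
green = gate_passes([sys.executable, "-c", "raise SystemExit(0)"])
red = gate_passes([sys.executable, "-c", "raise SystemExit(1)"])
```

For a TypeScript repo, the gate might be `gate_passes(["npx", "tsc", "--noEmit"], cwd=repo)`. The harness only ever reads the exit code and never the agent's own account of the change. That is the property that matters.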
# 2. One attributable change per iteration

Batch "fix these four things" into one step, and when two pass and two regress, you can't tell which edit did what. The agent starts shotgun-editing to move the number, and you lose the thread.
One change, one gate run, one verdict. It's slower per step and far faster overall, because every step is independently reviewable and revertible. When something breaks, `git bisect` takes you to the exact commit, so you don't have to debug a half-finished mess.
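To see why attribution matters, here is a toy sketch in which the "state" is a dict and each proposed edit is a hypothetical `(key, value)` pair. Every edit gets its own gate run, so every verdict points at exactly one change:

```python
from typing import Callable, Dict, List, Tuple

Change = Tuple[str, int]  # hypothetical edit: set `key` to `value`


def apply_one_at_a_time(
    state: Dict[str, int],
    changes: List[Change],
    gate: Callable[[Dict[str, int]], bool],
) -> Tuple[Dict[str, int], List[Tuple[Change, bool]]]:
    verdicts: List[Tuple[Change, bool]] = []
    for change in changes:
        key, value = change
        candidate = {**state, key: value}  # exactly one edit per gate run
        ok = gate(candidate)
        if ok:
            state = candidate              # keep it; later edits build on it
        verdicts.append((change, ok))      # one change, one verdict
    return state, verdicts


def all_positive(s: Dict[str, int]) -> bool:
    return all(v > 0 for v in s.values())


final, verdicts = apply_one_at_a_time(
    {"a": 1, "b": 1}, [("a", 2), ("b", -1), ("c", 5)], all_positive
)
```

The gate rejects only `("b", -1)`, and the verdict list says so. Applied as one batch, the same three edits would produce a single red result with no pointer to the culprit.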
# 3. An honest convergence signal

A loop without a stop condition either runs forever or stops arbitrarily. The fix is to instrument progress and stop on it: the skip-rate crossing 50%, the failing-test count hitting zero, the bug-finder going quiet for K consecutive rounds, the eval score plateauing.
The discipline here is honest skips. When a page is already polished, the correct output is "changed nothing, here's why" — not a forced, marginal tweak to look busy. A loop that knows when it's done is worth ten that grind on.
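The stop conditions above reduce to small predicates over numbers the loop already tracks. Here is a sketch in Python. The 50% skip-rate threshold comes from this post. The function names, `k=3`, the plateau window, and the tolerance are illustrative assumptions:

```python
from typing import Sequence


def skip_rate_converged(skipped: int, total: int, threshold: float = 0.5) -> bool:
    """Self-improvement loop: stop once most units come back 'changed nothing'."""
    return total > 0 and skipped / total > threshold


def quiet_for_k_rounds(new_findings_per_round: Sequence[int], k: int = 3) -> bool:
    """Bug-hunt loop: stop after K consecutive rounds that found nothing new."""
    recent = new_findings_per_round[-k:]
    return len(recent) == k and all(n == 0 for n in recent)


def plateaued(scores: Sequence[float], window: int = 3, min_gain: float = 1e-3) -> bool:
    """Eval-driven loop: stop when the last `window` rounds failed to beat the
    earlier best score by at least `min_gain`."""
    if len(scores) <= window:
        return False  # not enough history to judge
    best_before = max(scores[:-window])
    return max(scores[-window:]) < best_before + min_gain
```

Each predicate is deliberately dumb. A loop stops on evidence it counted, not on the agent's sense that it is finished.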
# What a loop catches that a prompt never will
A concrete one. We pointed our self-improvement loop at a production admin panel: screenshot every page, look at it, make one small improvement, run `tsc` and `eslint`, repeat. Over several rounds it produced ~85 improvements with a clean gate on every batch.
But the best moment wasn't a piece of polish. In one round, the screenshot harness, which also listens for uncaught page errors, flagged a settings tab rendering the framework's full-page crash screen. An API-only health check had been green the whole time, because the crash was client-side. A human skimming thumbnails would've missed it. The loop caught it automatically. We captured the actual error (`Cannot read properties of undefined (reading 'memes')`), traced it to a state-merge bug, and fixed it at the root. The harness now flags that entire class of bug forever.
That's the payoff: a loop doesn't just do work, it builds a ratchet that keeps regressions from coming back.
# Eight loops for eight jobs

The pattern is universal; the gate and the convergence signal change with the task. The pack covers eight:
| Loop | One iteration | The gate | Stops when |
|---|---|---|---|
| Self-improvement | screenshot → improve one thing → test | `tsc` and `eslint` report 0 errors | skip-rate > 50% |
| Test-and-fix | run tests → fix the first failure → re-run all | test exit code | failing count → 0 |
| Bug-hunt | diverse finders → verify by skeptics → fix | survives adversarial review + repro | K rounds find nothing new |
| Migration | scout sites → transform each → verify | per-file typecheck + runtime | un-migrated residue → 0 |
| Eval-driven | propose one change → re-run evals → keep if better | no score regression | dev score plateaus |
| Research-synthesis | gather → critique for gaps → fill | every claim cites a source that was actually read | critic finds no gaps |
| Refactor-under-tests | tiny structure change → full suite | suite green after every step | target structure reached |
| Data-backfill | batch → checkpoint → verify → resume | per-batch invariants + reconciliation | cursor done + source == dest |
A few patterns worth calling out because they're the ones people get wrong:
Bug-hunt lives or dies on two details. First, finders must be *perspective-diverse*. Give each a different lens (correctness, security, concurrency, leaks), or five identical finders will just agree on the same obvious null-deref. Second, verification must be *adversarial*. A verifier asked to "confirm this bug" rubber-stamps plausible nonsense, while one told to "refute this, default to refuted, confirm only with a concrete repro" gives you findings you can trust. The subtle killer is deduplication. Dedup against everything you've *seen*, not just what you *confirmed*, or every rejected finding gets re-found forever and the loop never goes dry.
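The dedup rule is the easiest of the three to get wrong in code. Here is a minimal Python sketch. It assumes each finding can be reduced to a hashable signature and that `has_repro` wraps the adversarial verifier. Both are hypothetical, and the finders and verifier prompts themselves are not modeled:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Set, Tuple

Signature = Tuple[str, int, str]


@dataclass(frozen=True)
class Finding:
    file: str
    line: int
    summary: str

    def signature(self) -> Signature:
        # Hypothetical dedup key; real loops may need fuzzier matching.
        return (self.file, self.line, self.summary.strip().lower())


def hunt_round(
    findings: Iterable[Finding],
    seen: Set[Signature],
    has_repro: Callable[[Finding], bool],
) -> List[Finding]:
    """One round: skip anything already seen; everything else is refuted until proven."""
    confirmed: List[Finding] = []
    for f in findings:
        sig = f.signature()
        if sig in seen:
            continue      # seen before, whether it was confirmed OR rejected
        seen.add(sig)     # record it BEFORE the verdict so rejections stay rejected
        if has_repro(f):  # default-refuted: only a concrete repro confirms
            confirmed.append(f)
    return confirmed


seen: Set[Signature] = set()
a = Finding("db.py", 42, "Null deref on empty result")
b = Finding("db.py", 90, "Possible race on cache fill")
round1 = hunt_round([a, b], seen, has_repro=lambda f: f.line == 42)
round2 = hunt_round([a, b], seen, has_repro=lambda f: f.line == 42)
```

`round1` confirms only `a`. `round2` re-finds both, but both are already in `seen`, so it returns nothing new. That empty result is the "quiet round" a convergence check counts. Deduping only against confirmed findings would re-verify `b` every round.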
Test-and-fix is paranoid on purpose. The agent will happily make a test pass by deleting the assertion. So the loop diffs the test files every iteration and rejects any change that *shrank or weakened* them. It fixes one failure at a time and re-runs the *whole* suite, because the fix for failure #1 routinely breaks failure #5. It also detects "stuck": the same failure signature twice means escalate, not loop forever.
Refactor-under-tests has one invariant: behavior is identical at every step. If the suite is thin, the agent writes *characterization tests* that pin current behavior, bugs and all, *before* touching structure. Then it takes steps so small that each one is independently green and revertible. The moment it needs to edit a test to pass, it stops: that's a behavior change wearing a refactor's clothes.
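A characterization test is just a table of today's inputs and outputs, pinned in place. Here is a Python `unittest` sketch. `legacy_slug` is a hypothetical function standing in for the code about to be refactored. It includes a deliberate quirk to show that bugs get pinned too:

```python
import unittest


def legacy_slug(title: str) -> str:
    """Hypothetical code under refactor. Quirk: trailing punctuation leaves a trailing '-'."""
    out = ""
    for ch in title.lower():
        out += ch if ch.isalnum() else "-"
    return out


class CharacterizeLegacySlug(unittest.TestCase):
    # Pin what the code DOES today, not what it should do. The trailing '-' is
    # probably a bug, but a refactor must preserve it; fixing it is a separate,
    # deliberate behavior change.
    CASES = {
        "Hello World": "hello-world",
        "Hello World!": "hello-world-",
        "a  b": "a--b",
        "": "",
    }

    def test_pinned_outputs(self) -> None:
        for given, expected in self.CASES.items():
            with self.subTest(given=given):
                self.assertEqual(legacy_slug(given), expected)
```

Run it with `python -m unittest` before the first structural change and after every one. If a step needs `CASES` edited to go green, that step changed behavior.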
Data-backfill is built on idempotency. A six-hour job over ten million rows *will* be interrupted. So every batch must be safely re-runnable (upsert by key, never blind insert), and the cursor must checkpoint after each batch. If the job restarts from zero on every transient blip, it never finishes. It verifies as it goes and reconciles source against destination at the end, because "the loop stopped erroring" is not the same as "all the data is correct."
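Here is a compact sketch of those three properties using Python's built-in `sqlite3`. It assumes `src` and `dest` tables keyed by an integer `id`. The table names, the batch size, and the single-row progress table are illustrative. `ON CONFLICT ... DO UPDATE` requires SQLite 3.24 or newer:

```python
import sqlite3

# Rows in src (within an id range) that are missing from dest or differ from it.
MISMATCHES = (
    "SELECT COUNT(*) FROM src s LEFT JOIN dest d ON d.id = s.id "
    "WHERE s.id BETWEEN ? AND ? AND (d.id IS NULL OR d.val IS NOT s.val)"
)


def backfill(conn: sqlite3.Connection, batch_size: int = 1000) -> None:
    """Copy src -> dest in id order. Safe to kill and re-run at any point."""
    conn.execute("CREATE TABLE IF NOT EXISTS dest (id INTEGER PRIMARY KEY, val TEXT)")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS backfill_progress (job TEXT PRIMARY KEY, last_id INTEGER)"
    )
    row = conn.execute("SELECT last_id FROM backfill_progress WHERE job = 'backfill'").fetchone()
    last_id = row[0] if row else 0  # resume after the last committed batch, not from zero
    while True:
        rows = conn.execute(
            "SELECT id, val FROM src WHERE id > ? ORDER BY id LIMIT ?", (last_id, batch_size)
        ).fetchall()
        if not rows:
            return
        with conn:  # batch, check, and checkpoint commit together or roll back together
            conn.executemany(  # upsert by key: re-running a batch can't duplicate rows
                "INSERT INTO dest (id, val) VALUES (?, ?) "
                "ON CONFLICT(id) DO UPDATE SET val = excluded.val",
                rows,
            )
            lo, hi = rows[0][0], rows[-1][0]
            if conn.execute(MISMATCHES, (lo, hi)).fetchone()[0]:  # verify as you go
                raise RuntimeError(f"batch {lo}..{hi} failed verification")
            conn.execute(
                "INSERT INTO backfill_progress (job, last_id) VALUES ('backfill', ?) "
                "ON CONFLICT(job) DO UPDATE SET last_id = excluded.last_id",
                (hi,),
            )
        last_id = hi


def reconcile(conn: sqlite3.Connection) -> bool:
    """'Stopped erroring' is not 'correct': compare source and destination in full."""
    bad = conn.execute(MISMATCHES, (-(2**63), 2**63 - 1)).fetchone()[0]
    same_count = conn.execute(
        "SELECT (SELECT COUNT(*) FROM src) = (SELECT COUNT(*) FROM dest)"
    ).fetchone()[0]
    return bad == 0 and same_count == 1
```

Suppose the progress row is deleted and `backfill` runs again. It re-upserts every row without creating duplicates, and that property is what makes an interrupted run safe to restart. A failed per-batch check raises inside the `with conn:` block. The batch and its checkpoint then roll back together.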
# How to build your own

You don't need a framework. You need three things and the discipline to wire them honestly:
1. **Pick a gate that can't be argued with.** Exit code, typecheck, eval score, row count. If you can't name the gate, you don't have a loop; you have a vibe.
2. **Make the smallest possible unit of work.** One failure, one page, one call-site, one batch. Attributable and revertible.
3. **Instrument convergence and respect it.** Count something that trends to zero. Let the loop tell you it's done, and believe it when it does.

Then, if the work is big, fan it out: many agents, each owning a distinct slice (a different file, a different page), each passing the same gate before anything commits. Parallel autonomy is only safe because of the gate. That's the whole trick.
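Here is a sketch of that fan-out using Python's `concurrent.futures`. It assumes `work` wraps one agent run over one slice and returns its proposed result. It also assumes that "landing" a result means recording it under a lock, a stand-in for a serialized commit. All names are hypothetical:

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def fan_out(
    slices: List[str],
    work: Callable[[str], str],        # hypothetical: one agent run owning one slice
    gate: Callable[[str, str], bool],  # the same gate for every worker
    max_workers: int = 4,
) -> Tuple[Dict[str, bool], Dict[str, str]]:
    if len(set(slices)) != len(slices):
        raise ValueError("each agent must own a distinct slice")
    landed: Dict[str, str] = {}
    lock = threading.Lock()  # work runs in parallel; landing is serialized

    def run(slice_id: str) -> bool:
        result = work(slice_id)
        if not gate(slice_id, result):
            return False  # red: nothing from this slice lands
        with lock:
            landed[slice_id] = result
        return True

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        verdicts = dict(zip(slices, pool.map(run, slices)))  # map preserves input order
    return verdicts, landed


verdicts, landed = fan_out(
    ["a.ts", "b.ts", "c.ts"],
    work=lambda s: f"// edited {s}",
    gate=lambda s, result: s != "b.ts",  # pretend b.ts fails its typecheck
)
```

Distinct slices keep workers from editing the same file. The shared gate lets any one of them fail without blocking or contaminating the others.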
# The takeaway

The leap in agent productivity over the last year wasn't a smarter model writing a better one-shot answer. It was the realization that you get further by letting a good-enough agent take a thousand small, checked steps than by asking a great one to take a single perfect leap.
A prompt produces an artifact. A loop produces a process — one that keeps working while you sleep, refuses to ship what doesn't pass, and tells you honestly when it's finished.
The eight patterns above are on SkillDB now, each with the philosophy, the loop diagram, a runnable driver, the hard-won gotchas, and the exact gate that makes it safe. Point your agent at the Agentic Loops pack and give it a loop to run.