On June 7, 2026, Peter Steinberger, the creator of OpenClaw, posted sixteen words that ricocheted across every AI engineering corner of the internet: "You shouldn't be prompting coding agents anymore. You should be designing loops that prompt your agents."
The same week, Boris Cherny, creator and head of Claude Code at Anthropic, said the same thing on stage: "I don't prompt Claude anymore. I have loops running. They're the ones prompting Claude and figuring out what to do. My job is to write loops."
Two builders. Same conclusion. Arrived at independently. And when Latent Space published "Loopcraft: The Art of Stacking Loops" the same week, weaving together Steinberger, Cherny, and Andrej Karpathy, it crystallized something practitioners had been circling for months: the highest-leverage skill in AI engineering is no longer writing a good prompt. It's designing the system that writes the prompts for you.
They're calling it loopcraft. And it's eating the harness layer.
Addy Osmani's June 2026 essay, "Loop Engineering," gave the concept its clearest definition. Loop engineering, he wrote, sits "one floor above" agent harness engineering. Instead of crafting one perfect instruction, you design the cycle that keeps an AI coding agent working, testing, learning, and stopping.
Osmani identified six primitives that compose a loop:
This isn't new territory conceptually. Simon Willison framed it in September 2025: "Designing agentic loops is a critical new skill to develop." He defined agents as "things that run tools in a loop to achieve a goal" and argued that the art of using them well involves carefully designing the tools and the loop, not just the prompt.
What changed between Willison's September 2025 framing and Steinberger's June 2026 declaration is infrastructure. Back then, loops were theoretical. Now they're shipping, and the tooling has caught up.
If you want to understand what a well-designed loop looks like in practice, look at Karpathy's autoresearch. Released in March 2026, autoresearch is 630 lines of Python that ran 50 ML experiments overnight on a single GPU. Within days the project picked up 21,000+ stars, and Karpathy's announcement drew 8.6 million views.
Here's why it matters: autoresearch doesn't just automate a task. It automates the research loop itself, the cycle in which a researcher forms a hypothesis, edits code, runs a training session, checks the result, and decides whether to keep the change. The agent reads a program.md file for research direction, modifies train.py with a proposed change, commits it, runs training for exactly 5 minutes, evaluates the result using val_bpb (validation bits per byte, where lower is better), and either keeps or reverts the change. Then it loops.
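The keep-or-revert cycle is easy to see in miniature. What follows is a minimal Python sketch, not autoresearch's actual code: propose_change and run_training are hypothetical stand-ins (the agent's edit and the 5-minute training run are simulated with random numbers), and "revert" simply means discarding the candidate instead of resetting a git commit.

```python
import random
from dataclasses import dataclass


@dataclass
class Experiment:
    change: str      # description of the proposed edit to train.py
    val_bpb: float   # validation bits per byte after the run (lower is better)
    kept: bool       # True if the edit was kept, False if it was reverted


def propose_change(rng: random.Random, step: int) -> tuple:
    """Hypothetical stand-in for the coding agent editing train.py.

    Returns a label for the edit and its simulated effect on val_bpb.
    """
    return f"edit #{step}", rng.uniform(-0.02, 0.02)


def run_training(current_bpb: float, delta: float) -> float:
    """Hypothetical stand-in for one fixed-budget (e.g. 5-minute) training run."""
    return current_bpb + delta


def research_loop(n_iterations: int, start_bpb: float = 1.0, seed: int = 0):
    rng = random.Random(seed)
    best_bpb = start_bpb
    history = []
    for step in range(n_iterations):
        change, delta = propose_change(rng, step)  # agent proposes and commits an edit
        bpb = run_training(best_bpb, delta)        # evaluate under the same fixed budget
        kept = bpb < best_bpb                      # keep only strict improvements
        if kept:
            best_bpb = bpb                         # the commit becomes the new baseline
        # otherwise the edit is reverted: best_bpb stays unchanged
        history.append(Experiment(change, bpb, kept))
    return best_bpb, history


best, history = research_loop(50)
print(f"best val_bpb: {best:.4f}, kept {sum(e.kept for e in history)} of {len(history)} edits")
```

The fixed time budget is what makes the comparison fair: every candidate gets the same compute, so a single scalar decides keep or revert.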
The most technically interesting thing about autoresearch is that the loop is defined in English. The program.md file specifies a complete research methodology: what to modify, what to leave alone, how to evaluate, how to handle failure cases, and a blanket prohibition on asking for help. A coding agent reads this document and executes it indefinitely.
This is the template for loopcraft: don't give the agent a task. Give it a methodology. Let the methodology be the loop.
The ecosystem followed immediately. autoresearch-skill turned Karpathy's pattern into a portable skill for Claude Code, Codex CLI, and Gemini CLI.
ℹ️ A loop is not a script. A script runs the same steps every time. A loop runs the same structure every time but lets the agent decide what fills each step. The agent isn't following instructions; it's following a methodology.
Three patterns have emerged as the workhorses of the loopcraft era.
Find → Verify → Synthesize. The most common pattern. One agent (or fleet of agents) searches for something: bugs, information, files matching a pattern. A second fleet independently verifies each finding. A third synthesizes the verified results into a deliverable. Ken Huang's deep dive on Claude Code orchestration documents this as the default pattern for code review workflows.
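To make the three stages concrete, here is a minimal Python sketch in which plain functions stand in for agent fleets and a thread pool stands in for running them in parallel. The find, verify, and synthesize functions are hypothetical toy heuristics (flag eval() calls, reject matches inside comments), not how Claude Code implements the pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def find(chunk):
    """Hypothetical finder agent: flag any line that calls eval()."""
    return [line for line in chunk if "eval(" in line]


def verify(candidate):
    """Hypothetical verifier agent: independently re-check one finding.

    Here it rejects matches that sit inside a comment.
    """
    return not candidate.lstrip().startswith("#")


def synthesize(verified):
    """Hypothetical synthesizer agent: turn verified findings into a report."""
    body = "\n".join(f"- {item.strip()}" for item in verified)
    return f"{len(verified)} verified finding(s)\n{body}"


def find_verify_synthesize(chunks, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Stage 1: one finder per chunk, run in parallel; flatten the results.
        candidates = [c for found in pool.map(find, chunks) for c in found]
        # Stage 2: one verifier per candidate, also in parallel.
        verdicts = list(pool.map(verify, candidates))
    verified = [c for c, ok in zip(candidates, verdicts) if ok]
    # Stage 3: a single synthesis step over the verified set.
    return synthesize(verified)


report = find_verify_synthesize([
    ["x = eval(s)", "y = 1"],
    ["# never call eval(user_input)", "z = eval(t)"],
])
print(report)
```

The separation matters more than the heuristics: the verifier never sees how a candidate was found, so it cannot simply agree with the finder's reasoning.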
Loop-until-dry. For unknown-size discovery (audits, backlogs, edge cases), you keep spawning agents until K consecutive rounds return nothing new. Simple counters (while count < N) stop after a fixed number of rounds and miss the long tail of rare findings. Loop-until-dry catches it because it doesn't assume a fixed target.
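The stopping rule fits in a few lines. In this minimal Python sketch, spawn_round is a hypothetical callback standing in for "spawn a batch of agents and collect what they report," and max_rounds is only a safety cap, not the target.

```python
from typing import Callable, Set


def loop_until_dry(spawn_round: Callable[[int], Set[str]], k: int = 3,
                   max_rounds: int = 1000) -> Set[str]:
    """Keep running discovery rounds until k consecutive rounds find nothing new."""
    found: Set[str] = set()
    dry_streak = 0
    round_no = 0
    while dry_streak < k and round_no < max_rounds:
        new = spawn_round(round_no) - found   # only genuinely new findings count
        found |= new
        dry_streak = 0 if new else dry_streak + 1
        round_no += 1
    return found


# Scripted rounds for illustration: a rare finding ("race") only shows up in round 4.
rounds = [{"nullptr"}, {"nullptr"}, {"off-by-one"}, set(), {"race"}, set(), set(), set()]
found = loop_until_dry(lambda i: rounds[i] if i < len(rounds) else set(), k=3)
print(sorted(found))
```

A fixed `while count < 3` over the same rounds would stop after round 2 and never see "race"; the dry-streak rule keeps going because round 3 was only one empty round, not K of them.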
Adversarial verify. Spawn N independent skeptics per finding, each prompted to refute it. Kill the finding if a majority refute it. The Neuron's interview with Cherny and Cat Wu describes this as central to how the Claude Code team itself works: verification isn't a step at the end, it's a loop that runs in parallel with generation.
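The majority rule can be sketched in Python as below. The skeptics here are hypothetical one-line heuristics standing in for independently prompted agents, and they run sequentially for clarity; in a real loop each would be a separate agent call, typically run in parallel.

```python
from typing import Callable, List

# A skeptic returns True if it manages to refute the finding.
Skeptic = Callable[[str], bool]


def adversarial_verify(findings: List[str], skeptics: List[Skeptic]) -> List[str]:
    """Keep a finding unless a strict majority of skeptics refute it."""
    survivors = []
    for finding in findings:
        refutations = sum(1 for skeptic in skeptics if skeptic(finding))
        if refutations * 2 <= len(skeptics):  # killed only when refutations > N/2
            survivors.append(finding)
    return survivors


# Hypothetical heuristic skeptics standing in for independently prompted agents.
skeptics: List[Skeptic] = [
    lambda f: "test" in f,    # "this only affects test code"
    lambda f: len(f) < 10,    # "too vague to act on"
    lambda f: "maybe" in f,   # "the reporter wasn't sure either"
]
findings = ["null deref in parser", "maybe race in test", "typo"]
print(adversarial_verify(findings, skeptics))
```

"typo" survives with one refutation out of three, while "maybe race in test" is killed by two; using an odd N avoids ties.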
💡 The counter-argument: "This is just scripting with extra steps." The difference is that a script defines the steps. A loop defines the structure and lets an agent fill the steps. When something unexpected comes up, a script fails. A loop adapts.
On May 28, 2026, Anthropic shipped ultracode, a Claude Code setting that activates automatic multi-agent workflow orchestration. Set effort to ultracode and Claude evaluates each request, deciding on its own whether the task warrants a full workflow. When it does, it writes a JavaScript script on the fly, plans an understand-change-verify loop, and dispatches subagents to work in parallel.
Dynamic Workflows replaced context-window orchestration with deterministic scripts. The script is the loop. It specifies what fans out, what verifies, and what synthesizes. And because it's a script, not a prompt, it's reproducible, debuggable, and composable.
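The source doesn't show what ultracode's generated JavaScript looks like, so the following is only a Python sketch of the general idea of a workflow written as data. FanOut, Gather, and run_workflow are hypothetical names, not a Claude Code API, and the lambdas stand in for subagents (the "verify" stage simply pretends the patch for c.py failed its check).

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Any, Callable, List, Tuple, Union


@dataclass(frozen=True)
class FanOut:
    """Run fn on every item independently, in parallel (one subagent per item)."""
    name: str
    fn: Callable[[Any], Any]


@dataclass(frozen=True)
class Gather:
    """Run fn once over the whole list (e.g. filter after verification, or summarize)."""
    name: str
    fn: Callable[[List[Any]], List[Any]]


Stage = Union[FanOut, Gather]


def run_workflow(stages: List[Stage], items: List[Any],
                 workers: int = 4) -> Tuple[List[Any], List[Tuple[str, int]]]:
    log = []  # (stage name, number of items after the stage), for debugging
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for stage in stages:
            if isinstance(stage, FanOut):
                items = list(pool.map(stage.fn, items))  # order-preserving
            else:
                items = stage.fn(items)
            log.append((stage.name, len(items)))
    return items, log


# Hypothetical understand, change, verify, synthesize workflow over three files.
workflow = [
    FanOut("understand", lambda path: f"plan for {path}"),
    FanOut("change", lambda plan: plan.replace("plan", "patch")),
    Gather("verify", lambda patches: [p for p in patches if not p.endswith("c.py")]),
    Gather("synthesize", lambda patches: [f"{len(patches)} patch(es) ready"]),
]
result, log = run_workflow(workflow, ["a.py", "b.py", "c.py"])
print(result, log)
```

Because the stages are plain data, the same inputs produce the same result and the same stage-by-stage log, and two workflows compose by concatenating their stage lists.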
Meanwhile, the ecosystem materialized. obra/superpowers shipped a complete software development methodology built on composable skills, with 1,276+ stars and growing. The cobusgreyling/loop-engineering repo cataloged patterns from Osmani and Cherny into a practical reference.
⚠️ A caution on complexity: the most effective loops are often the simplest. Karpathy's autoresearch is 630 lines. Loopcraft isn't about maximizing agents; it's about maximizing leverage per loop iteration.
Pull up GitHub Trending for the past month and the pattern is unmistakable:
Boris Cherny confirmed the endpoint in March 2026: Claude Code is now 100% written by Claude Code itself. Across 259 pull requests in one month, Cherny didn't open an IDE once. The loops handled everything.
Stop optimizing prompts. Start optimizing loop structure. The marginal return on a better prompt is shrinking. The marginal return on a better loop (better verification, better composition, better stopping conditions) is expanding.
Write your methodology, not your instructions. Karpathy's program.md is the template. Don't tell the agent what to do. Tell it how to decide what to do, how to evaluate whether it worked, and when to stop.
Think in patterns, not tasks. Find-verify-synthesize. Loop-until-dry. Adversarial verify. These patterns compose across domains. Learn them once, apply everywhere.
The harness is commoditizing. The loop is the moat. Which harness you use (Claude Code, Codex, Gemini CLI) matters less every month. What matters is how you design the loop that runs on top of it: the skills, the verification, the orchestration. That's where the leverage lives now.
The agents are waiting. Design the loop.
Originally published at AgentConn