{"slug": "the-dirty-secret-behind-loop-engineering", "title": "The Dirty Secret Behind Loop Engineering", "summary": "Loop Engineering, a 2026 AI workflow pattern where developers build systems that prompt agents instead of doing it manually, has a dirty secret: without a proper evaluation component, the loop becomes an infinite runaway process. PostHog deployed it in production, achieving an 11% performance improvement and fixing a three-year-old defect. The key is to define minimal specs and let the agent iterate until the evaluation condition is met.", "body_md": "*Everyone is talking about Loop Engineering. Apparently, you don't need to program anymore.*\n\nTL;DR: Loop Engineering is the hottest AI workflow pattern of 2026. But it hides a dirty secret.\n\nIn June 2026, [Addy Osmani](https://addyo.substack.com/p/loop-engineering) and the [PostHog team](https://newsletter.posthog.com/p/why-were-bullish-on-loops) published their takes on the same idea.\n\nInstead of prompting an AI agent manually, you build the system that prompts the agent for you.\n\nThe metaprompt idea has a fancy name now. **Loop Engineering**.\n\nLoop engineering is replacing yourself as the person who prompts the agent. You design the system that does it instead.\n\nAddy Osmani\n\nPostHog ran it in production. The result: an 11% performance improvement and a 3-year-old [defect](https://dev.to/mcsee/stop-calling-them-bugs-57gl) fixed in the query engine, hands-off.\n\nThe internet got excited. Again. Rightly so.\n\nBut there's a dirty secret hiding behind the vocabulary.\n\nA functional loop has four parts:\n\n`/goal`\n\nThe evaluation component is the key. Without it, you don't have a loop. You have an infinite runaway process. 
Good luck with your token bill!\n\nYou also need a [harness](https://dev.to/mcsee/ai-coding-tip-022-give-ai-a-harness-to-work-with-274a): the scaffolding that contains the agent, enforces [your rules](https://dev.to/mcsee/object-design-checklist-2p4), and gives the loop a safe boundary to operate within.\n\nHere's where Loop Engineering gets interesting, and where most people get it wrong.\n\nIf you write an [enormous spec covering all possible cases before the loop runs once](https://dev.to/mcsee/ai-coding-tip-008-use-spec-driven-development-with-ai-1k0f), you aren't doing Loop Engineering. You're doing [waterfall](https://dev.to/mcsee/coupling-the-one-and-only-software-design-problem-2pd7) with extra steps.\n\nThink about what you want to verify. Not the whole system. One behavior.\n\nLet's build a **FIFA World Cup 2026 group standings simulator** using Loop Engineering.\n\nGermany, Ivory Coast, Ecuador, and Curaçao in Group E. Three rounds of matches. The top two advance, and the best third-place teams also qualify.\n\nWhat's the smallest possible spec?\n\n```\nA team that wins a match gets 3 points.\n```\n\nThat's it. Not the whole group. Not the knockout bracket. One rule about one match.\n\nThis is the [Spec-Driven approach](https://dev.to/mcsee/ai-coding-tip-008-use-spec-driven-development-with-ai-1k0f): you define intent before implementation, but you keep the scope surgical.\n\nHere's your first loop cycle. You define the evaluation condition before any implementation exists.\n\n``` python\ndef test_win_gives_three_points():\n    germany = Team(\"Germany 🇩🇪\")\n    curacao = Team(\"Curaçao 🇨🇼\")\n\n    match = Match(germany, curacao, home_goals=7, away_goals=1)\n    standings = GroupStandings()\n    standings.record(match)\n\n    assert standings.points_for(germany) == 3\n    assert standings.points_for(curacao) == 0\n```\n\nRun this. It fails. `Team` doesn't exist. `Match` doesn't exist. 
`GroupStandings` doesn't exist.\n\n(Germany beat Curaçao 7-1 on June 14, 2026. The spec matches reality.)\n\nThe loop condition is red 🔴.\n\nThis is the signal the loop needs. The evaluation says: not done yet. Keep running until you achieve the `/goal`.\n\nNow you give the agent the minimal implementation to make the loop exit:\n\n``` python\nclass Team:\n    # Instances are used as dictionary keys; default identity hashing is enough here.\n    def __init__(self, name):\n        self.name = name\n\nclass Match:\n    def __init__(self, home, away, home_goals, away_goals):\n        self.home = home\n        self.away = away\n        self.home_goals = home_goals\n        self.away_goals = away_goals\n\nclass GroupStandings:\n    def __init__(self):\n        self._points = {}\n\n    def record(self, match):\n        # The winner gets 3 points; the loser is registered with its current total.\n        # Draws are not handled yet: no spec has asked for them so far.\n        if match.home_goals > match.away_goals:\n            self._points[match.home] = self._points.get(match.home, 0) + 3\n            self._points[match.away] = self._points.get(match.away, 0)\n        elif match.away_goals > match.home_goals:\n            self._points[match.away] = self._points.get(match.away, 0) + 3\n            self._points[match.home] = self._points.get(match.home, 0)\n\n    def points_for(self, team):\n        return self._points.get(team, 0)\n```\n\nRun the spec. Green 🟢. Loop exits.\n\nNot because you modeled every rule. Because you satisfied the single condition the loop was checking.\n\nThe loop restarts with a new goal:\n\n``` python\ndef test_draw_gives_one_point_each():\n    ecuador = Team(\"Ecuador 🇪🇨\")\n    curacao = Team(\"Curaçao 🇨🇼\")\n\n    match = Match(ecuador, curacao, home_goals=0, away_goals=0)\n    standings = GroupStandings()\n    standings.record(match)\n\n    assert standings.points_for(ecuador) == 1\n    assert standings.points_for(curacao) == 1\n```\n\nRed 🔴. The `record` method doesn't handle draws.\n\n(Ecuador drew 0-0 with Curaçao on June 20. Again, the spec matches reality.)\n\nThe evaluation fails. Loop continues. 
Add the draw case. Green 🟢. Loop exits.\n\nCycle by cycle, the spec expands:\n\nEach iteration follows the same pattern: write the evaluation condition first, run it (it fails), implement the minimum to pass, run again (green 🟢), move to the next cycle.\n\nAfter 7 iterations, Group E final standings:\n\n```\nGroup E - Final Standings\n1. Germany       6 pts  GD: +6  GF: 10\n2. Ivory Coast   6 pts  GD: +2  GF: 4\n3. Ecuador       4 pts  GD:  0  GF: 2\n4. Curaçao       1 pt   GD: -8  GF: 1\n```\n\nThe same loop discipline applies to bracket generation.\n\n``` python\ndef test_group_winner_faces_different_group_runner_up():\n    bracket = KnockoutBracket(completed_group_results)\n\n    round_of_32 = bracket.round_of_32()\n\n    assert round_of_32[0].home == group_e_standings.first_place()\n    assert round_of_32[0].away == group_f_standings.second_place()\n```\n\nRed 🔴 first. Then green 🟢. Then the next spec.\n\nThe loop doesn't know the full bracket before it starts. It discovers the bracket one evaluation at a time.\n\nNone of this works without a structure that:\n\nThat is the [harness](https://dev.to/mcsee/ai-coding-tip-022-give-ai-a-harness-to-work-with-274a). 
The harness is what separates Loop Engineering from running Claude in a `while True` loop and hoping for the best.\n\nCodex and Claude Code now ship with built-in loop infrastructure: [`/goal`](https://code.claude.com/docs/en/goal), `/loop`, `isolation: worktree`, and sub-agents for separate verification. The harness is no longer something you build from scratch.\n\nThe agent that verifies runs in a [clean sub-agent with no memory](https://dev.to/mcsee/ai-coding-tip-005-keep-context-fresh-220e) of what the implementer did.\n\nIt is an independent inspector seeking *Judgment Day* moments.\n\nIt can't grade its own work because it never saw the work being done.\n\nThis is the same reason you don't ask a developer to review their own pull request.\n\nWhy is this getting attention now and not five years ago?\n\nBecause the evaluation step (the part where the loop decides whether to continue) used to require a human. Now it doesn't.\n\nWhen models were weaker, the loop needed you to interpret the evaluation output. Now the evaluation can be the test suite itself, and the agent reads it directly.\n\nYou've been reading about [Test-Driven Development](https://www.youtube.com/watch?v=Xahv9nMegXA).\n\nThe *spec* is the test.\n\nThe *evaluation* is the test runner.\n\nThe *loop* is the red 🔴-green 🟢-refactor 🔵 cycle.\n\nThe *goal* is the failing assertion.\n\n*Loop exits when evaluation passes* means the test is green 🟢.\n\nKent Beck described this in [2003](https://en.wikipedia.org/wiki/Test-driven_development).\n\nWard Cunningham was doing it before that.\n\nThere's even a structured guide for choosing which *goal* to tackle next: the [ZOMBIES framework](https://dev.to/mcsee/how-i-survived-the-zombie-apocalypse-59gj). Zero, One, Many, Boundary, Interface, Exceptional, Simple. That is your loop iteration order.\n\nWhat changed isn't the technique. 
What changed is who runs the loop.\n\nIn 2003, the human developer wrote the test, ran it, read the red 🔴 output, wrote the minimum code, ran it again, saw green 🟢, and moved to the next test.\n\nThat was the loop.\n\nIn 2026, the functional developer writes the spec, the agent runs the cycle, reads the red 🔴 output, writes the minimum code, runs the cycle again, sees green 🟢, and starts the next spec. That's still the loop.\n\nThe red 🔴-green 🟢-refactor 🔵 vocabulary wasn't memorable enough for 2026. So the industry renamed it.\n\nThe evaluation is still the test. The cycle is still TDD. The discipline is exactly the same.\n\nBuild the loop. But build it like someone who intends to stay the engineer, not just the person who presses go.\n\nAddy Osmani\n\nKent Beck said the same thing. He just called it something else.\n\nA few extra tips:\n\nLoop Engineering isn't only for greenfield code or fancy MVPs. It's also how you safely modernize systems that have no tests at all.\n\nThe trick is the same: write the spec first.\n\nOn a legacy system, that spec describes behavior the system already has.\n\nYou're not inventing new rules. You're pinning existing ones so the loop can't break them.\n\nHarnesses are even more critical on production legacy systems.\n\nThe loop then shrinks the untested surface one cycle at a time.\n\nEach green 🟢 spec is a behavior the agent can't accidentally destroy in the next iteration.\n\n[Squeezing TDD onto legacy systems](https://dev.to/mcsee/how-to-squeeze-test-driven-development-on-legacy-systems-8m9) works the same way whether a human runs the cycle or an agent does. The discipline is identical. What changes is the speed.\n\nWhat are you waiting for? Build your harnesses. 
Start your loops.", "url": "https://wpnews.pro/news/the-dirty-secret-behind-loop-engineering", "canonical_source": "https://dev.to/mcsee/the-dirty-secret-behind-loop-engineering-1748", "published_at": "2026-06-30 12:00:00+00:00", "updated_at": "2026-06-30 12:19:26.382927+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "ai-agents", "developer-tools"], "entities": ["Addy Osmani", "PostHog", "Loop Engineering", "FIFA World Cup 2026"], "alternates": {"html": "https://wpnews.pro/news/the-dirty-secret-behind-loop-engineering", "markdown": "https://wpnews.pro/news/the-dirty-secret-behind-loop-engineering.md", "text": "https://wpnews.pro/news/the-dirty-secret-behind-loop-engineering.txt", "jsonld": "https://wpnews.pro/news/the-dirty-secret-behind-loop-engineering.jsonld"}}