{"slug": "prompt-engineering-is-dead-long-live-the-agentic-loop", "title": "Prompt Engineering is Dead. Long Live the Agentic Loop.", "summary": "A developer explains that prompt engineering is being replaced by agentic loops, where AI agents autonomously plan, edit, test, and fix code until tests pass, then open a pull request for human review. Tools like Claude Code, Copilot, Cursor, and Devin achieve high SWE-bench scores and significant efficiency gains, but require cost ceilings and careful setup to avoid infinite loops.", "body_md": "You used to craft the perfect prompt. Tweak the wording. Add examples. Get a better answer.\n\nThat era is ending.\n\nIn 2026, the best AI coding workflows are not about prompts. They are about loops. You give the agent a goal, a test gate, and permission to run. You come back to a completed PR.\n\nThis article explains how agentic workflows work, what they look like in practice, the risks nobody talks about, and how to set one up properly.\n\nAn agentic workflow is when an AI agent does not just generate code — it executes a persistent loop:\n\n```\nPlan → Edit → Test → Fix → Document → Repeat\n```\n\nThe agent reads your files, makes changes, runs your test suite, reads the results, fixes what broke, and loops. It stops when tests pass or when it hits a defined stop condition.\n\n**The key shift:** you review the PR, not each step.\n\nA typical agentic workflow example:\n\n```\n1. Agent generates authentication middleware\n2. Runs existing test suite — 3 tests failing\n3. Reads failure logs\n4. Fixes the implementation\n5. Re-runs tests — all pass\n6. Opens pull request  ← first human touchpoint\n```\n\nClaude Code is one of the most capable terminal-based agentic tools as of March 2026. 
SWE-bench Verified: **80.9%** (Opus 4.5) / **80.8%** (Opus 4.6) — top of a competitive band that includes Gemini 3.1 Pro at 80.6% and GPT-5.2 at 80.0%.\n\nRun an autonomous session with:\n\n```\nclaude --dangerously-skip-permissions \\\n  --max-budget-usd 5.00 \\\n  \"Migrate all SharedPreferences to DataStore. Run ./gradlew test after each migration. Do NOT modify test files.\"\n```\n\nFor long overnight jobs, add a cost ceiling. Without it, an infinite loop will drain your credits.\n\nClaude Code reads your `CLAUDE.md` at the start of every session; this is your agent briefing document. More on this below.\n\nCopilot's coding agent runs inside GitHub. You assign an issue to \"Copilot\" as the assignee. It creates a branch, writes code, runs tests, and opens a PR. You see the work in the PR timeline — every tool call, every test run.\n\nCopilot agent works on tasks like: *\"Update the CI pipeline to include the new security scan step\"* — decomposed and implemented across multiple files, no manual work.\n\nCursor went from $1M to $100M ARR in roughly 12 months (late 2023–early 2025), surpassing $2B ARR by early 2026. Agent mode iterates automatically — recognizes errors, reads logs, suggests and runs terminal commands, self-heals on failures.\n\nGoogle's agentic offering comes in two forms. **Gemini Code Assist Agent** runs inside VS Code and Cloud Shell — assign a task, it works asynchronously. **Jules** is Google's fully autonomous agent inside Project IDX, handling issues end-to-end. Android developers also get **Android Studio Agent Mode** (Otter 3, Jan 2026), which can deploy to a device, read Logcat, and interact with the running app.\n\nDevin (Cognition AI) is designed as a fully autonomous software engineer. Nubank used it for large migration tasks and reported **8–12x engineering efficiency** and **20x cost savings**. 
PR merge rates vary by customer and task complexity.\n\nThe Ralph pattern was named by developer Geoffrey Huntley after Ralph Wiggum, the Simpsons character who keeps trying the same thing, and yet it works.\n\n**The core insight:** progress does not live in the LLM's context window. It lives in your files and git history.\n\nEach run starts with fresh context. But the agent sees the cumulative file changes from all previous runs. So it always picks up where the last run left off.\n\nThe original technique is literally a Bash loop. A minimal version, with a run cap added as a safety net:\n\n```bash\n#!/bin/bash\nSPEC=\"specs/feature-auth.md\"\nMAX_RUNS=20  # hard cap so a stuck agent cannot loop forever\n\nfor ((run = 1; run <= MAX_RUNS; run++)); do\n  # -p runs the session non-interactively, so the script regains control when it ends\n  claude -p --dangerously-skip-permissions \\\n    \"Read $SPEC. Implement what is not done yet. Check off each finished item in the spec file.\"\n\n  # Exit code 0 only means the session ended cleanly, not that the spec is complete.\n  # Assumes the spec tracks work as \"- [ ]\" checkboxes: done when none remain unchecked.\n  if ! grep -q '^- \\[ \\]' \"$SPEC\"; then\n    echo \"Done.\"\n    break\n  fi\n\n  echo \"Not complete. Retrying...\"\n  sleep 2\ndone\n```\n\n**Why it avoids infinite loops:** each iteration starts fresh, so the agent sees the current state of the files — not a confused internal memory of what it tried before.\n\nThe spec file drives progress:\n\n```\n# spec-datastore-migration.md\n\n## Goal\nMigrate all SharedPreferences usage to DataStore.\n\n## Acceptance Criteria\n- [ ] No SharedPreferences imports remain\n- [ ] All DataStore flows are applicationScope\n- [ ] All existing unit tests pass\n- [ ] New unit tests exist for DataStore wrappers\n\n## Do NOT touch\n- /src/test/ — read only\n- build.gradle.kts — ask first\n\n## Done when\n./gradlew test passes with zero failures\n```\n\nThe agent checks off items as it completes them. Each loop makes progress. 
Eventually all boxes are checked.\n\nThe ecosystem around this pattern has grown fast: there are now multiple open-source implementations (fstandhartinger/ralph-wiggum, mikeyobrien/ralph-orchestrator, vercel-labs/ralph-loop-agent) and even `ralph-wiggum.ai` as a hosted version.\n\nThe infinite loop has been described as \"the #1 plague of agentic engineering in 2026.\" An agent runs the same failing test 47 times, editing the same file repeatedly, burning credits with no progress.\n\nRoot causes:\n\nMitigation: always set `--max-budget-usd`. Use the Ralph pattern (fresh context per run). Define clear stop conditions in CLAUDE.md.\n\nThis is a documented, real problem — not theoretical.\n\nWhen you tell an agent \"make the tests pass,\" it finds the shortest path:\n\n- `try/catch` wrappers that swallow exceptions\n- `if (test) { return fakeValue; }` branches\n\nNIST documents this as *specification gaming*: the agents aren't being malicious; they're optimizing the metric you gave them, finding the loophole before you do.\n\n**The fix:**\n\n```\nchmod -R a-w src/test/\nclaude --dangerously-skip-permissions \"Fix failing tests without modifying test files.\"\nchmod -R u+w src/test/\n```\n\nThese are documented, named incidents from 2025:\n\n- `rm -rf`'d 70 files despite explicit instructions not to\n\nThe pattern: agents given broad permissions and no stop conditions will take the shortest path to the stated goal, including irreversible destructive actions.\n\nAlways run agentic tasks in sandboxed environments. 
Never give production database credentials to an agentic session.\n\nTasks with high success rates for overnight agentic runs:\n\n| Task | Why It Works |\n|---|---|\n| SharedPreferences → DataStore | Mechanical, testable, clear acceptance criteria |\n| Deprecated API upgrades (onBackPressed) | Pattern-matching across files |\n| Adding unit test coverage | Agent writes tests for existing ViewModels |\n| Framework version bumps | Compiler errors become the agent's feedback loop |\n| Large-scale renames | Grep + replace + test gate |\n\nTasks that fail:\n\nThe productivity data is mixed. According to a DX study of 135,000+ developers, daily AI users submit ~60% more PRs — though critics note this measures output volume, not delivered value. A randomized controlled trial (METR, 2025) found experienced developers on familiar tasks were actually **19% slower** when using AI — because prompt iteration costs time on things they already know.\n\nThe wins are on tasks outside your expertise or on high-volume mechanical changes where the agent is faster than you can type.\n\nSingle-agent loops work well for tasks that fit in one session. For larger refactors, teams are now using **supervisor + worker** patterns:\n\n```\nOrchestrator agent\n├── Worker A → files 1–50 (edit → test → fix)\n├── Worker B → files 51–100 (edit → test → fix)\n└── Worker C → files 101–150 (edit → test → fix)\nOrchestrator: merge → run integration tests → open PR\n```\n\nThe orchestrator delegates, monitors, and merges. Workers run in parallel on git worktrees. This is the pattern behind tools like Amazon Kiro for long autonomous tasks.\n\nThis is the most important step. 
Every agentic session reads `CLAUDE.md` first.\n\n```\n# Project: MyAndroidApp\n\n## Build Commands\n- Build: ./gradlew assembleDebug\n- Test: ./gradlew test\n- Lint: ./gradlew lint\n\n## Test Gate\nALWAYS run ./gradlew test after any code change.\nNEVER modify files in /src/test/ or /src/androidTest/\nNEVER push if tests fail.\n\n## Architecture\n- MVVM with Clean Architecture\n- Hilt for DI, Room for database, Coroutines + Flow\n- All ViewModels must have unit tests\n\n## Stop Conditions\nStop and ask before:\n- Modifying build.gradle.kts\n- Any database schema change\n- If test count drops below current count\n- Anything touching production config\n```\n\nKeep it under 300 lines. Don't include rules that a linter already enforces.\n\nAgentic sessions need explicit boundaries. Without them, the agent will make assumptions:\n\nThe most reliable control mechanism:\n\n```\n## Workflow\n1. Make changes\n2. Run: ./gradlew test\n3. If ANY test fails: fix before moving on\n4. Do NOT proceed to next task until all tests pass\n5. Do NOT modify test files to make tests pass\n```\n\nNo tests = no agentic workflows. Add tests first.\n\nFor anything running overnight, use a spec file with checkboxes. The agent marks items done. The next morning, you see exactly where it got stuck.\n\nClaude Code works well with Android projects when combined with a good CLAUDE.md.\n\n**Gradle caveat:** cold start on Android takes 10–30 seconds. For tight loops, batch file edits before running tests — not one Gradle run per file.\n\n**What works in Android:**\n\nFor architecture guidance to put in CLAUDE.md, see the [Jetpack Compose tutorial series](https://kemalcodes.com/jetpack-compose-tutorial/) and the [KMP tutorial series](https://kemalcodes.com/kmp-tutorial/).\n\nAgentic workflows are not magic. They require:\n\nWhen these are in place, tasks that take a day take an hour. 
Tasks that take a week take a morning.\n\nThe developers getting the most out of agentic tools are not the ones crafting the best prompts. They are the ones who set up good test suites, write clear spec files, and treat the agent like a junior developer: capable and fast, but in need of explicit rules so it does not cut corners.\n\n*Originally published at kemalcodes.com.*", "url": "https://wpnews.pro/news/prompt-engineering-is-dead-long-live-the-agentic-loop", "canonical_source": "https://dev.to/kemalcodes/prompt-engineering-is-dead-long-live-the-agentic-loop-1lo9", "published_at": "2026-07-04 10:00:00+00:00", "updated_at": "2026-07-04 10:19:02.575783+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-agents", "developer-tools", "ai-products"], "entities": ["Claude Code", "Copilot", "Cursor", "Devin", "Cognition AI", "Gemini Code Assist", "Jules", "Nubank"], "alternates": {"html": "https://wpnews.pro/news/prompt-engineering-is-dead-long-live-the-agentic-loop", "markdown": "https://wpnews.pro/news/prompt-engineering-is-dead-long-live-the-agentic-loop.md", "text": "https://wpnews.pro/news/prompt-engineering-is-dead-long-live-the-agentic-loop.txt", "jsonld": "https://wpnews.pro/news/prompt-engineering-is-dead-long-live-the-agentic-loop.jsonld"}}