Prompt Engineering is Dead. Long Live the Agentic Loop.


You used to craft the perfect prompt. Tweak the wording. Add examples. Get a better answer.

That era is ending.

In 2026, the best AI coding workflows are not about prompts. They are about loops. You give the agent a goal, a test gate, and permission to run. You come back to a completed PR.

This article explains how agentic workflows work, what they look like in practice, the risks nobody talks about, and how to set one up properly.

An agentic workflow is when an AI agent does not just generate code — it executes a persistent loop:

Plan → Edit → Test → Fix → Document → Repeat

The agent reads your files, makes changes, runs your test suite, reads the results, fixes what broke, and loops. It stops when tests pass or when it hits a defined stop condition.

The key shift: you review the PR, not each step.

A typical agentic workflow example:

1. Agent generates authentication middleware
2. Runs existing test suite — 3 tests failing
3. Reads failure logs
4. Fixes the implementation
5. Re-runs tests — all pass
6. Opens pull request  ← first human touchpoint
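The loop and its stop conditions can be sketched as a small Bash function. This is an illustration under assumptions, not how any particular tool implements it: agent_loop, TEST_CMD, and FIX_CMD are hypothetical names, and the fix step stands in for whatever agent invocation you use.

```bash
#!/bin/bash
# Hypothetical sketch: run a test gate, call a fix step on failure, and stop
# either when the tests pass or when an iteration cap (the stop condition) is hit.
# Usage: agent_loop MAX_ITERATIONS TEST_CMD FIX_CMD
agent_loop() {
  local max_iterations="$1" test_cmd="$2" fix_cmd="$3"
  local i
  for ((i = 1; i <= max_iterations; i++)); do
    if eval "$test_cmd"; then
      echo "Tests passed on iteration $i."
      return 0
    fi
    echo "Tests failed on iteration $i; running fix step."
    eval "$fix_cmd"
  done
  echo "Stop condition hit: $max_iterations iterations without a passing run."
  return 1
}
```

Called as agent_loop 10 "./gradlew test" "<your agent command>", the function returns 0 as soon as the gate passes and 1 once the cap is reached, so a wrapper script can decide whether to open a PR or flag the run for review.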

Claude Code is one of the most capable terminal-based agentic tools as of March 2026. SWE-bench Verified: 80.9% (Opus 4.5) / 80.8% (Opus 4.6) — top of a competitive band that includes Gemini 3.1 Pro at 80.6% and GPT-5.2 at 80.0%.

Run an autonomous session with:

claude -p --dangerously-skip-permissions \
  --max-budget-usd 5.00 \
  "Migrate all SharedPreferences to DataStore. Run ./gradlew test after each migration. Do NOT modify test files."

The -p flag runs the session non-interactively, and --max-budget-usd is the cost ceiling. For long overnight jobs, always set it: without a ceiling, a runaway loop will drain your credits.

Claude Code reads your CLAUDE.md at the start of every session. This is your agent briefing document. More on this below.

Copilot's coding agent runs inside GitHub. You assign an issue to "Copilot" as the assignee. It creates a branch, writes code, runs tests, and opens a PR. You see the work in the PR timeline — every tool call, every test run.

Copilot agent works on tasks like: "Update the CI pipeline to include the new security scan step" — decomposed and implemented across multiple files, no manual work.

Cursor went from $1M to $100M ARR in roughly 12 months (late 2023–early 2025), surpassing $2B ARR by early 2026. Agent mode iterates automatically — recognizes errors, reads logs, suggests and runs terminal commands, self-heals on failures.

Google's agentic offering comes in two forms. Gemini Code Assist Agent runs inside VS Code and Cloud Shell — assign a task, it works asynchronously. Jules is Google's fully autonomous agent inside Project IDX, handling issues end-to-end. Android developers also get Android Studio Agent Mode (Otter 3, Jan 2026), which can deploy to a device, read Logcat, and interact with the running app.

Devin (Cognition AI) is designed as a fully autonomous software engineer. Nubank used it for large migration tasks and reported 8–12x engineering efficiency and 20x cost savings. PR merge rates vary by customer and task complexity.

The Ralph Wiggum technique is named by developer Geoffrey Huntley after the Simpsons character who keeps trying the same thing. Except here, it works.

The core insight: progress does not live in the LLM's context window. It lives in your files and git history.

Each run starts with fresh context. But the agent sees the cumulative file changes from all previous runs. So it always picks up where the last run left off.

The original technique is literally a Bash loop:

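#!/bin/bash
# Re-run the agent with fresh context until the spec has no unchecked items.
SPEC="specs/feature-auth.md"

while true; do
  # -p runs Claude Code non-interactively so the loop can continue unattended
  claude -p --dangerously-skip-permissions \
    "Read $SPEC. Implement what is not done yet. Check off finished items in the spec file."

  # Exit status 0 only means the run finished, not that the work is complete,
  # so test the spec itself: stop once no "- [ ]" checkbox remains.
  if ! grep -q '^- \[ \]' "$SPEC"; then
    echo "Done."
    break
  fi

  echo "Not complete. Retrying..."
  sleep 2
done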

Why it resists getting stuck: each iteration starts fresh, so the agent sees the current state of the files, not a confused internal memory of what it tried before.

The spec file drives progress:


## Goal
Migrate all SharedPreferences usage to DataStore.

## Acceptance Criteria
- [ ] No SharedPreferences imports remain
- [ ] All DataStore flows are applicationScope
- [ ] All existing unit tests pass
- [ ] New unit tests exist for DataStore wrappers

## Do NOT touch
- /src/test/ — read only
- build.gradle.kts — ask first

## Done when
./gradlew test passes with zero failures

The agent checks off items as it completes them. Each loop makes progress. Eventually all boxes are checked.
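Because the spec uses checkboxes, progress can be measured mechanically instead of by reading agent output. A minimal sketch, assuming items use the "- [ ]" / "- [x]" syntax shown above; the function names spec_remaining and spec_open_items are hypothetical.

```bash
#!/bin/bash
# Hypothetical helpers for a checkbox-driven spec file.

# Print the number of unchecked "- [ ]" items.
# grep -c exits 1 when the count is zero, hence "|| true".
spec_remaining() {
  grep -c '^- \[ \]' "$1" || true
}

# Print the unchecked items with their line numbers, e.g. for a morning review.
spec_open_items() {
  grep -n '^- \[ \]' "$1" || true
}
```

A wrapper loop can stop when spec_remaining prints 0, and spec_open_items shows which criteria the agent never reached.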

The ecosystem around this pattern has grown fast: there are now multiple open-source implementations (fstandhartinger/ralph-wiggum, mikeyobrien/ralph-orchestrator, vercel-labs/ralph-loop-agent) and even a hosted version, ralph-wiggum.ai.

The stuck loop has been described as "the #1 plague of agentic engineering in 2026": an agent runs the same failing test 47 times, editing the same file repeatedly, burning credits with no progress.

Root causes:

Mitigation: always set --max-budget-usd. Use the Ralph pattern (fresh context per run). Define clear stop conditions in CLAUDE.md.
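A budget cap limits the damage, but a wrapper can also notice a stuck loop directly. The sketch below is one possible approach, not a feature of any tool named here: it fingerprints a directory after each run and stops when a run leaves the files unchanged or returns them to a state already seen. The names tree_fingerprint and run_with_stall_guard are hypothetical, and it assumes you point it at a source directory rather than the whole repository.

```bash
#!/bin/bash
# Hypothetical sketch: detect a stuck loop by fingerprinting file contents.

# Checksum every file under a directory, then checksum the sorted list.
tree_fingerprint() {
  find "$1" -type f -exec cksum {} + | sort | cksum
}

# Run STEP_CMD up to MAX_RUNS times; stop early if a run changes nothing
# or brings the directory back to a previously seen state.
# Usage: run_with_stall_guard DIR MAX_RUNS STEP_CMD
run_with_stall_guard() {
  local dir="$1" max_runs="$2" step_cmd="$3"
  local seen fp i
  seen="$(tree_fingerprint "$dir")"
  for ((i = 1; i <= max_runs; i++)); do
    eval "$step_cmd"
    fp="$(tree_fingerprint "$dir")"
    # An exact match against any earlier fingerprint means no net progress
    if printf '%s\n' "$seen" | grep -qxF "$fp"; then
      echo "Run $i returned to a previous state; stopping."
      return 1
    fi
    seen="$seen"$'\n'"$fp"
  done
  return 0
}
```

Tracking every earlier fingerprint, rather than only the last one, also catches the case where the agent alternates between two versions of the same file.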

This is a documented, real problem — not theoretical.

When you tell an agent "make the tests pass," it finds the shortest path:

- try/catch wrappers that swallow exceptions
- if (test) { return fakeValue; } branches

NIST documents this as specification gaming: the agents aren't being malicious; they're optimizing the metric you gave them, finding the loophole before you do.

The fix:

chmod -R a-w src/test/
claude --dangerously-skip-permissions "Fix failing tests without modifying test files."
chmod -R u+w src/test/
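Read-only permissions are a first line of defense, but an agent with shell access could in principle restore write permission itself. A post-run check is a useful second line. The sketch below assumes the project is a git repository with the test files committed; tests_untouched is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical check: succeed only if nothing under the given path differs
# from the last commit. "git status --porcelain" prints one line per
# modified, deleted, or untracked file, so empty output means untouched.
tests_untouched() {
  [ -z "$(git status --porcelain -- "$1")" ]
}
```

Running tests_untouched src/test after the session and before opening a PR turns "do not modify test files" from an instruction into a gate.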

These are documented, named incidents from 2025:

- An agent rm -rf'd 70 files despite explicit instructions not to

The pattern: agents given broad permissions and no stop conditions will take the shortest path to the stated goal, including irreversible destructive actions.

Always run agentic tasks in sandboxed environments. Never give production database credentials to an agentic session.

Tasks with high success rates for overnight agentic runs:

Task	Why It Works
SharedPreferences → DataStore	Mechanical, testable, clear acceptance criteria
Deprecated API upgrades (onBackPressed)	Pattern-matching across files
Adding unit test coverage	Agent writes tests for existing ViewModels
Framework version bumps	Compiler errors become the agent's feedback loop
Large-scale renames	Grep + replace + test gate

Tasks that fail:

The productivity data is mixed. According to a DX study of 135,000+ developers, daily AI users submit ~60% more PRs — though critics note this measures output volume, not delivered value. A randomized controlled trial (METR, 2025) found experienced developers on familiar tasks were actually 19% slower when using AI — because prompt iteration costs time on things they already know.

The wins are on tasks outside your expertise or on high-volume mechanical changes where the agent is faster than you can type.

Single-agent loops work well for tasks that fit in one session. For larger refactors, teams are now using supervisor + worker patterns:

Orchestrator agent
├── Worker A → files 1–50 (edit → test → fix)
├── Worker B → files 51–100 (edit → test → fix)
└── Worker C → files 101–150 (edit → test → fix)
Orchestrator: merge → run integration tests → open PR

The orchestrator delegates, monitors, and merges. Workers run in parallel on git worktrees. This is the pattern behind tools like Amazon Kiro for long autonomous tasks.
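The delegation step in the diagram can be sketched as a helper that splits a file list into contiguous chunks, one per worker. This is an assumption about how an orchestrator might divide work, not a description of any specific tool; assign_files is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: read a newline-separated file list on stdin, split it
# into WORKERS contiguous chunks, and print "worker-index<TAB>path" per line.
# Usage: git ls-files 'src/**/*.kt' | assign_files 3
assign_files() {
  local workers="$1"
  awk -v n="$workers" '
    { files[NR] = $0 }
    END {
      per = int((NR + n - 1) / n)   # ceiling division: files per worker
      for (i = 1; i <= NR; i++)
        printf "%d\t%s\n", int((i - 1) / per) + 1, files[i]
    }'
}
```

Each worker index would then map to its own checkout, for example one created with git worktree add ../worker-1 -b worker-1, so parallel edits never touch the same working directory.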

Writing CLAUDE.md is the most important step. Every Claude Code session reads this file first.


## Build Commands
- Build: ./gradlew assembleDebug
- Test: ./gradlew test
- Lint: ./gradlew lint

## Test Gate
ALWAYS run ./gradlew test after any code change.
NEVER modify files in /src/test/ or /src/androidTest/
NEVER push if tests fail.

## Architecture
- MVVM with Clean Architecture
- Hilt for DI, Room for database, Coroutines + Flow
- All ViewModels must have unit tests

## Stop Conditions
Stop and ask before:
- Modifying build.gradle.kts
- Any database schema change
- If test count drops below current count
- Anything touching production config

Keep it under 300 lines. Don't include rules that a linter already enforces.
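A stop condition such as "test count drops below current count" is easier to trust when a script can check it. The sketch below is a rough approximation under stated assumptions: it counts lines that begin with an @Test annotation, which will miss other ways of declaring tests. The names count_tests and test_count_ok are hypothetical.

```bash
#!/bin/bash
# Hypothetical check for the "test count must not drop" stop condition.

# Count lines that start with an @Test annotation under a directory.
# The trailing group keeps @TestOnly and similar names from matching.
count_tests() {
  grep -rhE '^[[:space:]]*@Test([^A-Za-z]|$)' "$1" | wc -l | tr -d ' '
}

# Succeed only if the current count is at least the recorded baseline.
# Usage: test_count_ok BASELINE DIR
test_count_ok() {
  local baseline="$1" dir="$2"
  [ "$(count_tests "$dir")" -ge "$baseline" ]
}
```

Recording the baseline before the session and calling test_count_ok afterwards catches an agent that made the suite pass by deleting tests.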

Agentic sessions need explicit boundaries. Without them, the agent will make assumptions:

The most reliable control mechanism:

## Workflow
1. Make changes
2. Run: ./gradlew test
3. If ANY test fails: fix before moving on
4. Do NOT proceed to next task until all tests pass
5. Do NOT modify test files to make tests pass

No tests = no agentic workflows. Add tests first.

For anything running overnight, use a spec file with checkboxes. The agent marks items done. The next morning, you see exactly where it got stuck.

Claude Code works well with Android projects when combined with a good CLAUDE.md.

Gradle caveat: cold start on Android takes 10–30 seconds. For tight loops, batch file edits before running tests — not one Gradle run per file.

What works in Android:

For architecture guidance to put in CLAUDE.md, see the Jetpack Compose tutorial series and the KMP tutorial series.

Agentic workflows are not magic. They require:

When these are in place, tasks that take a day take an hour. Tasks that take a week take a morning.

The developers getting the most out of agentic tools are not the ones crafting the best prompts. They are the ones who set up good test suites, write clear spec files, and treat the agent like a junior developer: capable, fast, and in need of explicit rules so it does not cut corners.

Originally published at kemalcodes.com.
