{"slug": "how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping", "title": "How to Build an Agentic Loop with Claude Code: Verification, Cost, and Stopping Criteria", "summary": "A guide published on the MindStudio blog covers building production-safe agentic loops with Claude Code, detailing how to prevent runaway token costs and uncontrolled terminal processes. The guide explains that agentic loops require explicit verification checkpoints, spending limits, and stopping criteria to avoid failures like $400 API bills from loops running hundreds of iterations. It advises developers to define concrete, observable, binary, and bounded success conditions before starting any autonomous task.", "body_md": "# How to Build an Agentic Loop with Claude Code: Verification, Cost, and Stopping Criteria\n\nLearn how to design agentic loops in Claude Code with proper verification steps, spending limits, and stopping conditions to avoid runaway token costs.\n\n## What an Agentic Loop Actually Is (and Why It Can Go Wrong)\n\nBuilding an agentic loop with Claude Code sounds straightforward until your terminal starts scrolling at 3am and you wake up to a $400 API bill. That’s not a hypothetical — it’s a known failure mode, and it happens because agentic loops are fundamentally different from single-turn AI calls.\n\nIn a standard prompt, you send a message and get a response. Done. In an agentic loop, the model takes an action, observes the result, decides what to do next, and repeats. That loop can run for dozens or hundreds of iterations. 
Without proper verification steps, spending limits, and stopping criteria baked into your design, you’re one edge case away from a runaway process.\n\nThis guide walks through how to build an agentic loop in Claude Code that’s actually production-safe — covering how to structure verification checkpoints, cap your token spend, and define clear exit conditions before you run a single iteration.\n\n## Understanding the Loop Architecture in Claude Code\n\nClaude Code is Anthropic’s agentic coding tool that runs in your terminal. It can read files, write code, execute shell commands, run tests, and iterate on its own output. That makes it genuinely powerful for autonomous tasks — but it also means the model has real tools with real consequences.\n\n### The Basic Loop Structure\n\nAt a high level, every agentic loop has three phases:\n\n- **Plan:** The agent assesses the task and decides what to do next\n- **Act:** The agent takes an action (writes code, runs a command, reads a file)\n- **Observe:** The agent reads the result of that action and updates its plan\n\nThis continues until either the task is complete or something stops the loop.\n\nThe problem is that “something stops the loop” rarely happens automatically. 
Without explicit stopping criteria, Claude Code will keep planning and acting as long as the task feels unfinished — or until it hits the API’s hard limits.\n\n### Why Loops Spin Out\n\nCommon reasons agentic loops run longer than expected:\n\n- **Ambiguous success conditions.** If the agent can’t verify whether the task is done, it keeps trying approaches.\n- **Cascading failures.** A failed test causes a code fix, which breaks another test, which causes another fix, and so on.\n- **Overly broad task scope.** “Refactor the whole codebase” has no natural stopping point.\n- **Missing error budgets.** The agent has no sense of when it should stop and ask for help rather than keep retrying.\n\nEach of these is solvable with deliberate design choices before you start the loop.\n\n## Define the Task Scope Before You Start\n\nThe single most effective cost control is a well-scoped task. This sounds obvious, but most agentic loop problems trace back to vague or open-ended instructions.\n\n### Write a Concrete Success Condition\n\nBefore invoking Claude Code, write down exactly what “done” looks like. Not “fix the bugs” but “all tests in `/tests/unit/` pass with exit code 0 and no new files are created outside the `/src/` directory.”\n\nA good success condition has three properties:\n\n- It’s **observable**: you can check it programmatically\n- It’s **binary**: pass or fail, not “mostly done”\n- It’s **bounded**: it describes a finite outcome, not an ongoing process\n\nIf you can’t write a success condition this way, the task isn’t scoped tightly enough for autonomous execution.\n\n### Decompose Large Tasks\n\nLong-running loops are harder to control than short focused ones. Instead of sending Claude Code a multi-step task in one shot, break it into discrete subtasks. 
Run a loop for each subtask, verify the output, and only proceed to the next one if the previous passed.\n\nThis “checkpoint and continue” pattern means a failure in step 3 doesn’t cost you all the tokens from steps 4 through 10.\n\n## Build Verification Steps Into Every Loop\n\nVerification is what separates a controlled agentic loop from a black box. You need to know, at each iteration, whether progress is being made and whether the agent is heading in the right direction.\n\n### Types of Verification\n\nThere are three levels of verification worth building in:\n\n**1. Action verification**\nAfter each individual action (file write, command run), check whether the action succeeded. Claude Code will often observe the stdout/stderr output automatically, but you can add explicit checks — like asserting that a file exists after the agent claims to have written it, or that a command returned exit code 0.\n\n**2. Iteration verification**\nAt the end of each loop cycle, run a lightweight check to see if the overall task is getting closer to completion. For a code task, this might be running a specific test file. For a data task, it might be checking a row count or schema validity.\n\n**3. Terminal verification**\nBefore accepting that the loop is complete, run the full success condition you defined upfront. This is your final gate.\n\n### Use a Verification Script, Not Just the Agent’s Own Assessment\n\nOne common mistake is asking the agent to verify its own work using judgment alone. The model might say “I believe the task is complete” when it isn’t — not because it’s lying, but because it genuinely can’t see the issue.\n\nWhere possible, use an external script or test suite that the agent can invoke. The agent reports what the script says, not what it thinks. 
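
One way to keep that separation, sketched here under the assumption that your loop controller is a Python script: a hypothetical helper, `run_check`, runs an external command and treats its exit code as the only verdict. The command in the example is a stand-in; in practice you would pass your own test command.

```python
import subprocess
import sys


def run_check(command):
    # Hypothetical helper: run an external verification command and
    # return (passed, output). Passing is decided by the exit code alone,
    # never by the agent's own description of the result.
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == '__main__':
    # Stand-in for a real check such as: python -m pytest tests/ -v
    passed, output = run_check([sys.executable, '-c', 'print(1 + 1)'])
    print('PASS' if passed else 'FAIL')
    print(output.strip())
```

Because the verdict comes from `returncode`, a confident but wrong summary from the agent cannot flip a failing check to passing.
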
This keeps verification grounded in objective output.\n\nA simple example: if Claude Code is writing a Python function, include this in your system prompt:\n\n```\nAfter every code change, run `python -m pytest tests/test_function.py -v` and report the exact output. Do not proceed to the next step until all tests pass.\n```\n\nThe agent is now required to ground its assessment in test output rather than self-evaluation.\n\n### Limit Retry Depth\n\nIf an action fails, the agent should retry — but not indefinitely. Set a maximum retry count per action, typically 2–3 attempts. If the agent can’t succeed after 3 tries, it should stop and surface the failure rather than keep iterating.\n\nThis is especially important for shell commands and API calls where repeated failures may indicate a systemic problem (wrong environment, bad credentials, incorrect logic) that more retries won’t fix.\n\n## Set Spending Limits and Token Budgets\n\nToken cost is the most quantifiable risk in an agentic loop. The good news is it’s also the most controllable.\n\n### Estimate Cost Before You Run\n\nClaude’s API pricing is public and predictable. Before running a loop, estimate your expected cost range:\n\n- Estimate the number of iterations your task should require\n- Multiply by your expected tokens per iteration (input + output)\n- Apply the current rate for your chosen model\n\nFor Claude Sonnet, for example, you’re looking at roughly $3 per million input tokens and $15 per million output tokens as of mid-2025. A loop with 20 iterations averaging 5,000 input and 1,000 output tokens per iteration would cost approximately $0.60 (100,000 input tokens at $3 per million plus 20,000 output tokens at $15 per million). That’s fine. A loop that runs 500 iterations because the stopping criteria were unclear would cost about $15 at the same averages, and considerably more if the context resent on each turn keeps growing.\n\nDoing this math upfront sets a baseline. 
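
The estimate can be scripted so it is easy to rerun with different assumptions. This is a minimal sketch: `estimate_cost` is a hypothetical helper, its default rates are the mid-2025 Sonnet figures quoted here, and it assumes flat per-iteration averages. Real loops often cost more than this, because the context sent on each turn tends to grow.

```python
def estimate_cost(iterations, input_tokens, output_tokens,
                  input_rate=3.0, output_rate=15.0):
    # Rates are US dollars per million tokens (assumed, check current
    # pricing); token counts are per-iteration averages.
    per_iteration = (input_tokens * input_rate
                     + output_tokens * output_rate) / 1_000_000
    return iterations * per_iteration


if __name__ == '__main__':
    # Compare a well-bounded run with a runaway one.
    for turns in (20, 500):
        print(turns, round(estimate_cost(turns, 5_000, 1_000), 2))
```
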
If actual costs deviate significantly, something went wrong in the loop design.\n\n### Set a Hard Turn Limit from the Command Line\n\nClaude Code supports a `--max-turns` flag that limits the number of agentic turns a non-interactive (`-p`) session will run. Use it. This is the simplest hard stop available:\n\n```\nclaude -p --max-turns 25 \"Fix all failing tests in /tests/unit/\"\n```\n\nSetting this to a number that’s somewhat higher than your expected turn count gives the agent room to work while preventing runaway execution.\n\n### Use Model Tiers Strategically\n\nNot every step in a loop needs the most capable (and expensive) model. Claude Haiku costs significantly less than Claude Sonnet, and for routine actions like reading a file, checking a diff, or running a known command, a smaller model is often sufficient.\n\nConsider using a tiered approach:\n\n- Lighter model for planning and observation steps\n- Stronger model only for the core generation or reasoning steps\n\nThis can reduce per-iteration costs by 60–80% without meaningfully affecting output quality on simpler actions.\n\n### Monitor Spend in Real Time\n\nFor loops running in production or automated pipelines, connect cost tracking to your loop controller. A common way to do this is by:\n\n- Counting tokens via the API response metadata\n- Accumulating a running total\n- Stopping the loop if total spend exceeds a threshold\n\nThis is distinct from the `--max-turns` flag — cost-based stopping handles cases where each turn is unexpectedly expensive, not just cases where there are too many turns.\n\n## Define Clear Stopping Criteria\n\nStopping criteria are the exit conditions your loop checks at the end of each iteration. They should be defined before the loop starts, not improvised mid-run.\n\n### Three Categories of Stopping Criteria\n\n**Success stopping**\nThe task is complete. Your terminal verification passed. 
The agent returns a success signal and the loop exits cleanly.\n\n**Failure stopping**\nThe task cannot be completed in its current form. This triggers when:\n\n- Retry count for a specific action exceeds the limit\n- An unrecoverable error is encountered (missing file, broken environment)\n- The agent explicitly signals it’s stuck\n\n**Budget stopping**\nResources are exhausted. This triggers when:\n\n- Turn count exceeds `--max-turns`\n- Token spend exceeds the cost threshold\n- Wall-clock time exceeds a time limit\n\nEvery loop should have at least one criterion from each category. Leaving out failure stopping means the loop can keep thrashing on a broken state. Leaving out budget stopping means a slow failure becomes an expensive one.\n\n### Write Stopping Criteria Into the System Prompt\n\nDon’t rely solely on code-level controls. Make the stopping criteria explicit in the instructions you give Claude Code:\n\n```\nYou have a maximum of 20 attempts to complete this task. \nIf all tests pass, report \"TASK_COMPLETE\" and stop.\nIf you encounter an error you cannot resolve after 3 retries, report \"TASK_FAILED: [reason]\" and stop.\nDo not create new files outside of /src/ or /tests/.\n```\n\nExplicit instructions give the model the context it needs to recognize when it should stop trying versus when it should escalate.\n\n### Handle Partial Completion\n\nSome tasks can be 80% done when the loop hits a stopping condition. Decide upfront how to handle this:\n\n- Should the agent commit partial work?\n- Should it roll back?\n- Should it leave a detailed summary of what’s done and what’s blocked?\n\nThe worst outcome is a loop that stops mid-way and leaves the codebase in an inconsistent state with no clear record of what happened. 
Build in a cleanup and summary step that runs regardless of whether the loop exits via success or failure.\n\n## Practical Loop Design Patterns\n\nOnce you understand the core components, a few patterns emerge that work well for most agentic loop use cases with Claude Code.\n\n### The Test-Driven Loop\n\nThis is the cleanest pattern for code tasks:\n\n- Write or provide tests before running the agent\n- The agent writes or modifies code\n- The agent runs the tests after each change\n- Loop exits when tests pass or retry limit is hit\n\nThe test suite acts as both the verification mechanism and the success condition. The agent never has to judge its own output — the tests do it.\n\n### The Diff-Review Loop\n\nFor higher-stakes changes, add a human-in-the-loop checkpoint:\n\n- Agent proposes a change (generates a diff)\n- Human reviews and approves or rejects\n- On approval, agent applies the change and verifies\n- Loop continues to next task\n\nThis slows things down but keeps a human in the decision path for consequential actions. 
It’s appropriate for production code, database migrations, or any task where a mistake is costly to reverse.\n\n### The Checkpoint-and-Summarize Loop\n\nFor longer tasks, periodically checkpoint progress:\n\n- Agent completes a subtask\n- Agent writes a structured summary of what was done and what’s next\n- Controller saves this summary to a file\n- If the loop is interrupted, it can resume from the last checkpoint\n\nThis makes long loops recoverable and gives you an audit trail of what the agent did and why.\n\n## How MindStudio Fits Into Agentic Loop Design\n\nIf you’re building agentic loops for business processes — not just local code tasks — the infrastructure overhead adds up fast. You need rate limiting, retries, auth management, cost tracking, and observability, all in addition to the actual task logic.\n\nThis is exactly what MindStudio’s [Agent Skills Plugin](https://mindstudio.ai) is designed to handle. It’s an npm SDK (`@mindstudio-ai/agent`) that gives agents like Claude Code access to over 120 typed capabilities — things like `agent.sendEmail()`, `agent.searchGoogle()`, `agent.runWorkflow()` — without you having to build the infrastructure layer yourself.\n\nInstead of writing custom code to manage retries, rate limits, and authentication for every external service your agent needs to call, the Agent Skills Plugin handles it. Your loop logic stays focused on reasoning and decision-making, not plumbing.\n\nFor teams that want to go further — exposing agentic workflows as API endpoints, building scheduled background agents, or creating no-code automations that sit alongside Claude Code-driven pipelines — MindStudio’s visual builder is worth exploring. You can [start for free at mindstudio.ai](https://mindstudio.ai) and have a working agent running in under an hour.\n\n## Common Mistakes (and How to Fix Them)\n\n### Mistake 1: Starting Without a Success Condition\n\n**Fix:** Write the success condition before writing the prompt. 
If you can’t define it in one sentence, scope the task down.\n\n### Mistake 2: No Hard Turn Limit\n\n**Fix:** Always pass `--max-turns` when running Claude Code on autonomous tasks. Start conservative (15–20 turns) and increase only if you have data showing the task consistently needs more.\n\n### Mistake 3: Trusting the Agent’s Self-Assessment\n\n**Fix:** Use external tests, scripts, or validators to confirm task completion. The agent’s “I believe this is done” is a signal, not a verification.\n\n### Mistake 4: Running the Most Expensive Model for Every Step\n\n**Fix:** Audit your loop and identify steps that don’t require strong reasoning. Shift those to Claude Haiku or a comparable lightweight model.\n\n### Mistake 5: No Failure Handling\n\n**Fix:** Build explicit failure states into your instructions. Tell the agent what “stuck” looks like and what to do when it gets there.\n\n### Mistake 6: Letting the Agent Modify Its Own Stopping Criteria\n\n**Fix:** Keep stopping logic in your controller code or system prompt, not in a place the agent can edit during the loop.\n\n## Frequently Asked Questions\n\n### What is an agentic loop in Claude Code?\n\nAn agentic loop is a multi-step process where Claude Code plans, takes an action, observes the result, and decides what to do next — repeating this cycle until a task is complete or a stopping condition is met. Unlike single-turn prompts, agentic loops can run dozens of iterations and execute real actions like writing files, running commands, and calling APIs.\n\n### How do I prevent Claude Code from using too many tokens?\n\nUse the `--max-turns` flag to cap the number of iterations, write explicit stopping criteria into your system prompt, and add cost tracking in your loop controller. Choosing a lighter Claude model for routine steps also reduces token spend significantly. 
Setting a hard spend threshold that halts the loop is the most reliable cost safeguard.\n\n### What stopping criteria should I use for a Claude Code agentic loop?\n\nYou need at least three types: a success condition (task completed, tests passed), a failure condition (unrecoverable error, retry limit hit), and a budget condition (max turns reached, cost threshold exceeded). All three should be defined before the loop starts. Success and failure criteria should be written into your prompt; budget criteria should be enforced in your controller code.\n\n### How does verification work in an agentic loop?\n\nVerification checks whether the agent is making real progress and whether the final output meets the success condition. The most reliable approach is to use external tests or scripts that the agent runs and reports on — not the agent’s own judgment. Run lightweight checks after each iteration and a full terminal check before exiting.\n\n### Can Claude Code handle partial task completion?\n\nYes, but you need to design for it. Define what the agent should do if it hits a stopping condition before the task is fully complete — whether that’s committing partial work, rolling back, or generating a summary of what was done and what’s blocked. Without explicit instructions, partial completion can leave your codebase or workflow in an inconsistent state.\n\n### How many turns should I allow for a typical agentic loop?\n\nIt depends on the task, but most focused code tasks should complete in 10–25 turns. If a loop regularly needs more than 30 turns, that’s usually a sign the task scope is too broad or the success condition is unclear. Start with a lower limit, measure actual usage, and increase only based on data.\n\n## Key Takeaways\n\n- An agentic loop needs three types of stopping criteria before you start: success, failure, and budget conditions. 
Missing any one of them creates real risk.\n- External verification (tests, scripts, validators) is more reliable than asking the agent to assess its own output.\n- The `--max-turns` flag is your simplest hard stop — always use it.\n- Break large tasks into scoped subtasks with checkpoints. Short loops with clear success conditions are easier to control than one long loop.\n- Token costs are predictable if you estimate upfront and monitor in real time. Most cost surprises trace back to missing stopping criteria, not model pricing.\n\nIf you’re building more complex agentic workflows beyond local coding tasks — or want to connect your Claude Code loops to external tools without managing the infrastructure yourself — [MindStudio](https://mindstudio.ai) offers an easy way to get started.", "url": "https://wpnews.pro/news/how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping", "canonical_source": "https://www.mindstudio.ai/blog/how-to-build-agentic-loop-claude-code/", "published_at": "2026-06-10 00:00:00+00:00", "updated_at": "2026-06-11 19:48:24.886955+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "ai-safety", "ai-infrastructure"], "entities": ["Claude Code", "Anthropic"], "alternates": {"html": "https://wpnews.pro/news/how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping", "markdown": "https://wpnews.pro/news/how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping.md", "text": "https://wpnews.pro/news/how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping.txt", "jsonld": "https://wpnews.pro/news/how-to-build-an-agentic-loop-with-claude-code-verification-cost-and-stopping.jsonld"}}