How to Fix Tool-Use Loops in Autonomous Coding Agents

A developer discovered that autonomous coding agents frequently fall into expensive tool-use loops, burning through API costs without making progress toward their goals. The engineer identified three concrete causes for these loops and built a structured logging system with a circuit breaker that cuts wasted tokens by roughly half. The solution involves tracking repeated tool calls and injecting reflection prompts when the agent begins repeating itself.

Last month I was helping a friend debug their autonomous coding agent. It had been "working" on a task for 47 minutes, burned through roughly twelve bucks in API costs, and somehow ended up exactly where it started. The logs showed it had called read file on the same five files 23 times. If you've built or experimented with AI coding agents, you've probably seen something like this. It's not a fun bug to debug — the agent isn't crashing, it isn't erroring, it just... never finishes. Tool-use loops are the most expensive failure mode in agent design. From the outside, the agent looks busy. It's reading files, calling tools, generating thoughts, producing output. But it's not making progress toward the goal. The shape is almost always the same: I've now seen this in three different agent setups across two side projects and one client engagement. The symptoms are identical every time. The fundamental issue is that the agent's working state looks nearly identical at step N and step N+5. Same task description in the system prompt, same files implicitly available, same general feel of the conversation. So the model — given essentially the same inputs — makes essentially the same decision. There are three concrete causes worth separating: read file "config.yaml" four times, but each turn the model mostly "sees" the latest tool result, not the pattern of what it's already tried.Let's walk through fixing each one. Don't rely on the conversation history to encode what's been tried. Build a structured log the model can actually reason about. python from collections import Counter from dataclasses import dataclass, field from typing import Any @dataclass class ToolCallLog: Counts repeated tool name, args pairs so we can detect loops calls: Counter = field default factory=Counter history: list = field default factory=list def record self, name: str, args: dict str, Any , result: str : key = name, hash args args stable hash of args self.calls key += 1 self.history.append {"name": name, "args": args, "result preview": result :200 } def summary for model self - str: Surface repeated calls so the model SEES the loop forming repeated = k, n for k, n in self.calls.items if n 1 if not repeated: return "No repeated tool calls so far." lines = f"- {name}{args} called {n}x" for name, args , n in repeated return "Repeated calls detected:\n" + "\n".join lines Then inject log.summary for model into the system prompt every turn. Suddenly the model can see that it's about to call read file "config.yaml" for the fifth time, and most modern models will course-correct on their own. Don't trust the model to always notice. Add a circuit breaker: python MAX IDENTICAL CALLS = 3 MAX TOTAL STEPS = 40 def should force reflection log: ToolCallLog - str | None: Return a reflection prompt if we detect a loop, else None for key, count in log.calls.items : if count = MAX IDENTICAL CALLS: name, args = key return f"You've called {name} with the same args {count} times. " "This is a loop. Stop and explain in one sentence what you " "actually need, then choose a different strategy." if len log.history = MAX TOTAL STEPS: return "You've taken many steps without finishing. Summarize what you " "know, what you still need, and propose a single next action." return None When this triggers, inject the returned string as a user message before the next model call. I've found this single change cuts wasted tokens by something like half on the workflows I've tested. Your mileage will vary, but the direction is consistent. Even without a detected loop, models drift on long tasks. A periodic forced reflection helps. The cadence I've landed on is every 8–10 tool calls: php REFLECTION INTERVAL = 8 def maybe reflect step: int, task: str - str | None: if step 0 and step % REFLECTION INTERVAL == 0: return f"Pause. Original task: {task}\n" "In 3 short bullets, answer:\n" "1. What have I actually accomplished?\n" "2. What is still blocking completion?\n" "3. Is my current approach working, or should I change it?" return None This is borrowed from human pair programming — "hey, where are we?" every so often is healthy. The last fix is the most boring but probably the most important. When a tool fails, don't soften the error message: php def format tool error name: str, args: dict, exc: Exception - str: Be specific about what failed. Generic errors invite retries. return f"TOOL ERROR: {name} failed with {type exc . name }: {exc}.\n" f"Inputs were: {args}.\n" "Do NOT retry with identical arguments. Either fix the inputs " "or choose a different tool." The "Do NOT retry with identical arguments" line sounds silly but actually moves the needle. I tested with and without it on the same task three times — without it, the agent retried failing calls about 60% of the time. With it, closer to 10%. Tiny sample size, but the effect was obvious. A few patterns I now reach for by default when building agents: read file results to the relevant section instead of dumping whole files. Less noise, more signal. notes.md it can write to. Externalized memory is cheaper than re-deriving state from chat history.None of this is novel — the broader agent research community has been writing about reflection, planning, and memory for a while. But it's easy to skip these when you're hacking together a prototype and assume "the model will figure it out." It won't. Not reliably. Tool-use loops are not a model problem so much as a harness problem. The model is doing exactly what you'd expect given identical inputs every turn. Your job, as the person building the loop around the model, is to make sure the inputs aren't identical — that the agent can see its own history, get nudged when it's stuck, and feel the weight of its errors. Fix those four things and most of your runaway agent costs go away. The rest is just tuning.