# How to Fix Tool-Use Loops in Autonomous Coding Agents

> Source: <https://dev.to/alanwest/how-to-fix-tool-use-loops-in-autonomous-coding-agents-540e>
> Published: 2026-05-26 01:38:16+00:00

Last month I was helping a friend debug their autonomous coding agent. It had been "working" on a task for 47 minutes, burned through roughly twelve bucks in API costs, and somehow ended up exactly where it started. The logs showed it had called `read_file`

on the same five files 23 times.

If you've built or experimented with AI coding agents, you've probably seen something like this. It's not a fun bug to debug — the agent isn't crashing, it isn't erroring, it just... never finishes.

Tool-use loops are the most expensive failure mode in agent design. From the outside, the agent looks busy. It's reading files, calling tools, generating thoughts, producing output. But it's not making progress toward the goal.

The shape is almost always the same:

I've now seen this in three different agent setups across two side projects and one client engagement. The symptoms are identical every time.

The fundamental issue is that the agent's working state looks nearly identical at step N and step N+5. Same task description in the system prompt, same files implicitly available, same general feel of the conversation. So the model — given essentially the same inputs — makes essentially the same decision.

There are three concrete causes worth separating:

`read_file("config.yaml")`

four times, but each turn the model mostly "sees" the latest tool result, not the pattern of what it's already tried.Let's walk through fixing each one.

Don't rely on the conversation history to encode what's been tried. Build a structured log the model can actually reason about.

``` python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any

@dataclass
class ToolCallLog:
    # Counts repeated (tool_name, args) pairs so we can detect loops
    calls: Counter = field(default_factory=Counter)
    history: list = field(default_factory=list)

    def record(self, name: str, args: dict[str, Any], result: str):
        key = (name, _hash_args(args))  # stable hash of args
        self.calls[key] += 1
        self.history.append({"name": name, "args": args, "result_preview": result[:200]})

    def summary_for_model(self) -> str:
        # Surface repeated calls so the model SEES the loop forming
        repeated = [(k, n) for k, n in self.calls.items() if n > 1]
        if not repeated:
            return "No repeated tool calls so far."
        lines = [f"- {name}{args} called {n}x" for (name, args), n in repeated]
        return "Repeated calls detected:\n" + "\n".join(lines)
```

Then inject `log.summary_for_model()`

into the system prompt every turn. Suddenly the model can see that it's about to call `read_file("config.yaml")`

for the fifth time, and most modern models will course-correct on their own.

Don't trust the model to always notice. Add a circuit breaker:

``` python
MAX_IDENTICAL_CALLS = 3
MAX_TOTAL_STEPS = 40

def should_force_reflection(log: ToolCallLog) -> str | None:
    # Return a reflection prompt if we detect a loop, else None
    for key, count in log.calls.items():
        if count >= MAX_IDENTICAL_CALLS:
            name, args = key
            return (
                f"You've called {name} with the same args {count} times. "
                "This is a loop. Stop and explain in one sentence what you "
                "actually need, then choose a different strategy."
            )
    if len(log.history) >= MAX_TOTAL_STEPS:
        return (
            "You've taken many steps without finishing. Summarize what you "
            "know, what you still need, and propose a single next action."
        )
    return None
```

When this triggers, inject the returned string as a user message before the next model call. I've found this single change cuts wasted tokens by something like half on the workflows I've tested. Your mileage will vary, but the direction is consistent.

Even without a detected loop, models drift on long tasks. A periodic forced reflection helps. The cadence I've landed on is every 8–10 tool calls:

``` php
REFLECTION_INTERVAL = 8

def maybe_reflect(step: int, task: str) -> str | None:
    if step > 0 and step % REFLECTION_INTERVAL == 0:
        return (
            f"Pause. Original task: {task}\n"
            "In 3 short bullets, answer:\n"
            "1. What have I actually accomplished?\n"
            "2. What is still blocking completion?\n"
            "3. Is my current approach working, or should I change it?"
        )
    return None
```

This is borrowed from human pair programming — "hey, where are we?" every so often is healthy.

The last fix is the most boring but probably the most important. When a tool fails, don't soften the error message:

``` php
def format_tool_error(name: str, args: dict, exc: Exception) -> str:
    # Be specific about what failed. Generic errors invite retries.
    return (
        f"TOOL ERROR: {name} failed with {type(exc).__name__}: {exc}.\n"
        f"Inputs were: {args}.\n"
        "Do NOT retry with identical arguments. Either fix the inputs "
        "or choose a different tool."
    )
```

The "Do NOT retry with identical arguments" line sounds silly but actually moves the needle. I tested with and without it on the same task three times — without it, the agent retried failing calls about 60% of the time. With it, closer to 10%. Tiny sample size, but the effect was obvious.

A few patterns I now reach for by default when building agents:

`read_file`

results to the relevant section instead of dumping whole files. Less noise, more signal.`notes.md`

it can write to. Externalized memory is cheaper than re-deriving state from chat history.None of this is novel — the broader agent research community has been writing about reflection, planning, and memory for a while. But it's easy to skip these when you're hacking together a prototype and assume "the model will figure it out." It won't. Not reliably.

Tool-use loops are not a model problem so much as a harness problem. The model is doing exactly what you'd expect given identical inputs every turn. Your job, as the person building the loop around the model, is to make sure the inputs aren't identical — that the agent can see its own history, get nudged when it's stuck, and feel the weight of its errors.

Fix those four things and most of your runaway agent costs go away. The rest is just tuning.
