Why your AI agent loops forever (and how to break the cycle)

The article explains that AI agents often get stuck in infinite loops, repeatedly calling the same tool with slightly different queries, which wastes tokens and increases costs. This is identified as an architectural problem rather than a model issue, with solutions including setting a maximum step limit, detecting exact duplicate tool calls, and tracking recent tool usage to identify when the agent is stuck. The author provides code examples for implementing these fixes, such as using a hash-based duplicate detection system and a progress tracker that monitors the last several tool calls.

The 3 AM tool-call loop from hell

Last month I deployed a ReAct-style agent to handle customer support triage. By 3 AM I had an alert: one user session had burned through 47,000 tokens in a single conversation. The agent had called the same `search_knowledge_base` tool 73 times in a row, with slightly different queries each time, never deciding to stop.

If you've built any kind of tool-using agent, you've probably seen this pattern. The agent gets stuck in a loop, either repeating the same action or oscillating between two actions. Tokens evaporate. Costs spike. Users wait forever for a response that never comes.

This isn't a model problem. It's an architectural problem. And once you understand what's actually happening inside the loop, the fix is straightforward.

What's actually happening inside the loop

A typical agent loop looks roughly like this:

```python
def naive_agent_loop(user_query):
    messages = [{"role": "user", "content": user_query}]
    while True:
        response = llm.chat(messages, tools=AVAILABLE_TOOLS)

        # model decided to finalize
        if response.finish_reason == "stop":
            return response.content

        # otherwise, execute the tool call and feed the result back
        tool_call = response.tool_calls[0]
        result = execute_tool(tool_call.name, tool_call.args)
        messages.append(response.message)
        messages.append({"role": "tool", "content": str(result)})
```

The model generates an action, you execute it, you append the result to the context, and you ask the model what to do next. Repeat until the model says "I'm done."

The failure mode lives in that "until done" condition. Three things commonly go wrong:

- The model has no concept of "I've already tried this." Each iteration looks at the conversation history, but if the history shows ten failed searches, the model often interprets that as "I should search harder" rather than "this approach isn't working."
- Tool errors are silent or ambiguous. When a search returns an empty list, is that "no results found" or "the tool is broken"?
The model can't tell, so it tries again with a different phrasing.
- The stop condition is implicit. Many implementations only stop when the model produces a final-answer message. There's nothing forcing the model to ever produce one.

The fix: explicit state, hard limits, structured feedback

Here's a stripped-down version of the pattern I use now. It runs in plain Python against any chat-completion API:

```python
import hashlib
import json

# llm, AVAILABLE_TOOLS, execute_tool, and synthesize_partial_answer stand in
# for your own client, tool schema, and helpers.
MAX_STEPS = 10

def hash_action(name: str, args: dict) -> str:
    # canonical JSON so {a:1, b:2} and {b:2, a:1} hash the same
    payload = json.dumps({"name": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def safer_agent_loop(user_query):
    messages = [{"role": "user", "content": user_query}]
    seen_actions = set()

    for step in range(MAX_STEPS):
        response = llm.chat(messages, tools=AVAILABLE_TOOLS)

        if response.finish_reason == "stop":
            return response.content

        tool_call = response.tool_calls[0]
        action_id = hash_action(tool_call.name, tool_call.args)

        if action_id in seen_actions:
            # tell the model it's repeating instead of running the tool again
            observation = {
                "status": "duplicate",
                "message": "You already called this tool with the same args. "
                           "Try something different or finalize your answer.",
            }
        else:
            seen_actions.add(action_id)
            observation = execute_tool(tool_call.name, tool_call.args)

        messages.append(response.message)
        messages.append({"role": "tool", "content": json.dumps(observation)})

    # hard stop: don't raise, return whatever partial answer we can
    return synthesize_partial_answer(messages)
```

Three things changed:

- Hard step limit. No matter what the model decides, the loop terminates after `MAX_STEPS`. Pick a number based on the actual task: for triage I use 8, for research workflows I sometimes go up to 20.
- Action deduplication. Before executing a tool call, hash the `(tool, args)` pair and check whether we've already done it. If yes, return a synthetic observation telling the model so.
- Structured error envelopes.
Tools return a typed result, not raw strings. The model can see `status: "no_results"` vs `status: "error"` vs `status: "ok"` and make a better decision.

Detecting oscillation, not just repetition

Exact-duplicate detection catches the obvious case. But agents are clever enough to find creative ways to loop. The next pattern I had to handle: the agent calling `search("authentication errors")`, then `search("auth errors")`, then `search("login failures")`. Semantically the same query, syntactically different.

A simple defense is to track the last N tool calls and check whether the agent is making progress:

```python
from collections import deque

class ProgressTracker:
    def __init__(self, window: int = 4):
        self.window = window
        self.recent_tools = deque(maxlen=window)

    def record(self, tool_name: str) -> None:
        self.recent_tools.append(tool_name)

    def is_stuck(self) -> bool:
        # if the last N calls all hit the same tool, we're probably looping
        if len(self.recent_tools) < self.window:
            return False
        return len(set(self.recent_tools)) == 1
```

This isn't perfect. It only looks at tool names, so a legitimate run of calls to one tool will also trip it, and semantic similarity via embeddings would be more robust. But it catches roughly 80% of the oscillation cases I've seen in production without the complexity of a separate similarity model.

Why frameworks don't solve this for you

I've worked with several popular agent frameworks. Most of them give you a `max_iterations` parameter and call it a day. That's the floor of what you need, not the ceiling.
If you're building anything beyond a demo, you need:

- Per-tool quotas, not just global step limits
- Logging that captures the full action/observation trail so you can debug after the fact
- A mechanism to inject "you've already tried this" context back into the model
- A graceful exit path when the limit hits: return a partial answer, not an exception

There's a community-maintained list of agent learning resources on GitHub called Agent-Learning-Hub (https://github.com/datawhalechina/Agent-Learning-Hub) that covers a lot of these patterns at a deeper level, including pointers to academic papers on planning and reflection that helped me understand why the naive ReAct loop has these failure modes in the first place.

Prevention tips that have actually saved me

A few habits I've adopted after enough 3 AM alerts:

- Log every action and observation, with timestamps. When something goes wrong in production, you want the full trace, not just the final state.
- Set token budgets per conversation, enforced server-side. Don't trust the agent to police itself.
- Write tools that return semantically useful errors. "No results for query X. Try a more general term." beats a bare empty result.
- Test with adversarial prompts. Specifically try inputs designed to confuse the agent and verify it bails out cleanly.
- Track tool-call entropy. If the entropy of your tool-call distribution drops over the course of a conversation (the agent keeps reaching for the same one or two tools), that's a leading indicator of stuck behavior.

Wrapping up

Agent loops failing in production almost always come down to missing state, missing feedback, or missing limits. The model isn't broken; it's doing exactly what the prompt and the architecture told it to do. Fix the architecture and the loops go away.

The hardest part is accepting that "let the model decide when to stop" isn't a strategy. You're the one writing the loop. Own the termination logic.
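
For the tool-call entropy tip in the prevention list, here is a minimal sketch of one way to compute it, assuming you already record the tool name for each step. The function names `tool_call_entropy` and `entropy_is_collapsing`, the window of 6, and the 0.7-bit threshold are all illustrative assumptions, not values taken from a production system.

```python
import math
from collections import Counter


def tool_call_entropy(tool_names: list[str]) -> float:
    """Shannon entropy (in bits) of the tool-name distribution.

    0.0 means every call hit the same tool; higher means more varied usage.
    """
    if not tool_names:
        return 0.0
    counts = Counter(tool_names)
    total = len(tool_names)
    # sum of p * log2(1/p) over each distinct tool, with p = count / total
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def entropy_is_collapsing(tool_names: list[str], window: int = 6,
                          threshold: float = 0.7) -> bool:
    """Flag a conversation whose most recent `window` calls have low entropy.

    `window` and `threshold` are hypothetical defaults; tune them on your logs.
    """
    if len(tool_names) < window:
        return False  # not enough history to judge yet
    return tool_call_entropy(tool_names[-window:]) < threshold
```

Called once per step on the running list of tool names, this gives a cheap signal to log or alert on. It complements a hard step limit rather than replacing one, since a low-entropy window can also be legitimate for single-tool tasks.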