Five tool-calling patterns that separate hobby AI agents from production ones

A developer outlines five tool-calling patterns that distinguish hobby AI agents from production-ready systems, including hard tool call budgets, deduplication, error handling, and safety checks. The patterns address common failures like infinite loops, repeated calls, and confabulated responses, with code examples using the Anthropic SDK.

Almost every "build an AI agent" tutorial ends the same way: the model calls a tool, the tool returns data, the model uses the data to respond. It works in the demo.

What the tutorial doesn't show is what happens when the tool times out. Or when the model calls the same tool three times in a row. Or when the model calls a destructive tool the user never intended. Or when a tool returns an error and the model confabulates a response anyway.

These aren't edge cases; they're the normal operating conditions of a production agent. Here are five patterns I use on every agent I ship to handle them.

By default, most agent frameworks will let the model call tools indefinitely until it decides to stop and respond. This is fine in demos. In production, it means a single misbehaving agent can loop through dozens of API calls and rack up costs before anyone notices. The fix is a hard tool call budget per turn.

```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

// Concatenate the text blocks of a response into one string.
function extractText(content: Anthropic.Message["content"]): string {
  return content
    .filter((b): b is Anthropic.TextBlock => b.type === "text")
    .map((b) => b.text)
    .join("");
}

async function runAgentWithBudget(
  messages: Anthropic.MessageParam[],
  tools: Anthropic.Tool[],
  maxToolCalls = 5
): Promise<{ content: string; toolCallCount: number; hitBudget: boolean }> {
  let toolCallCount = 0;
  let currentMessages = [...messages];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 2048,
      tools,
      messages: currentMessages,
    });

    // Model is done calling tools. Any stop reason other than "tool_use"
    // (end_turn, max_tokens, ...) ends the loop, so it cannot spin forever.
    if (response.stop_reason !== "tool_use") {
      return { content: extractText(response.content), toolCallCount, hitBudget: false };
    }

    // Model wants to use tools
    const toolUseBlocks = response.content.filter(
      (b): b is Anthropic.ToolUseBlock => b.type === "tool_use"
    );
    toolCallCount += toolUseBlocks.length;

    // Budget exceeded: stop and tell the model
    if (toolCallCount > maxToolCalls) {
      const budgetMessage: Anthropic.MessageParam = {
        role: "user",
        // Every tool_use block in the assistant turn needs a matching tool_result
        content: toolUseBlocks.map((block) => ({
          type: "tool_result" as const,
          tool_use_id: block.id,
          content: "Tool call budget exceeded. Please respond with what you know so far.",
          is_error: true,
        })),
      };

      // One final completion with tool use switched off. The tool definitions
      // stay in the request because the history contains tool_use blocks;
      // tool_choice "none" assumes a recent SDK and API version.
      const finalResponse = await client.messages.create({
        model: "claude-sonnet-4-5",
        max_tokens: 1024,
        tools,
        tool_choice: { type: "none" },
        messages: [
          ...currentMessages,
          { role: "assistant", content: response.content },
          budgetMessage,
        ],
      });

      return { content: extractText(finalResponse.content), toolCallCount, hitBudget: true };
    }

    // Execute the tools and continue
    const toolResults = await Promise.all(
      toolUseBlocks.map(async (block) => ({
        type: "tool_result" as const,
        tool_use_id: block.id,
        content: await executeToolSafely(block.name, block.input),
      }))
    );

    currentMessages = [
      ...currentMessages,
      { role: "assistant", content: response.content },
      { role: "user", content: toolResults },
    ];
  }
}
```

The `maxToolCalls = 5` default is conservative. Adjust based on what your agent actually does. For a simple lookup agent, 3 is plenty. For a research agent doing multi-step synthesis, 10-15 might be appropriate. The point is to have a limit at all.

A common agent failure mode: the model calls the same tool with the same arguments multiple times (in one turn or across turns). This is wasteful at best and dangerous at worst. Imagine calling `send_email` twice with the same content.
```typescript
class ToolCallDeduplicator {
  // Cache the promise rather than the resolved value, so duplicate calls
  // that run in parallel within one turn share a single execution.
  private seen = new Map<string, Promise<unknown>>();
  private readonly ttlMs: number;

  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
  }

  private makeKey(toolName: string, input: unknown): string {
    // JSON.stringify is key-order sensitive: {a, b} and {b, a} produce different keys
    return `${toolName}:${JSON.stringify(input)}`;
  }

  async callOnce<T>(
    toolName: string,
    input: unknown,
    fn: () => Promise<T>
  ): Promise<{ result: T; wasCached: boolean }> {
    const key = this.makeKey(toolName, input);

    const cached = this.seen.get(key);
    if (cached !== undefined) {
      return { result: (await cached) as T, wasCached: true };
    }

    const pending = fn();
    this.seen.set(key, pending);
    try {
      const result = await pending;
      // Expire cache entries
      setTimeout(() => this.seen.delete(key), this.ttlMs);
      return { result, wasCached: false };
    } catch (err) {
      // Never cache a failure; let the next call retry
      this.seen.delete(key);
      throw err;
    }
  }
}

// Usage in the tool executor
const deduplicator = new ToolCallDeduplicator();

async function executeToolSafely(toolName: string, input: unknown): Promise<string> {
  // dispatchTool is your own router from tool name to implementation (not shown)
  const { result, wasCached } = await deduplicator.callOnce(toolName, input, () =>
    dispatchTool(toolName, input)
  );

  if (wasCached) {
    console.log(`[dedup] Tool ${toolName} returned cached result`);
  }

  return typeof result === "string" ? result : JSON.stringify(result);
}
```

For idempotent read operations (search, lookup), caching the result is safe and saves money. For write operations (send email, create record, call webhook), you may want to reject duplicates with an error instead of silently returning the cached result. Make that distinction explicit in your tool definitions.

When a tool fails, the worst thing you can do is hide the error from the model. Here's a common anti-pattern:

```typescript
// Bad: swallowing errors
async function executeToolBad(name: string, input: unknown): Promise<string> {
  try {
    return await dispatchTool(name, input);
  } catch {
    return ""; // model gets an empty result and often makes something up
  }
}
```

The model receives an empty string and has no idea the tool failed. It often confabulates a plausible-sounding response based on what it expected the tool to return.
This is a common source of hallucinated data in agents: not the model's training, but the agent framework hiding failures.

```typescript
// Good: structured error propagation
async function executeToolGood(name: string, input: unknown): Promise<string> {
  try {
    const result = await dispatchTool(name, input);
    return typeof result === "string" ? result : JSON.stringify(result);
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";

    // Return a structured error string that the model can reason about
    return JSON.stringify({
      error: true,
      tool: name,
      message,
      suggestion: getErrorSuggestion(name, err),
    });
  }
}

function getErrorSuggestion(toolName: string, err: unknown): string {
  // Lowercase once so the checks below are case-insensitive
  const msg = err instanceof Error ? err.message.toLowerCase() : "";

  if (msg.includes("timeout")) return "The service is slow. Consider asking the user to try again.";
  if (msg.includes("not found")) return "The requested resource doesn't exist. Confirm the identifier is correct.";
  if (msg.includes("rate limit")) return "Rate limited. Wait a moment and retry.";

  return "An unexpected error occurred. Inform the user and offer alternatives.";
}
```

With structured error responses, the model can reason about what went wrong and suggest a recovery path to the user, rather than making up a false answer.

Agents that have both read tools (search, lookup, read file) and write tools (send email, create record, delete, call API) need different safety profiles for each category. The model should be able to call read tools freely but should be more cautious, and optionally ask for confirmation, before calling write tools.
```typescript
const READ_TOOLS = new Set(["search", "lookup_user", "get_document", "read_calendar"]);
const WRITE_TOOLS = new Set(["send_email", "create_record", "delete_file", "call_webhook"]);
const DESTRUCTIVE_TOOLS = new Set(["delete_file", "cancel_subscription"]);

interface ToolCallDecision {
  allowed: boolean;
  requiresConfirmation: boolean;
  reason?: string;
}

function classifyToolCall(
  toolName: string,
  context: { userConfirmedWrite: boolean; sessionTrusted: boolean }
): ToolCallDecision {
  if (READ_TOOLS.has(toolName)) {
    return { allowed: true, requiresConfirmation: false };
  }

  // Checked before WRITE_TOOLS because delete_file appears in both sets
  if (DESTRUCTIVE_TOOLS.has(toolName)) {
    if (!context.userConfirmedWrite) {
      return {
        allowed: false,
        requiresConfirmation: true,
        reason: `${toolName} is irreversible. Explicit user confirmation required.`,
      };
    }
    return { allowed: true, requiresConfirmation: false };
  }

  if (WRITE_TOOLS.has(toolName)) {
    if (context.sessionTrusted && context.userConfirmedWrite) {
      return { allowed: true, requiresConfirmation: false };
    }
    return {
      allowed: false,
      requiresConfirmation: true,
      reason: `${toolName} will make changes. Confirm with user first.`,
    };
  }

  // Unknown tool: default deny
  return {
    allowed: false,
    requiresConfirmation: false,
    reason: `Unknown tool: ${toolName}. Not in allow-list.`,
  };
}
```

The key decision point: when the classification returns `requiresConfirmation: true`, instead of calling the tool, you return the model's proposed action to the user interface and ask for explicit approval before continuing. The agent pauses at write boundaries.

Tool schemas define what you expect. The model doesn't always deliver exactly that. Even with strict JSON schemas, you'll see strings where you specified enums, numbers as strings, a bare string where you expected an array, missing optional fields, and extra fields the model invented.
A coercion layer at the tool boundary handles these predictable mismatches without failing:

```typescript
import { z } from "zod";

const SearchInputSchema = z.object({
  query: z.string().min(1),
  max_results: z.coerce.number().int().min(1).max(50).default(10),
  // Model sometimes sends "true"/"false" strings for booleans
  include_archived: z.preprocess(
    (val) => (val === "true" ? true : val === "false" ? false : val),
    z.boolean().default(false)
  ),
  // Model sometimes sends a single string instead of array
  filters: z.preprocess(
    (val) => (typeof val === "string" ? [val] : val),
    z.array(z.string()).default([])
  ),
});

async function handleSearchTool(rawInput: unknown): Promise<string> {
  const parseResult = SearchInputSchema.safeParse(rawInput);

  if (!parseResult.success) {
    const errors = parseResult.error.issues
      .map((e) => `${e.path.join(".")}: ${e.message}`)
      .join(", ");

    return JSON.stringify({
      error: true,
      message: `Invalid search parameters: ${errors}`,
      suggestion: "Correct the parameters and try again.",
    });
  }

  const { query, max_results, include_archived, filters } = parseResult.data;
  // performSearch is your own search implementation (not shown)
  return await performSearch(query, { max_results, include_archived, filters });
}
```

`z.coerce` and `z.preprocess` do the work of handling the common mismatches (string-to-number, string-to-boolean, string-to-array). The schema defines the contract; the coercion layer handles realistic model output.

These five patterns aren't independent; they compose. Together they form a tool executor that is predictable, cost-controlled, and safe to run unsupervised. Without them, you have a demo. With them, you have an agent you can actually deploy.

The production version of this in Python or TypeScript is about 200 lines. The demo version is 30 lines. That gap is where most AI agent projects live.

The free Reliable Agent Field Guide has full implementations of these patterns plus testing strategies: https://penloomstudio.com/field-guide.html
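To make the composition a little more concrete, here is a minimal sketch of a pre-flight gate that combines three of the checks (budget, deduplication, and write confirmation) into one pure decision function that runs before any tool executes. Everything in it is hypothetical and for illustration only: `GateState`, `gateToolCall`, `recordExecution`, `callKey`, and the two-entry `TOOL_KINDS` registry are assumed names, not part of the Anthropic SDK or of the code above. It also follows the earlier suggestion to reject duplicate writes instead of replaying a cached result.

```typescript
// Hypothetical sketch: one synchronous gate deciding what to do with a
// proposed tool call before anything is executed.
type ToolKind = "read" | "write";

type GateDecision =
  | { action: "execute" }
  | { action: "reuse_cached" }
  | { action: "ask_user"; reason: string }
  | { action: "reject"; reason: string };

interface GateState {
  callsThisTurn: number;
  maxCalls: number;
  executedKeys: Set<string>;  // tool+input keys that already ran
  confirmedKeys: Set<string>; // write calls the user explicitly approved
}

// Assumed allow-list; a real agent would derive this from its tool definitions.
const TOOL_KINDS = new Map<string, ToolKind>([
  ["search", "read"],
  ["send_email", "write"],
]);

function callKey(toolName: string, input: unknown): string {
  // Key-order sensitive, same caveat as any JSON.stringify-based key
  return `${toolName}:${JSON.stringify(input)}`;
}

function gateToolCall(state: GateState, toolName: string, input: unknown): GateDecision {
  const kind = TOOL_KINDS.get(toolName);
  if (kind === undefined) {
    return { action: "reject", reason: `Unknown tool: ${toolName}` };
  }

  const key = callKey(toolName, input);
  if (state.executedKeys.has(key)) {
    // Duplicate reads can reuse the earlier result; duplicate writes are refused
    return kind === "read"
      ? { action: "reuse_cached" }
      : { action: "reject", reason: `Duplicate ${toolName} call rejected` };
  }

  if (state.callsThisTurn >= state.maxCalls) {
    return { action: "reject", reason: "Tool call budget exceeded" };
  }

  if (kind === "write" && !state.confirmedKeys.has(key)) {
    return { action: "ask_user", reason: `${toolName} will make changes` };
  }

  return { action: "execute" };
}

// Call this after a tool has actually run.
function recordExecution(state: GateState, toolName: string, input: unknown): void {
  state.callsThisTurn += 1;
  state.executedKeys.add(callKey(toolName, input));
}

const state: GateState = {
  callsThisTurn: 0,
  maxCalls: 2,
  executedKeys: new Set(),
  confirmedKeys: new Set(),
};

console.log(gateToolCall(state, "search", { query: "refund policy" }).action); // "execute"
recordExecution(state, "search", { query: "refund policy" });
console.log(gateToolCall(state, "search", { query: "refund policy" }).action); // "reuse_cached"
console.log(gateToolCall(state, "send_email", { to: "a@example.com" }).action); // "ask_user"
```

Keeping the gate free of I/O is a design choice for this sketch: the decision logic can be unit tested without mocking the model or the tools, and the executor only has to act on the returned `action`.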