# Five tool-calling patterns that separate hobby AI agents from production ones

> Source: <https://dev.to/penloom_studio_829b7817d3/five-tool-calling-patterns-that-separate-hobby-ai-agents-from-production-ones-6jc>
> Published: 2026-07-01 02:20:20+00:00

Almost every "build an AI agent" tutorial ends the same way: the model calls a tool, the tool returns data, the model uses the data to respond. It works in the demo.

What the tutorial doesn't show: what happens when the tool times out. Or when the model calls the same tool three times in a row. Or when the model calls a destructive tool without the user intending it. Or when a tool returns an error and the model confabulates a response anyway.

These aren't edge cases — they're the normal operating conditions of a production agent. Here are five patterns I use on every agent I ship to handle them.

By default, most agent frameworks will let the model call tools indefinitely until it decides to stop and respond. This is fine in demos. In production, it means a single misbehaving agent can loop through dozens of API calls and rack up costs before anyone notices.

The fix is a hard tool call budget per turn.

``` typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic();

async function runAgentWithBudget(
  messages: Anthropic.MessageParam[],
  tools: Anthropic.Tool[],
  maxToolCalls = 5
): Promise<{ content: string; toolCallCount: number; hitBudget: boolean }> {
  let toolCallCount = 0;
  let currentMessages = [...messages];

  while (true) {
    const response = await client.messages.create({
      model: "claude-sonnet-4-5",
      max_tokens: 2048,
      tools,
      messages: currentMessages,
    });

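    // Any stop reason other than "tool_use" (end_turn, max_tokens, stop_sequence, ...)
    // ends the loop. Checking only for "end_turn" would resend the same messages forever.
    if (response.stop_reason !== "tool_use") {
      const text = response.content
        .filter((b): b is Anthropic.TextBlock => b.type === "text")
        .map(b => b.text)
        .join("");
      return { content: text, toolCallCount, hitBudget: false };
    }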

    // Model wants to use tools
    if (response.stop_reason === "tool_use") {
      const toolUseBlocks = response.content.filter(
        (b): b is Anthropic.ToolUseBlock => b.type === "tool_use"
      );

      toolCallCount += toolUseBlocks.length;

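      // Budget exceeded: stop and tell the model
      if (toolCallCount > maxToolCalls) {
        // Every tool_use block in the assistant message needs a matching
        // tool_result, not just the first one.
        const budgetMessage: Anthropic.MessageParam = {
          role: "user",
          content: toolUseBlocks.map((block) => ({
            type: "tool_result" as const,
            tool_use_id: block.id,
            content: "Tool call budget exceeded. Please respond with what you know so far.",
            is_error: true,
          })),
        };

        // One final completion with tool use disabled. The tool definitions stay
        // in the request because the history contains tool_use/tool_result blocks.
        // Assumes an SDK/API version that supports tool_choice "none".
        const finalResponse = await client.messages.create({
          model: "claude-sonnet-4-5",
          max_tokens: 1024,
          tools,
          tool_choice: { type: "none" },
          messages: [...currentMessages,
            { role: "assistant", content: response.content },
            budgetMessage
          ],
        });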

        const text = finalResponse.content
          .filter((b): b is Anthropic.TextBlock => b.type === "text")
          .map(b => b.text)
          .join("");
        return { content: text, toolCallCount, hitBudget: true };
      }

      // Execute the tools and continue
      const toolResults = await Promise.all(
        toolUseBlocks.map(async (block) => ({
          type: "tool_result" as const,
          tool_use_id: block.id,
          content: await executeToolSafely(block.name, block.input),
        }))
      );

      currentMessages = [
        ...currentMessages,
        { role: "assistant", content: response.content },
        { role: "user", content: toolResults },
      ];
    }
  }
}
```

The `maxToolCalls = 5` default is conservative. Adjust it based on what your agent actually does. For a simple lookup agent, 3 is plenty. For a research agent doing multi-step synthesis, 10-15 might be appropriate. The point is to have a limit at all.
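If several agents share one codebase, the limits can live in a single table instead of being passed ad hoc. Here is a minimal sketch, assuming a hypothetical `AgentKind` label and `ToolBudget` helper (neither comes from any SDK); the numbers simply mirror the ranges suggested above.

```typescript
// Hypothetical agent categories; the limits mirror the ranges suggested in the text.
type AgentKind = "lookup" | "default" | "research";

const BUDGETS: Record<AgentKind, number> = {
  lookup: 3,
  default: 5,
  research: 12,
};

// Tracks tool calls for a single turn. Create a fresh instance per turn.
class ToolBudget {
  private used = 0;
  private readonly max: number;

  constructor(kind: AgentKind) {
    this.max = BUDGETS[kind];
  }

  // Reserves `count` calls. Returns false, reserving nothing, if that would exceed the limit.
  tryConsume(count: number): boolean {
    if (this.used + count > this.max) return false;
    this.used += count;
    return true;
  }

  get remaining(): number {
    return this.max - this.used;
  }
}

const budget = new ToolBudget("lookup");
console.log(budget.tryConsume(2)); // true: 2 of 3 used
console.log(budget.tryConsume(2)); // false: would exceed the limit
console.log(budget.remaining);     // 1
```

Checking before incrementing means the count only ever reflects calls that were allowed to run, which keeps cost reporting honest.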

A common agent failure mode: the model calls the same tool with the same arguments multiple times in one turn (or across turns). This is wasteful at best and dangerous at worst: imagine calling `send_email` twice with the same content.

``` typescript
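class ToolCallDeduplicator {
  // Cache the promise, not the resolved value, so identical calls issued
  // concurrently (e.g. via Promise.all) share a single execution.
  private seen = new Map<string, Promise<unknown>>();
  private readonly ttlMs: number;

  constructor(ttlMs = 60_000) {
    this.ttlMs = ttlMs;
  }

  // Note: JSON.stringify is sensitive to key order, so {a, b} and {b, a}
  // produce different keys.
  private makeKey(toolName: string, input: unknown): string {
    return `${toolName}:${JSON.stringify(input)}`;
  }

  async callOnce<T>(
    toolName: string,
    input: unknown,
    fn: () => Promise<T>
  ): Promise<{ result: T; wasCached: boolean }> {
    const key = this.makeKey(toolName, input);

    const cached = this.seen.get(key);
    if (cached) {
      return { result: (await cached) as T, wasCached: true };
    }

    const pending = fn();
    this.seen.set(key, pending);

    try {
      const result = await pending;
      // Expire the cache entry after the TTL
      setTimeout(() => this.seen.delete(key), this.ttlMs);
      return { result, wasCached: false };
    } catch (err) {
      // Don't cache failures: let the next identical call retry
      this.seen.delete(key);
      throw err;
    }
  }
}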

// Usage in the tool executor
const deduplicator = new ToolCallDeduplicator();

async function executeToolSafely(toolName: string, input: unknown): Promise<string> {
  const { result, wasCached } = await deduplicator.callOnce(
    toolName,
    input,
    () => dispatchTool(toolName, input)
  );

  if (wasCached) {
    console.log(`[dedup] Tool ${toolName} returned cached result`);
  }

  return typeof result === "string" ? result : JSON.stringify(result);
}
```

For idempotent read operations (search, lookup), caching the result is safe and saves money. For write operations (send email, create record, call webhook), you may want to reject duplicates with an error instead of silently returning the cached result — make that distinction explicit in your tool definitions.
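One way to make that distinction explicit is a small policy object that caches repeated reads and refuses repeated writes. The sketch below is an illustration under assumptions: `SIDE_EFFECT_TOOLS`, `DedupPolicy`, and `stableKey` are hypothetical names, and the key function sorts object keys so that argument order does not defeat deduplication.

```typescript
// Hypothetical policy: which tools have side effects. Adjust to your own tool names.
const SIDE_EFFECT_TOOLS = new Set(["send_email", "create_record", "call_webhook"]);

type DedupOutcome =
  | { kind: "run" }                       // first time this call is seen
  | { kind: "cached"; result: string }    // repeated read: reuse the result
  | { kind: "rejected"; error: string };  // repeated write: refuse

// Serializes input with object keys sorted, so {a, b} and {b, a} map to the same key.
function stableKey(toolName: string, input: unknown): string {
  const normalize = (value: unknown): unknown => {
    if (Array.isArray(value)) return value.map(normalize);
    if (value !== null && typeof value === "object") {
      const obj = value as Record<string, unknown>;
      const sorted: Record<string, unknown> = {};
      for (const k of Object.keys(obj).sort()) sorted[k] = normalize(obj[k]);
      return sorted;
    }
    return value;
  };
  return `${toolName}:${JSON.stringify(normalize(input))}`;
}

class DedupPolicy {
  private results = new Map<string, string>();

  check(toolName: string, input: unknown): DedupOutcome {
    const previous = this.results.get(stableKey(toolName, input));
    if (previous === undefined) return { kind: "run" };
    if (SIDE_EFFECT_TOOLS.has(toolName)) {
      return {
        kind: "rejected",
        error: `${toolName} was already called with identical arguments; not repeating it.`,
      };
    }
    return { kind: "cached", result: previous };
  }

  // Call after a tool has run successfully.
  record(toolName: string, input: unknown, result: string): void {
    this.results.set(stableKey(toolName, input), result);
  }
}

const policy = new DedupPolicy();
policy.record("search", { query: "refunds", limit: 5 }, "3 results");
policy.record("send_email", { to: "a@example.com", body: "hi" }, "sent");

console.log(policy.check("search", { limit: 5, query: "refunds" }).kind);          // "cached"
console.log(policy.check("send_email", { body: "hi", to: "a@example.com" }).kind); // "rejected"
console.log(policy.check("search", { query: "invoices" }).kind);                   // "run"
```

The rejection message is meant to be returned to the model as a tool error, so it learns the action already happened. This sketch does not cover calls that are still in flight.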

When a tool fails, the worst thing you can do is hide the error from the model. Here's a common anti-pattern:

``` typescript
// Bad: swallowing errors
async function executeToolBad(name: string, input: unknown): Promise<string> {
  try {
    return await dispatchTool(name, input);
  } catch {
    return ""; // model gets an empty result and often makes something up
  }
}
```

The model receives an empty string and has no idea the tool failed. It often confabulates a plausible-sounding response based on what it expected the tool to return. This is a common source of hallucinated data in agents: the cause is not the model's training but the agent framework hiding failures.

``` typescript
// Good: structured error propagation
async function executeToolGood(name: string, input: unknown): Promise<string> {
  try {
    const result = await dispatchTool(name, input);
    return typeof result === "string" ? result : JSON.stringify(result);
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";

    // Return a structured error string that the model can reason about
    return JSON.stringify({
      error: true,
      tool: name,
      message,
      suggestion: getErrorSuggestion(name, err),
    });
  }
}

function getErrorSuggestion(toolName: string, err: unknown): string {
  const msg = err instanceof Error ? err.message : "";
  if (msg.includes("timeout")) return "The service is slow. Consider asking the user to try again.";
  if (msg.includes("not found")) return "The requested resource doesn't exist. Confirm the identifier is correct.";
  if (msg.includes("rate limit")) return "Rate limited. Wait a moment and retry.";
  return "An unexpected error occurred. Inform the user and offer alternatives.";
}
```

With structured error responses, the model can reason about what went wrong and suggest a recovery path to the user, rather than making up a false answer.
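The budget example earlier sets `is_error: true` on a `tool_result` block, and the same flag can accompany the structured error string. The following is a sketch only: `ToolResultBlock` is a local mirror of the fields used in this article so that the code runs without the SDK, and `withTimeout` and `toToolResult` are hypothetical helpers. The timeout message deliberately contains the word "timeout" so that a matcher like `getErrorSuggestion` would recognize it.

```typescript
// Local mirror of the tool_result fields used earlier in this article,
// declared here so the sketch runs without the SDK.
interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
  is_error?: boolean;
}

// Rejects with an Error whose message contains "timeout" if `work` takes longer than `ms`.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timeout after ${ms}ms`)), ms);
    work.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (err) => { clearTimeout(timer); reject(err); }
    );
  });
}

// Runs one tool call and always returns a well-formed tool_result block.
async function toToolResult(
  toolUseId: string,
  toolName: string,
  run: () => Promise<string>, // hypothetical: the actual tool invocation
  timeoutMs = 10_000
): Promise<ToolResultBlock> {
  try {
    const content = await withTimeout(run(), timeoutMs);
    return { type: "tool_result", tool_use_id: toolUseId, content };
  } catch (err) {
    const message = err instanceof Error ? err.message : "Unknown error";
    return {
      type: "tool_result",
      tool_use_id: toolUseId,
      content: JSON.stringify({ error: true, tool: toolName, message }),
      is_error: true,
    };
  }
}

// Usage: a tool that never resolves is cut off after 50 ms and reported as an error.
toToolResult("toolu_example", "search", () => new Promise<string>(() => {}), 50)
  .then((block) => console.log(block.is_error, block.content));
```

Because the function never throws, one failing tool cannot reject a surrounding `Promise.all` and take the other results down with it.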

Agents that have both read tools (search, lookup, read file) and write tools (send email, create record, delete, call API) need different safety profiles for each category. The model should be able to call read tools freely but should be more cautious — and optionally ask for confirmation — before calling write tools.

``` typescript
const READ_TOOLS = new Set(["search", "lookup_user", "get_document", "read_calendar"]);
const WRITE_TOOLS = new Set(["send_email", "create_record", "delete_file", "call_webhook"]);
const DESTRUCTIVE_TOOLS = new Set(["delete_file", "cancel_subscription"]);

interface ToolCallDecision {
  allowed: boolean;
  requiresConfirmation: boolean;
  reason?: string;
}

function classifyToolCall(
  toolName: string,
  context: { userConfirmedWrite: boolean; sessionTrusted: boolean }
): ToolCallDecision {
  if (READ_TOOLS.has(toolName)) {
    return { allowed: true, requiresConfirmation: false };
  }

  if (DESTRUCTIVE_TOOLS.has(toolName)) {
    if (!context.userConfirmedWrite) {
      return {
        allowed: false,
        requiresConfirmation: true,
        reason: `${toolName} is irreversible. Explicit user confirmation required.`,
      };
    }
    return { allowed: true, requiresConfirmation: false };
  }

  if (WRITE_TOOLS.has(toolName)) {
    if (context.sessionTrusted && context.userConfirmedWrite) {
      return { allowed: true, requiresConfirmation: false };
    }
    return {
      allowed: false,
      requiresConfirmation: true,
      reason: `${toolName} will make changes. Confirm with user first.`,
    };
  }

  // Unknown tool — default deny
  return {
    allowed: false,
    requiresConfirmation: false,
    reason: `Unknown tool: ${toolName}. Not in allow-list.`,
  };
}
```

The key decision point: when the classification returns `requiresConfirmation: true`, you don't call the tool. Instead, you return the model's proposed action to the user interface and ask for explicit approval before continuing. The agent pauses at write boundaries.
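To make the pause concrete, here is a sketch of a gate that either executes a call, parks it for approval, or denies it. Everything in it is hypothetical (`ProposedCall`, `ConfirmationGate`, the two tool sets). It also makes one design choice the text does not prescribe: approval is tied to a specific call id rather than a session-wide flag, so confirming one email does not authorize the next.

```typescript
// Shape of a tool call proposed by the model (hypothetical, SDK-independent).
interface ProposedCall {
  id: string;
  name: string;
  input: unknown;
}

type GateResult =
  | { status: "executed"; output: string }
  | { status: "awaiting_confirmation"; call: ProposedCall; reason: string }
  | { status: "denied"; reason: string };

// Hypothetical stand-in for the classifier: reads run freely,
// known writes need approval, anything else is denied.
const READS = new Set(["search", "get_document"]);
const WRITES = new Set(["send_email", "delete_file"]);

class ConfirmationGate {
  // Calls the user has explicitly approved, keyed by tool call id.
  private approved = new Set<string>();

  approve(callId: string): void {
    this.approved.add(callId);
  }

  async run(
    call: ProposedCall,
    execute: (call: ProposedCall) => Promise<string>
  ): Promise<GateResult> {
    if (!READS.has(call.name) && !WRITES.has(call.name)) {
      return { status: "denied", reason: `Unknown tool: ${call.name}` };
    }
    if (WRITES.has(call.name) && !this.approved.has(call.id)) {
      // Do not execute. Hand the proposal to the UI and wait.
      return {
        status: "awaiting_confirmation",
        call,
        reason: `${call.name} will make changes.`,
      };
    }
    this.approved.delete(call.id); // an approval is single-use
    return { status: "executed", output: await execute(call) };
  }
}

const gate = new ConfirmationGate();
const emailCall: ProposedCall = { id: "call_1", name: "send_email", input: { to: "a@example.com" } };
const fakeExecute = async (c: ProposedCall) => `ran ${c.name}`;

gate.run(emailCall, fakeExecute)
  .then((first) => {
    console.log(first.status);  // "awaiting_confirmation"
    gate.approve(emailCall.id); // the user clicked "Approve" in the UI
    return gate.run(emailCall, fakeExecute);
  })
  .then((second) => console.log(second.status)); // "executed"
```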

Tool schemas define what you expect. The model doesn't always deliver exactly that. Even with strict JSON schemas, you'll see free-form strings where you specified enums, numbers sent as strings, a bare value where you expected an array (or a single-element array where you expected a bare value), missing optional fields, and extra fields the model invented.

A coercion layer at the tool boundary handles these predictable mismatches without failing:

``` typescript
import { z } from "zod";

const SearchInputSchema = z.object({
  query: z.string().min(1),
  max_results: z.coerce.number().int().min(1).max(50).default(10),
  // Model sometimes sends "true"/"false" strings for booleans
  include_archived: z.preprocess(
    val => val === "true" ? true : val === "false" ? false : val,
    z.boolean().default(false)
  ),
  // Model sometimes sends a single string instead of array
  filters: z.preprocess(
    val => typeof val === "string" ? [val] : val,
    z.array(z.string()).default([])
  ),
});

async function handleSearchTool(rawInput: unknown): Promise<string> {
  const parseResult = SearchInputSchema.safeParse(rawInput);

  if (!parseResult.success) {
    // `issues` is the canonical list on ZodError and works across Zod versions
    const errors = parseResult.error.issues.map(e =>
      `${e.path.join(".")}: ${e.message}`
    ).join(", ");

    return JSON.stringify({
      error: true,
      message: `Invalid search parameters: ${errors}`,
      suggestion: "Correct the parameters and try again.",
    });
  }

  const { query, max_results, include_archived, filters } = parseResult.data;
  return await performSearch(query, { max_results, include_archived, filters });
}
```

`z.coerce` and `z.preprocess` do the work of handling the common mismatches (string-to-number, string-to-boolean, string-to-array). The schema defines the contract; the coercion layer handles realistic model output.
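It is worth noting why the boolean field uses `z.preprocess` rather than `z.coerce.boolean()`: Zod's boolean coercion applies JavaScript's `Boolean(...)`, so any non-empty string, including `"false"`, becomes `true`. A dependency-free sketch of the three coercions, using hypothetical helper names, shows the difference:

```typescript
// Dependency-free versions of the three coercions (hypothetical helper names).

// "true"/"false" strings become booleans; everything else passes through for validation.
function coerceBooleanString(value: unknown): unknown {
  if (value === "true") return true;
  if (value === "false") return false;
  return value;
}

// A bare string becomes a one-element array; other values pass through.
function coerceToArray(value: unknown): unknown {
  return typeof value === "string" ? [value] : value;
}

// Mirrors numeric coercion via Number(...), rejecting anything that is not an integer.
function coerceInt(value: unknown): number | undefined {
  const n = Number(value);
  return Number.isInteger(n) ? n : undefined;
}

console.log(Boolean("false"));             // true: the pitfall with naive coercion
console.log(coerceBooleanString("false")); // false
console.log(coerceToArray("pdf"));         // a one-element array containing "pdf"
console.log(coerceInt("25"));              // 25
console.log(coerceInt("lots"));            // undefined
```

Values the helpers do not recognize pass through unchanged, which leaves the rejection (and the error message the model sees) to the validation step.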

These five patterns aren't independent; they compose. Together they form a tool executor that is predictable, cost-controlled, and safe to run unsupervised. Without them, you have a demo. With them, you have an agent you can actually deploy.
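As a rough illustration of how the pieces might fit, the sketch below chains them in one plausible order: gate, validate, deduplicate, check the budget, execute, and report errors in a structured form. The ordering and every name in it (`makeExecutor`, `Stages`, `ToolOutcome`) are assumptions for illustration, not the author's implementation.

```typescript
// One possible ordering: gate -> validate -> dedup -> budget -> execute -> report.
// All names are hypothetical; each stage stands in for a pattern from the article.
interface ToolCall { name: string; input: unknown }
interface ToolOutcome { content: string; isError: boolean }

interface Stages {
  isAllowed: (call: ToolCall) => boolean;       // read/write gating
  validate: (call: ToolCall) => string | null;  // error message, or null if valid
  execute: (call: ToolCall) => Promise<string>; // the real tool
}

function makeExecutor(stages: Stages, maxCalls: number) {
  let calls = 0;
  const cache = new Map<string, Promise<string>>();

  const fail = (message: string): ToolOutcome => ({
    content: JSON.stringify({ error: true, message }),
    isError: true,
  });

  return async function run(call: ToolCall): Promise<ToolOutcome> {
    if (!stages.isAllowed(call)) return fail(`${call.name} requires confirmation.`);

    const invalid = stages.validate(call);
    if (invalid) return fail(invalid);

    const key = `${call.name}:${JSON.stringify(call.input)}`;
    let pending = cache.get(key);
    if (!pending) {
      // Only calls that actually execute count against the budget.
      if (calls >= maxCalls) return fail("Tool call budget exceeded.");
      calls += 1;
      pending = stages.execute(call);
      cache.set(key, pending);
    }

    try {
      return { content: await pending, isError: false };
    } catch (err) {
      cache.delete(key); // failures are not cached
      return fail(err instanceof Error ? err.message : "Unknown error");
    }
  };
}

const runTool = makeExecutor(
  {
    isAllowed: (c) => c.name === "search",
    validate: (c) =>
      typeof (c.input as { query?: unknown } | null)?.query === "string"
        ? null
        : "query must be a string",
    execute: async (c) => `results for ${(c.input as { query: string }).query}`,
  },
  2
);

runTool({ name: "search", input: { query: "refunds" } }).then((r) => console.log(r));
```

Every exit path returns the same `ToolOutcome` shape, so the calling loop never needs to distinguish a denied call from a failed one when building the message for the model.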

The production version of this in Python or TypeScript is about 200 lines. The demo version is 30 lines. That gap is where most AI agent projects live.

The free **Reliable Agent Field Guide** has full implementations of these patterns plus testing strategies: [penloomstudio.com/field-guide.html](https://penloomstudio.com/field-guide.html)
