# Designing tools so an LLM actually calls them correctly: 5 patterns from the CCA-F blueprint

> Source: <https://dev.to/kiran_manne/designing-tools-so-an-llm-actually-calls-them-correctly-5-patterns-from-the-cca-f-blueprint-7li>
> Published: 2026-06-28 23:00:00+00:00

The first time I shipped a tool-using agent, it kept calling the wrong tool. Not occasionally — often enough that I couldn't ship it. The tools worked. The schemas were valid. The descriptions were technically accurate. The model still picked the wrong one, over and over.

What I learned, painfully, is that designing a tool for an LLM caller is closer to writing technical documentation than writing a function signature. The model is reading your tool definition as part of its decision about whether and how to invoke it. If the description is ambiguous, the model is going to guess — and you don't want that.

This post walks through five patterns I now use by default, all of which map directly to material in the Tool Design & MCP domain of the Claude Certified Architect Foundations exam (which is, not coincidentally, the domain where a lot of strong backend engineers lose points).

The most common anti-pattern: writing the tool description the way you'd write a Python docstring.

```
{
  "name": "get_user",
  "description": "Returns the user object for a given user ID.",
  "input_schema": {
    "type": "object",
    "properties": {
      "user_id": { "type": "string" }
    },
    "required": ["user_id"]
  }
}
```

That description tells the model *what the tool does*. It does not tell the model *when to call it*, *when not to call it*, or *what "user object" actually contains*. So the model has to guess.

The rewrite:

```
{
  "name": "get_user",
  "description": "Fetch a user's profile (name, email, plan tier, signup date) by their internal user ID. Use this when the user is asking about a specific account and you have a user ID. Do NOT use this for email-based lookups — use lookup_user_by_email instead. Returns null if the user does not exist; does not raise.",
  "input_schema": { ... }
}
```

The rewrite answers four questions the model would otherwise guess at: what's in the response, when to use it, when not to use it, and what happens on the unhappy path. That's the bar.

Heuristic: if your tool description doesn't contain the word "when", it's probably not finished.

LLMs are excellent at producing plausible-looking arguments. They are less excellent at producing *correct* arguments when the schema admits ambiguity.

A loose schema:

```
{
  "properties": {
    "status": { "type": "string" },
    "priority": { "type": "string" },
    "date": { "type": "string" }
  }
}
```

The tightened version:

```
{
  "properties": {
    "status": {
      "type": "string",
      "enum": ["open", "in_progress", "closed"],
      "description": "Current status of the ticket."
    },
    "priority": {
      "type": "integer",
      "minimum": 1,
      "maximum": 4,
      "description": "1 = urgent, 4 = low. Default to 3 if unspecified."
    },
    "date": {
      "type": "string",
      "format": "date",
      "description": "ISO 8601 date (YYYY-MM-DD), in UTC."
    }
  }
}
```

Three changes, each of which removes a class of failure:

`status`

is now an enum. The model can't invent `"pending"`

or `"new"`

.`priority`

has a documented scale and a documented default. The model isn't guessing whether 1 or 4 is higher priority.`date`

has a format and a timezone convention. The model isn't deciding whether to write `"tomorrow"`

or `"2026-05-30"`

.Every time you let the model improvise on a string field, you're paying for that improvisation in tail latency and bug reports.

This is the single pattern that made the biggest difference to my own agent's reliability, and I think it's badly underrated.

When a tool fails, the model will see the result as part of the next turn's context. If your error is uninformative, the model has nothing to act on. If your error is structured and instructive, the model can recover on its own — no retry logic, no human in the loop.

Bad:

```
{ "error": "bad input" }
```

Better:

```
{
  "error": {
    "code": "USER_NOT_FOUND",
    "message": "No user exists with id 'usr_9k2x'.",
    "hint": "If you only have an email address, call lookup_user_by_email instead."
  }
}
```

The `hint`

field is the one that earns its keep. It turns the tool's error path into a redirect. The model reads the hint, calls the right tool on the next turn, and the agent loop closes successfully without the user ever seeing the failure.

A good rule: every error your tool returns should answer the question "what should the model do differently next time?"

There is a strong temptation, especially if you've written REST APIs, to design a single tool with a `mode`

or `action`

parameter that switches behavior:

```
{
  "name": "manage_ticket",
  "description": "Create, update, close, or reopen a ticket.",
  "input_schema": {
    "properties": {
      "action": { "enum": ["create", "update", "close", "reopen"] },
      "ticket_id": { "type": "string" },
      "title": { "type": "string" },
      "body": { "type": "string" }
    }
  }
}
```

This looks elegant. It is, in practice, worse than four separate tools.

Reasons:

`ticket_id`

is required for update/close/reopen but not create. The schema can't enforce that without `oneOf`

, and even with `oneOf`

, the model's adherence rate drops.`"missing required field"`

is less useful than `"close_ticket requires ticket_id"`

.The rewrite is four tools: `create_ticket`

, `update_ticket`

, `close_ticket`

, `reopen_ticket`

. Each has a focused description, a tight schema, and instructive errors. The model picks correctly more often, and when it doesn't, the failure is local to one tool, not the whole multiplexer.

The exception: if you have genuinely dozens of near-identical operations (CRUD over hundreds of resource types), at some point a parameterized tool wins on context window cost. That tradeoff is real, but it shows up later than people think.

When the model emits multiple `tool_use`

blocks in a single turn, the runtime executes them in parallel and returns all results before the next model call. This is the agent loop's most underrated feature, and it's the one most tool designs accidentally break.

A tool is parallel-safe when:

A tool is *not* parallel-safe when:

If you have tools that aren't parallel-safe, document that explicitly in the description: `"This tool must be called after fetch_session and not in parallel with other write operations."`

The model will respect that.

The deeper point: when you design a new tool, ask yourself "what happens if the model calls this twice in the same turn?" If the answer is "undefined behavior", you have more work to do.

If you're exposing tools through the Model Context Protocol (MCP), all five patterns still apply — the protocol just standardizes how the host discovers and invokes them. MCP adds two adjacent primitives worth knowing:

The practical implication: if your tool is fundamentally "give me the content of X", it might want to be a resource, not a tool. Resources are cheaper to expose, easier to cache, and don't burn an agent-loop turn to read.

Most "the model called the wrong tool" bugs are not model bugs. They're tool design bugs surfaced by the model's willingness to guess when the design is ambiguous. The five patterns above — descriptions that say *when*, tight schemas, instructive errors, narrow tools, and parallel-safety — close the ambiguity gaps that LLMs are most likely to fall into.

If you're studying for the Claude Certified Architect Foundations exam, the Tool Design & MCP domain (18% of the blueprint) is heavy on scenario questions that test exactly these tradeoffs. The independent practice platform I run at [claudecertifiedarchitect.dev](https://claudecertifiedarchitect.dev) has a free 15-question set — a few of them in this domain — if you want to calibrate where you currently stand. It emails you a diagnostic at the end; no payment to try it.

Not affiliated with Anthropic; just a thing I built because I wanted blueprint-weighted practice that didn't exist yet.