{"slug": "your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why", "title": "Your AI agent calls the wrong tool — and your JSON schema is usually why", "summary": "An engineer warns that AI agents calling the right tool 95% of the time still fail on eight-step tasks about 34% of the time due to compounding errors. The root cause is often poorly written JSON schemas, with four common issues: vague descriptions, untyped parameters, mismatched required fields, and free-text fields that should be enums. The fix is to treat schema descriptions as the model's only instructions and to encode constraints explicitly.", "body_md": "Here's the number that should worry you more than it does: an agent that calls the right tool with the right arguments **95% of the time** completes an eight-step task correctly only about **66%** of the time (0.95^8 ≈ 0.663). Reliability doesn't fail in one dramatic crash. It leaks. Every step is a coin that lands heads 19 times out of 20, and you're flipping it eight times in a row.\n\nThe good news is that most of that leak isn't the model being dumb. It traces to two things you control completely: the **JSON schema** you hand the model, and whether you let it **guess** when it shouldn't. Fix those two and the per-call rate climbs, and because it compounds, small gains pay off hugely.\n\nThis is the reframe that fixes everything downstream. When you define a tool, the `description` fields aren't docs for your teammates. They are the *only* instructions the model gets about when and how to use that tool. The model never sees your implementation. It sees the schema. That's it.\n\nSo a schema like this is not \"good enough\":\n\n```\n{\n  \"name\": \"send_email\",\n  \"description\": \"Sends an email\",\n  \"parameters\": {\n    \"type\": \"object\",\n    \"properties\": {\n      \"to\": { \"type\": \"string\" },\n      \"body\": { \"type\": \"string\" }\n    }\n  }\n}\n```\n\nRead it the way the model does. 
*When* should it send an email versus draft one? Is `to` an address or a contact name? Can `body` be HTML? Is anything required? You know the answers. The model is guessing, and guessing is exactly where the 5% comes from.\n\nAfter staring at a lot of broken tool definitions, the same four problems keep showing up:\n\n**1. Vague or missing descriptions.** \"Sends an email,\" \"Gets data,\" \"Handles the request.\" When two tools have thin descriptions, the model can't tell them apart, so it picks the wrong one. The fix is to write the description like you're explaining the tool to a new hire who will be fired for using it at the wrong time: *when* to call it, *when not to*, and what each argument means.\n\n**2. Untyped or loosely typed params.** A `string` where you meant an ISO date. A `string` where you meant one of four statuses. If the type doesn't constrain the value, the model invents a plausible-looking one (`\"next Tuesday\"`, `\"done-ish\"`) and your executor chokes. Use `enum` for fixed sets. Use `format` and explicit types. Every constraint you encode is one your validator can enforce and one less value the model has to guess.\n\n**3. The silent killer: `required` naming a property that doesn't exist.** This one is brutal because nothing yells at you. Your `required` array lists `\"recipient\"`, but the property in `properties` is called `to`. The schema is still valid JSON. The model now thinks a field is mandatory that it has no slot to fill. Make sure every name in `required` actually exists in `properties`.\n\n**4. Free-text where you meant a choice.** `\"priority\": { \"type\": \"string\" }` invites `\"high\"`, `\"High\"`, `\"urgent\"`, `\"P0\"`, and `\"pretty important tbh\"`. Make it `\"enum\": [\"low\", \"medium\", \"high\"]` and the ambiguity is gone before the model can create it.\n\nThe single most common *production* failure isn't a malformed call; it's the model confidently filling in a blank it should have asked about. 
A user says \"schedule a meeting with Sarah next week.\" Which Sarah? Which timezone? Which 30-minute slot on which day? A model optimizing to be helpful will pick one. Sometimes it's right. Sometimes it books a 7 a.m. call with the wrong Sarah.\n\nThe rule I'd tattoo on a junior agent: **if a missing field affects money, publishing, deletion, or customer communication, ask, don't guess.** A clarifying question costs one turn. A wrong write operation costs a refund, a deleted record, or an apology email. Don't optimize for fewer turns at the price of wrong actions.\n\nYou can encode a lot of this in the schema itself: don't mark fields `required` that the model can't reasonably infer, and say so in the description: *\"If the user has not specified a timezone, ask; do not assume.\"* The schema is where you set the defaults for the model's judgment.\n\nEven when the provider guarantees well-formed JSON, well-formed is not the same as *correct*. Structured-output modes stop the model from emitting broken JSON; they do nothing to stop it from passing a valid-looking but wrong argument. So validate on your side, every time, *before* you execute: check the values against your real constraints (does this user ID exist? is this amount within range?), and on failure, return a clear error the model can read and recover from rather than crashing the run. Model output is input. You wouldn't trust raw input from a form field. Don't trust this one either.\n\nReading your own schemas for these bugs is hard; the `required`-references-a-missing-property one in particular is invisible until it's breaking every call in prod. So I wrote a tiny zero-dependency linter for exactly this: [tool-schema-lint](https://github.com/Penloom-Studio/tool-schema-lint) (`npx tool-schema-lint your-tools.json`). 
It flags vague descriptions, untyped params, free-text-where-you-meant-enum, and the silent `required`/`properties` mismatch, for both Anthropic and OpenAI tool formats. It's free and MIT-licensed; point it at your tool definitions and see what falls out.\n\nIf you want the bigger picture (the tool patterns that keep multi-step agents on the rails, plus a runnable eval rubric for scoring \"did it call the right tool with the right args in the right number of steps\"), that's the [Agent Builder's Toolkit](https://penloomstudio.com/index.html).\n\nTool-calling reliability compounds: 95% per call is ~66% over eight steps, so small per-call gains matter enormously. Most misses come from two controllable things. First, the schema: it's the only instruction the model gets, so write real descriptions, type and `enum` your params, and make sure every name in `required` actually exists in `properties` (that last bug silently breaks every call). Second, guessing: if a missing field touches money, publishing, deletion, or customer communication, make the agent ask instead of inventing a value. Then validate the model's output as untrusted input before you execute. Schema plus judgment, not a smarter model, is where the reliability lives.\n\n*What's the worst wrong-tool call you've shipped? 
Reply and tell me — I collect these.*", "url": "https://wpnews.pro/news/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why", "canonical_source": "https://dev.to/penloom_studio_829b7817d3/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why-2886", "published_at": "2026-06-28 05:14:25+00:00", "updated_at": "2026-06-28 06:03:24.612991+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "developer-tools"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why", "markdown": "https://wpnews.pro/news/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why.md", "text": "https://wpnews.pro/news/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why.txt", "jsonld": "https://wpnews.pro/news/your-ai-agent-calls-the-wrong-tool-and-your-json-schema-is-usually-why.jsonld"}}