How Tool Search Works and How It Saves Tokens

wpnews.pro

Let’s look into how tool search works, and why it saves tokens.

The Problem: Schemas Are Expensive #

Every tool you give an agent is not just a name. It is a full contract: name, description, and a JSON Schema for parameters with types, enums, required fields, and usually a paragraph of docs so the model knows when to use it.

One decent tool definition is easily 500 to 1500 tokens. Multiply by 200 and you are burning tens of thousands of tokens on a menu the agent mostly will not order from.

And it is not only tokens. There is second, sneakier cost. When the model has 200 tools in front of it, picking the right one gets harder. Selection accuracy drops. More tools means more chances to grab the wrong one, or to hallucinate parameters from a tool that looks similar.

So you pay twice: once in money and latency for the tokens, once in quality for the confusion.

The Idea: A Tool That Finds Tools #

Tool search flips the default. Instead of every schema upfront, the harness loads a small index — just the names, and maybe one line of metadata each — plus one extra tool whose only job is to find tools.

The agent does not see 200 schemas. It sees 200 names and a tool_search

tool. When it actually needs to start a resource or open a pull request, it searches, gets back the full schema for just that tool, and only then calls it.

You can think of it as a tool whose job is the toolbox itself. A tool inside the toolbox that hands you other tools on request. Brand inside a brand, if you like.

In this very environment it looks like this. Deferred tools show up by name only:

The following deferred tools are now available via ToolSearch.
Their schemas are NOT loaded — calling them directly will fail.
  WebFetch, WebSearch, NotebookEdit, Monitor, ...

The names are there. The schemas are not. If I call WebFetch

right now, I get an InputValidationError

, because there is no parameter schema yet to validate against. I have to fetch it first.

How It Saves Tokens #

The trick is that names are cheap and schemas are not.

A name plus one line of metadata is maybe 10-15 tokens. A full schema is 500-1500. So the index of 200 tools costs about the same as two or three full schemas. You keep the whole menu in context for almost nothing, and you only pay the real price for the handful of tools you actually load.

The math is boring but it works. 200 schemas upfront might be 60k tokens. The same catalog as names plus a search tool is maybe 2-3k. You pull two schemas during the task and add another 2k. You spent 5k instead of 60k, and the agent saw a shorter, cleaner menu the whole time.

That is the 80% of the value. Everything else is tuning.

Two Kinds of Tool Search #

Here is where it gets interesting. There are really two layers, and they do not compete — they stack.

Generic tool search

This is the one built into the harness. It indexes everything — built-in tools, every connected MCP server, all of it — into one searchable catalog. You do not configure it per provider. New MCP server connects, its tools just show up in the same index next to everything else.

Generic search usually gives you two query styles:

select:Read,Edit,Grep      # I know the exact names, just load them
notebook jupyter            # keyword search, give me best matches

select:

is direct by name. Use it when you already know what you want — the agent saw the name in the index, no need to “search,” just fetch the schema. Keyword search is for when you know the capability but not the exact tool: “something that sends slack messages.”

Provider-specific tool search

This is the second layer, and it is easy to miss. A provider can ship its own search tool, scoped to its own domain. GitHub MCP has search_code

and search_issues

. A docs MCP server might expose a grep over its files. Figma has search_design_system

.

These are not searching for tools. They are searching inside the provider’s data. But — and this is the neat part — from the agent’s point of view they are just more tools in the generic index. So the agent first uses generic tool search to find search_code

, loads it, and then uses search_code

to find actual code.

Search to find the search. It nests, and that is fine. Each layer does one job.

You Do Not Have to Build This #

Good news. The model providers are shipping tool search as a native API feature, so in many cases you do not hand-roll the loop at all.

Anthropic has a Tool Search Tool. You mark tools with

defer_: true

. Those definitions get sent to the API but do not land in context — they are stripped from the rendered tools before the prompt cache key is even computed. Claude gets a small search tool, finds what it needs, and only then are the matched schemas appended. There are two flavors: regex search (tool_search_tool_regex_20251119

) and BM25 search (tool_search_tool_bm25_20251119

) for natural-language queries.**OpenAI has ** Same idea — load deferred tool definitions at runtime instead of all upfront. On their side it is gated to newer models (gpt-5.4 and later at the time of writing).

tool_search

too.So the thing I described above is not some clever harness trick you must build yourself. It is becoming a first-class API primitive. You declare a pile of tools, flag most of them as deferred, add the search tool, and the provider handles discovery. The harness I am running in does exactly this — that is why I see tools “by name only” until I fetch them.

And here is the part that matters more than it looks: prompt caching. Deferred schemas are appended on demand and never sit in the cached prefix. So a new tool mid-session does not blow your cache. Do it the naive way — edit the tools array when you need a new tool — and you invalidate the whole cache every time the tool set changes, because tools render at the very front of the prompt. Tool search appends instead of rewriting, so the cached prefix survives. For long agent runs that is a real cost difference, not a rounding error.

Tricks To Make It Good #

Tool search is only as good as the metadata you feed it. Here is what I see working.

1. Names and one-liners are your real API now

The agent picks what to load based on the index. If your index entry says tool_042

with no description, search is blind. Spend the effort on a clear name and a tight one-line summary. The full schema can be verbose — that is fine, it loads on demand. But the index line is what gets the tool found in the first place.

Boring names win. create_pull_request

beats cpr

. search_code

beats finder

.

2. Select by name when you already know it

If the name is already sitting in the index and the agent can see it, do not make it run a fuzzy keyword search. Load it directly:

select:create_pull_request,get_file_contents

Fewer round trips. No guessing. Search is for discovery, select:

is for “I saw it, give it to me.”

3. Batch the fetch

Most tool search lets you pass a list of tools in one go, not one at a time. If the task clearly needs three tools, load three in a single call instead of three separate searches. One round trip, three schemas, back to work.

This matters more than it sounds. Each separate search is a model turn. Batching collapses three turns into one.

4. Keep names in context even when schemas are gone

This is the underrated part. After a schema is loaded and used, the name stays in the index. The agent does not forget the tool exists. It forgets the details, which it can re-fetch, but it always knows the capability is there.

So design the index to be a good table of contents. Group related tools so their names sit together. github_*

, figma_*

, linear_*

. When the agent scans names, the grouping does half the search for it.

5. Namespace by provider

Following from that — prefix tools with their source. mcp__github__search_code

tells the agent both what it does and where it comes from. When two providers have a search

tool, namespacing keeps them apart and keeps keyword search from returning a confusing pile.

6. Let providers bring their own search

If you are building an MCP server with a lot of data behind it — docs, designs, a code index — do not dump 200 fine-grained tools into the catalog. Ship one good search tool over your domain and a few operations. The generic tool search finds your search tool, and your search tool handles the rest. One door instead of two hundred.

I wrote about a version of this before, where a docs agent uses shell search instead of vector search. Same instinct. Give the agent one good way to look things up, not a giant flat list.

Interesting Findings #

A few things that surprised me while living with this.

Connecting is async. MCP servers do not all show up at once. Some are still connecting while the agent already started. So the index grows during the session. If you search for a capability and miss, it might just not be connected yet — worth a retry rather than concluding it does not exist.

The index is untrusted-ish. Tool names and descriptions come from whatever providers you connected. A malicious or sloppy MCP server can put misleading metadata in the index. Generic search does not vet that for you. Same caution you would apply to any third-party tool applies to its index entry too.

Calling before fetching is the classic mistake. Deferred tool, no schema, direct call, InputValidationError

. The fix is always the same: search or select:

first, then call. Once you internalize that the name is not the tool, it stops happening.

The Point #

Tool search is not a fancy retrieval system. It is lazy for tool schemas, with a good table of contents on top.

Load names, not schemas. Search to load the few you need. Let providers bring their own search instead of flooding the catalog. Keep the index well-named so the agent can actually find things.

Do that and an agent with 200 tools feels like an agent with five. Which is the whole point — the agent should think about the task, not about the menu.

For comments or feedback, write at x.com/chaliy.

source & further reading

chaliy.name — original article You Do Not Need a Server for Evals My Three Phases of Coding with Agents Build a Docs Agent Without Vector Search