The Model Context Protocol is often explained as a standard way for AI applications to connect to tools, data, and external systems.
That part is useful. Before MCP, every AI application had to invent its own way to talk to files, APIs, databases, calendars, issue trackers, logs, and internal services. A shared protocol is a real improvement.
But once you connect more than a few servers, a different problem appears.
The hard part is no longer “can the model call a tool?”
The hard part becomes:
Which tool should the model even see?
That sounds like a small detail. It is not.
The naive version works #
The simple MCP workflow is easy to understand.
A client connects to a server. The client asks for the tools. The server returns a list of tools. Each tool has a name, a description, and an input schema. The model can then decide which tool to call.
For a demo, this works well.
A server with three tools is easy:
get_weather
search_files
create_ticket
The model sees all three. The user asks a question. The right tool is usually obvious.
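To make the naive flow concrete, here is a minimal sketch of it in Python. It is not an MCP SDK: `handle_request` and `build_model_context` are hypothetical helpers, the registry is an in-memory list, and the descriptions and schemas are invented for illustration. Only the `tools/list` method name and the name/description/input-schema shape come from the protocol.

```python
import json

# Hypothetical in-memory stand-in for an MCP server's tool registry.
# Descriptions and schemas are made up for illustration.
TOOLS = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
    {
        "name": "search_files",
        "description": "Search files by keyword",
        "inputSchema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "create_ticket",
        "description": "Create a support ticket",
        "inputSchema": {
            "type": "object",
            "properties": {"title": {"type": "string"}, "body": {"type": "string"}},
            "required": ["title"],
        },
    },
]


def handle_request(request: dict) -> dict:
    """Answer a JSON-RPC style tools/list request with every tool."""
    if request["method"] == "tools/list":
        return {"jsonrpc": "2.0", "id": request["id"], "result": {"tools": TOOLS}}
    raise ValueError(f"unsupported method: {request['method']}")


def build_model_context(tools: list[dict]) -> str:
    """Naive host behaviour: serialize every tool definition into the prompt."""
    return "\n".join(json.dumps(tool) for tool in tools)


response = handle_request({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
context = build_model_context(response["result"]["tools"])
print(len(response["result"]["tools"]), "tools,", len(context), "characters of context")
```

With three tools, putting everything into the context costs almost nothing, which is why this version feels fine in a demo.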
This is the happy path most examples show.
The problem starts when the system succeeds.
Success means too many tools #
A useful AI assistant does not stay connected to one tiny server.
It gets connected to GitHub. Then Slack. Then Google Drive. Then Jira. Then a database. Then internal logs. Then billing. Then a deployment system. Then a few company-specific tools nobody outside the team understands.
Suddenly the model is not choosing between three tools.
It is choosing between 30, 80, or 200.
Each of those tools needs a name. Each needs a description. Each needs a schema. Many need nested properties, enums, examples, constraints, and warnings.
All of that must be represented somehow for the model.
This creates a context tax.
Before the model answers the user, before it reads the actual task, before it reasons about anything useful, the context window is already filled with tool definitions that may not matter for the current turn.
Most of the time, the user does not need 200 tools.
They need one.
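A rough way to get a feel for the context tax is to serialize a batch of tool definitions and estimate the token count. The sketch below makes two loud assumptions: the tool definitions are synthetic (`make_tool` is a hypothetical generator), and the four-characters-per-token ratio is only a rule of thumb, since real tokenizers differ by model and content.

```python
import json


def make_tool(index: int) -> dict:
    """Build a synthetic tool definition with a small schema (illustrative only)."""
    return {
        "name": f"service_{index}.do_something",
        "description": "Perform an operation on a resource and return its status.",
        "inputSchema": {
            "type": "object",
            "properties": {
                "resource_id": {"type": "string", "description": "Target resource"},
                "mode": {"type": "string", "enum": ["fast", "safe", "dry_run"]},
                "limit": {"type": "integer", "minimum": 1, "maximum": 100},
            },
            "required": ["resource_id"],
        },
    }


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude estimate (assumed ratio); real tokenizers vary by model and content."""
    return int(len(text) / chars_per_token)


for count in (3, 30, 80, 200):
    payload = json.dumps([make_tool(i) for i in range(count)])
    print(f"{count:>3} tools -> roughly {estimate_tokens(payload)} tokens")
```

The absolute numbers are not meaningful, but the growth is: the cost scales with the number of connected tools, not with what the user asked for.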
Tool definitions are not free #
Tool definitions look small when viewed one at a time.
A name here. A description there. A JSON schema. A few parameters.
But LLMs do not experience tool definitions one at a time. They experience the whole available tool surface as context.
That matters for two reasons.
First, tool definitions consume tokens. The model has less room for the actual conversation, project files, logs, code, or documents.
Second, too many tools make selection harder. A larger tool list does not only cost more. It also creates more opportunities for the model to choose a plausible but wrong tool.
This is especially painful because many real tools are semantically close.
Is the right tool search_documents, search_files, query_knowledge_base, find_resource, list_pages, or get_record?
A human can ask a clarifying question or inspect the system. A model often picks the tool that sounds closest.
Sometimes that is enough. Sometimes it is not.
MCP standardizes listing, not relevance #
This is the core weakness.
MCP gives clients a standard way to ask a server what tools it has.
It does not give the client a standard way to ask:
- Which tools are relevant to this user request?
- Which tools should be shown for this intent?
- Which tool schemas can be loaded later?
- Which tools are commonly confused with each other?
- Which tools are safe to expose in this context?
- Which tools should be hidden unless specifically requested?
The protocol has tools/list.
What many LLM applications need is closer to tools/search.
Not search as a product feature. Search as a context-management primitive.
A model should not need to carry every possible tool definition in its active context just because the user might need one of them later.
Anthropic already worked around this #
A good sign that something is missing at the protocol layer is when people solve it outside the protocol.
Anthropic’s Tool Search/deferred-tools pattern is exactly that kind of signal.
Instead of loading every tool definition upfront, the model sees a small search tool. When it needs a capability, it searches for relevant tools. Only then are the matching tool definitions loaded into context.
That is a sensible design.
It treats tool discovery as its own step instead of assuming every tool must be present from the beginning.
But this also highlights the gap: if every host or model provider solves tool discovery differently, MCP becomes a standard for exposing tools, while relevance remains vendor-specific.
That is not fatal. It is normal for young protocols.
But it is the next problem to solve.
Tool search should be boring #
A protocol-level tool search primitive does not need to be complicated.
It could start with something simple:
{
  "method": "tools/search",
  "params": {
    "query": "create a GitHub issue for this bug",
    "limit": 5
  }
}
The server could return a ranked subset of tools:
{
  "tools": [
    {
      "name": "github.create_issue",
      "title": "Create GitHub issue",
      "description": "Create a new issue in a repository"
    },
    {
      "name": "github.search_issues",
      "title": "Search GitHub issues",
      "description": "Search existing issues before creating a duplicate"
    }
  ]
}
Then the client could request full schemas only for the tools it wants to expose to the model.
There are many ways to design this. The exact shape matters less than the principle:
Tool discovery should be lazy, relevant, and explicit.
The current default is too eager.
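To show how boring this can be, here is a sketch of a server-side handler for the hypothetical tools/search method. Everything in it is an assumption made for illustration: `search_tools`, `CATALOG`, the stopword list, and the word-overlap scoring are not part of MCP, and a real server might use embeddings or usage statistics instead.

```python
import re

# Hypothetical catalog; the first two entries mirror the example response above.
CATALOG = [
    {"name": "github.create_issue", "title": "Create GitHub issue",
     "description": "Create a new issue in a repository"},
    {"name": "github.search_issues", "title": "Search GitHub issues",
     "description": "Search existing issues before creating a duplicate"},
    {"name": "slack.post_message", "title": "Post Slack message",
     "description": "Send a message to a channel"},
    {"name": "billing.explain_invoice", "title": "Explain invoice",
     "description": "Explain an invoice status for a customer"},
]

# Small illustrative stopword list so filler words do not count as matches.
STOPWORDS = {"a", "an", "the", "for", "this", "to", "in", "of"}


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its alphanumeric words, minus stopwords."""
    return set(re.findall(r"[a-z0-9]+", text.lower())) - STOPWORDS


def search_tools(query: str, limit: int = 5) -> dict:
    """Rank tools by how many query words appear in their name, title, or description."""
    query_words = tokenize(query)
    scored = []
    for tool in CATALOG:
        tool_words = tokenize(" ".join([tool["name"], tool["title"], tool["description"]]))
        score = len(query_words & tool_words)
        if score > 0:
            scored.append((score, tool))
    # Highest score first; the sort is stable, so ties keep catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return {"tools": [tool for _, tool in scored[:limit]]}


result = search_tools("create a GitHub issue for this bug", limit=5)
for tool in result["tools"]:
    print(tool["name"])
```

Even this crude scoring drops the Slack and billing tools for a GitHub request. The point is the interface, not the ranking function: once the client can ask for a subset, the ranking can improve behind it.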
Lazy schemas would help too #
There is another version of the same idea: deferred schema.
The model may need to know that a tool exists before it needs the entire input schema.
For example, the model might only need this at first:
{
  "name": "billing.explain_invoice",
  "description": "Explain an invoice status for a customer"
}
Only if the model chooses that tool does it need the full schema:
{
  "customer_id": "string",
  "invoice_id": "string",
  "include_line_items": "boolean"
}
That distinction matters.
Humans do this all the time. We scan names first. We inspect details only when something looks relevant.
LLM tools should work the same way.
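One way to picture deferred schema is a registry with two read paths: a cheap one that returns names and descriptions, and a second one that returns a full schema on request. The sketch below assumes hypothetical helpers `list_summaries` and `load_schema`; the second tool, the JSON Schema layout, and the `required` lists are invented for illustration.

```python
# Hypothetical registry that keeps lightweight summaries and full schemas together.
REGISTRY = {
    "billing.explain_invoice": {
        "description": "Explain an invoice status for a customer",
        "inputSchema": {
            "type": "object",
            "properties": {
                "customer_id": {"type": "string"},
                "invoice_id": {"type": "string"},
                "include_line_items": {"type": "boolean"},
            },
            "required": ["customer_id", "invoice_id"],  # assumed, not from the source
        },
    },
    "billing.list_invoices": {  # invented second tool for contrast
        "description": "List invoices for a customer",
        "inputSchema": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}


def list_summaries() -> list[dict]:
    """First pass: names and descriptions only, no schemas."""
    return [
        {"name": name, "description": entry["description"]}
        for name, entry in REGISTRY.items()
    ]


def load_schema(name: str) -> dict:
    """Second pass: fetch the full input schema for one chosen tool."""
    if name not in REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return REGISTRY[name]["inputSchema"]


summaries = list_summaries()
chosen = "billing.explain_invoice"
schema = load_schema(chosen)
print([s["name"] for s in summaries])
print(sorted(schema["properties"]))
```

The model scans the summaries, picks one name, and only that tool's schema enters the context.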
This is not only about cost #
It is tempting to frame this as a token-cost problem.
That is part of it, but not the most interesting part.
The bigger issue is reliability.
A model with fewer irrelevant tools has fewer chances to make the wrong call. A model that sees only the relevant schema has less noise to interpret. A model that discovers tools intentionally can explain why a tool was selected.
This also helps debugging.
If an agent fails today, you may not know whether the tool was bad, the description was bad, the schema was confusing, or the model simply got distracted by a different tool that looked similar.
A relevance step makes the process more inspectable.
The system can log:
- the user request
- the tool search query
- the candidate tools returned
- the tool definitions loaded
- the final tool call
That is much easier to reason about than one giant prompt containing every tool the user might ever need.
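As a sketch of what such a log entry might look like, the record below has one field per step in the list above. `DiscoveryTrace` and its field names are hypothetical, as are the sample values; nothing here is defined by MCP.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class DiscoveryTrace:
    """One record per turn, with a field for each step in the list above."""
    user_request: str
    search_query: str
    candidate_tools: list[str] = field(default_factory=list)
    loaded_definitions: list[str] = field(default_factory=list)
    final_tool_call: Optional[dict] = None

    def to_log_line(self) -> str:
        """Serialize the trace as a single JSON line for a structured log."""
        return json.dumps(asdict(self), sort_keys=True)


trace = DiscoveryTrace(
    user_request="File a bug for the login crash",
    search_query="create a GitHub issue for this bug",
    candidate_tools=["github.create_issue", "github.search_issues"],
    loaded_definitions=["github.create_issue"],
    final_tool_call={"name": "github.create_issue", "arguments": {"title": "Login crash"}},
)
print(trace.to_log_line())
```

When a turn goes wrong, a record like this shows whether the right tool was never a candidate, was a candidate but not loaded, or was loaded and still not chosen.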
The problem is worse for voice agents #
Tool selection latency is annoying in chat.
In voice, it is brutal.
A chat assistant can pause for a few seconds and still feel usable. A voice agent that waits silently feels broken.
If the model has to process a large tool surface before deciding what to do, every turn becomes heavier. If the selected tool then returns a large result inline, the assistant may wait even longer before it can speak.
Realtime systems need partial progress.
They need the assistant to say something useful while work continues.
They need tool discovery and tool results to avoid dumping unnecessary context into the model.
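Here is a small sketch of the "say something useful while work continues" idea using Python's asyncio. The `speak` function is a hypothetical stand-in for text-to-speech, `slow_tool_call` simulates a slow tool, and the timings and spoken lines are invented for illustration.

```python
import asyncio

spoken: list[str] = []


def speak(text: str) -> None:
    """Hypothetical stand-in for a text-to-speech call; records what was said."""
    spoken.append(text)


async def slow_tool_call() -> str:
    """Simulates a tool that takes a while to return."""
    await asyncio.sleep(0.2)
    return "Your invoice was paid on Monday."


async def answer_with_progress(filler_after: float = 0.05) -> None:
    """Start the tool, and speak a short filler if it has not finished in time."""
    task = asyncio.create_task(slow_tool_call())
    # asyncio.wait returns once the timeout passes; it does not cancel the task.
    done, _ = await asyncio.wait({task}, timeout=filler_after)
    if not done:
        speak("One moment, I'm checking that for you.")
    speak(await task)


asyncio.run(answer_with_progress())
print(spoken)
```

The filler only buys a little time. If tool selection itself is slow because the model is wading through a large tool surface, the silence starts before any tool has even been called.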
MCP is moving in that direction with discussions around better result types, streaming, and reference-based results. But tool relevance is the earlier problem. Before a tool returns too much data, the model first has to select the right tool.
A protocol is allowed to be incomplete #
None of this means MCP is bad.
MCP solved a real integration problem. It gave the ecosystem a common shape for exposing tools, resources, and prompts to AI applications.
That is valuable.
But protocols often expose the next layer of problems after they solve the first one.
HTTP made web integration easier, then caching, authentication, compression, security headers, and content negotiation became important. SQL standardized querying, then indexing, planning, permissions, migrations, and replication became important.
MCP standardized the tool boundary.
Now tool discovery needs to grow up.
What I would like to see #
The smallest useful addition would be a relevance-aware tool discovery primitive.
Something like:
- tools/search
- intent-filtered tools/list
- deferred schema
- ranked tool subsets
- tool groups or namespaces
- a standard way to ask for “tools matching this task”
The exact method name is not important.
The important part is that clients should not have to choose between two bad defaults:
- Load every tool definition into the model context.
- Invent a vendor-specific relevance layer outside MCP.
A protocol for model-context should care deeply about what enters the model context.
Tool definitions are context.
That means tool discovery is not a secondary feature.
It is part of the core design problem.
The future MCP assistant should not see everything #
The best AI assistants will not be the ones connected to the most tools.
They will be the ones that can find the right tool at the right moment, load only the context they need, call it safely, and explain what happened.
MCP already gives us a standard way to expose tools.
The next step is giving models a standard way to not see all of them at once.