{"slug": "hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on", "title": "Hermes Agent Ships Tool Search for MCP: Anthropic Evals Show 49% to 74% Accuracy Gain on Opus 4", "summary": "Nous Research's open-source Hermes Agent now ships a Tool Search feature that addresses context window overload from MCP tools. Anthropic's internal evaluations show the feature improved Claude Opus 4's accuracy from 49% to 74% and Opus 4.5's from 79.5% to 88.1% on MCP benchmarks, while reducing tool-definition token usage by 85%.", "body_md": "**Nous Research’s open-source Hermes Agent now ships a Tool Search feature.** It directly addresses a growing bottleneck in AI agent systems: too many MCP tools filling up the context window. This explainer breaks down what Tool Search does, how it works, and when to use it.\n\n**The Problem: MCP Tools Are Eating Your Context Window**\n\nWhen you connect multiple MCP (Model Context Protocol) servers to an AI agent, every tool’s JSON schema gets sent to the model on every turn. This happens even if the model only needs one or two tools for a given task.\n\nReal-world deployments feel this immediately. A Hermes deployment with five MCP servers and 34 tools shows average prompt sizes of 45,000 tokens per turn. Roughly 22,000 of those tokens (around 50%) are tool schema overhead alone.\n\n[Anthropic’s own engineering data](https://www.anthropic.com/engineering/advanced-tool-use) shows tool definitions can consume 134,000 tokens before optimization. 
[Tool Attention](https://arxiv.org/abs/2604.21816) measures the “MCP Tools Tax” at **15,000–60,000 tokens per turn** for typical multi-server deployments.\n\nThis creates two distinct problems:\n\n- **Cost**: Cache-miss generations at session start can cost $0.07–$0.10 per turn.\n- **Accuracy loss**: Decision paralysis sets in when the model sees hundreds of irrelevant tool options simultaneously.\n\n**What is Tool Search?**\n\nTool Search is Hermes Agent’s opt-in progressive-disclosure layer for MCP and non-core plugin tools. Instead of loading every tool schema upfront, the model loads only what it needs, on demand, per turn.\n\n**When Tool Search activates, MCP and plugin tools are replaced in the model-visible tools array by three bridge tools:**\n\n```\ntool_search(query, limit?)   — search the deferred-tool catalog\ntool_describe(name)          — load the full schema for one tool\ntool_call(name, arguments)   — invoke a deferred tool\n```\n\n**A typical interaction looks like this:**\n\n```\nModel: tool_search(\"create a github issue\")\n  → { matches: [{ name: \"mcp_github_create_issue\", ... }] }\nModel: tool_describe(\"mcp_github_create_issue\")\n  → { parameters: { type: \"object\", properties: { ... } } }\nModel: tool_call(\"mcp_github_create_issue\", { title: \"...\", body: \"...\" })\n  → { ok: true, issue_number: 42 }\n```\n\nThe model searches for what it needs, loads the schema, then calls the tool. All hooks, guardrails, and approval prompts run against the real underlying tool name, not against the bridge.\n\n**The Accuracy Numbers**\n\nThis is not just a token-saving feature. 
Tool Search also **improves model accuracy** on MCP evaluations.\n\n**According to Anthropic’s internal MCP evals:**\n\n- **Claude Opus 4**: accuracy improved from **49% → 74%** with Tool Search enabled\n- **Claude Opus 4.5**: accuracy improved from **79.5% → 88.1%** with Tool Search enabled\n\nLarge tool catalogs create “decision paralysis”: the model gets confused choosing among many irrelevant options. Removing those options from the context window reduces false positives. [Anthropic’s data](https://www.anthropic.com/engineering/advanced-tool-use) also shows an **85% reduction in tool-definition token usage** while maintaining access to the full tool library.\n\n**How the Retrieval Works: BM25 + Fallback**\n\nUnder the hood, Hermes uses **BM25**, a classic information retrieval algorithm, to match the model’s query against a catalog of tool names, descriptions, and parameter names.\n\nIf BM25 returns no positive-score hits, the system falls back to a literal substring match on the tool name. This protects against zero-IDF degenerate cases, such as searching for `\"github\"` in a catalog where every tool name contains “github.”\n\nThe catalog is **stateless across turns**. It rebuilds from the current tool-defs list on every assembly. This prevents drift bugs where a stored catalog goes out of sync with the live tool registry.\n\n**When Does Tool Search Activate?**\n\nBy default, Tool Search runs in `auto` mode. It activates only when the deferrable tool schemas would consume **at least 10% of the active model’s context window**.\n\nBelow that threshold, the tools-array assembly is a pure pass-through. 
You pay no overhead.\n\n**This decision is re-evaluated on every turn:**\n\n- A session with just a few MCP tools and a long-context model may never activate Tool Search.\n- A session with many MCP servers attached (15+ tools typically) starts activating it.\n- Removing servers mid-session correctly returns to direct tool exposure on the next assembly.\n\n**Configuration Reference**\n\nAdd this to your `hermes.yaml` to control the behavior:\n\n```\ntools:\n  tool_search:\n    enabled: auto        # auto (default), on, or off\n    threshold_pct: 10    # % of context at which auto mode kicks in\n    search_default_limit: 5\n    max_search_limit: 20\n```\n\n| Key | Default | Meaning |\n|---|---|---|\n| `enabled` | `auto` | `auto` activates above threshold; `on` always activates if there’s at least one deferrable tool; `off` disables entirely |\n| `threshold_pct` | `10` | Percentage of context length at which `auto` kicks in. Range: 0–100 |\n| `search_default_limit` | `5` | Hits returned when the model calls `tool_search` without a `limit` |\n| `max_search_limit` | `20` | Hard upper bound the model can request via `limit`. Range: 1–50 |\n\nYou can also use a simple boolean shorthand:\n\n```\ntools:\n  tool_search: true   # equivalent to {enabled: auto}\n```\n\n**Key Takeaways**\n\n- Tool Search defers MCP tool schemas until the model actually needs them, using a `tool_search` / `tool_describe` / `tool_call` bridge. 
\n- [Anthropic](https://www.anthropic.com/engineering/advanced-tool-use)'s evals show accuracy gains from 49% → 74% on Claude Opus 4 with large tool catalogs.\n- BM25 retrieval over tool name + description + parameter names powers the search, with substring fallback for zero-IDF edge cases.\n- `auto` mode (default) is self-tuning: it activates only when tool schemas would consume at least 10% of the context window.\n- Core Hermes tools are never deferred; only MCP and non-core plugin tools are eligible.\n\nCheck out the **Hermes Agent Tool Search Documentation** and [Anthropic Advanced Tool Use](https://www.anthropic.com/engineering/advanced-tool-use).", "url": "https://wpnews.pro/news/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on", "canonical_source": "https://www.marktechpost.com/2026/05/29/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-gain-on-opus-4/", "published_at": "2026-05-30 03:11:59+00:00", "updated_at": "2026-05-30 03:28:22.091330+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "ai-research", "ai-infrastructure"], "entities": ["Nous Research", "Hermes Agent", "MCP", "Anthropic", "Opus 4", "Tool Search", "Tool Attention"], "alternates": {"html": "https://wpnews.pro/news/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on", "markdown": "https://wpnews.pro/news/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on.md", "text": "https://wpnews.pro/news/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on.txt", "jsonld": "https://wpnews.pro/news/hermes-agent-ships-tool-search-for-mcp-anthropic-evals-show-49-to-74-accuracy-on.jsonld"}}