When MCP tools break, isolate the server before blaming the agent

wpnews.pro

You add a new MCP server, or a new tool to an existing one. You ask the agent to use it and it says the tool isn't available — or worse, it confidently reports that "only the HTTP server is connected" and the new stdio servers aren't live. The instinct is to blame the MCP server or bounce the runtime. Resist it. The agent's narration about its own toolset is not ground truth, and a malformed test invocation will happily manufacture a fake "MCP unreachable" symptom. Here's the broke → tried → fixed of a debugging session that burned time on exactly this.

We'd just added a second stdio MCP server and a new tool. To smoke-test it, we hand-typed a quick probe at the agent runtime and asked it to use the new capability. The agent came back with this:

Those tools aren't in my live toolset right now — only the
Home Assistant (HTTP/SSE) server is connected. I'll read the
files directly instead.

It then shelled out to read files off disk to answer the question — being resourceful, which made it look like the tools genuinely weren't wired up. Two new stdio servers, both apparently invisible to the session, while the one HTTP/SSE server was fine. That sure reads like the stdio MCP servers aren't connecting.

That self-report is the trap. Treating "the agent says the tools aren't available" as a diagnosis — instead of a symptom — is what sent us down the wrong path.

There are two failure modes here and they look identical from the agent's chair:

The agent can't tell these apart for you, and it will narrate either one as "tools not available." Worse, an LLM describing its own capabilities is itself a confabulation risk — it pattern-completes a plausible story ("only the HTTP server is live") around the fact that some tools didn't show up. That story is not evidence about server health.

So the discipline is: don't ask the agent whether the server is up. Ask the server. Isolate the layer before you touch anything.

We took the self-report at face value. The agent said the stdio servers weren't connected, so we assumed they weren't connected. This is the root mistake every later dead-end inherits from — we let the agent's narration stand in for a connectivity check. It is not one.

"Tools didn't load? Refresh them — restart the runtime." So we bounced the long-running agent gateway daemon, the shape of which is:

<runtime> gateway restart

Then re-ran the same probe. No change:

Still only the Home Assistant server is connected — the other
tools aren't in my toolset.

Restarting a daemon to "refresh tools" is a guess, not a diagnosis. If the server were genuinely stale you'd want evidence of that first. We had none — we were rebooting on a hunch, and the hunch was wrong. (It also cost real time: a gateway restart drops every in-flight session.)

We re-ran the hand-typed probe a few more times, tweaking the wording, still getting "tools not available." At this point it looks airtight: server's been restarted, agent still can't see the tools, therefore the server must be broken. Every signal pointed at the MCP server — and every signal was coming through the same malformed probe, so they were all the same false negative wearing different hats.

Stop reasoning through the agent. Isolate the layer, then reproduce the exact production invocation.

Spawn the server fresh, outside any agent session, and ask it to enumerate its tools. Most runtimes ship a connectivity check that does exactly this; the vendor-neutral tool is the MCP Inspector.

If your runtime has a built-in check (generic shape — it spawns the server fresh, bypassing the session):

<runtime> mcp test <server>

Or talk to the server directly with the MCP Inspector — no runtime involved at all. The CLI mode is perfect for a connectivity assertion:

npx @modelcontextprotocol/inspector --cli \
  <your-server-launch-command> --method tools/list

(npx @modelcontextprotocol/inspector

with no --cli

opens the browser UI at http://localhost:6274

— connect, open the Tools tab, click List Tools.)

The output settles it immediately:

Connected.
server: signalk          → 7 tools discovered: read_sensor, battery_state,
                            get_route, get_active_alarms, ...
server: vessel_knowledge → 4 tools discovered: find_equipment, get_equipment, ...

Both servers connect and enumerate every tool. The servers were fine the entire time — through all three dead-ends. That one command would have saved the gateway restart and every ad-hoc re-probe.

If the server lists its tools standalone, the bug (if any) is in how the session loads them, not the server. So stop hand-typing variants and run the agent the way production runs it — same skill/profile selector, same flags, same transport.

Our production path (a voice/MQTT bridge) launches the agent in oneshot mode with a namespaced skill selector and the query passed via the query flag. The malformed probe had used a bare, non-namespaced selector and the wrong flag — which loads a different toolset (or none of the stdio toolsets). The shape that matters:

<runtime> chat -m <model> -z "use the new tool"

<runtime> chat -Q -s profile/<skill> -q "use the new tool"

Run the right one and the agent calls the tools on the first try — no restart, no file-reading workaround:

[tool] get_active_alarms → none active
[tool] find_equipment("Bellmarine") → bellmarine-ddw-10
... rated temperature zones: ...

The fix was a one-line change to the invocation. The server never moved.

The one rule that would have short-circuited all of it: don't restart a daemon to "refresh tools" before the server is proven stale. Prove it with a standalone mcp test

/ Inspector run first.

The order is the whole lesson: server (standalone) → invocation (exact production form) → only then the runtime. Don't bounce the daemon until the first two are ruled out.

This came out of wiring MCP servers into a ship's-computer agent for an all-electric charter catamaran — SignalK, vessel-knowledge, and tide/weather tools behind a local LLM. When the agent said it couldn't see them, the servers had been fine all along; the probe was wrong. The MCP servers behind it are open source: github.com/sailingnaturali/signalk-mcp.

source & further reading

dev.to — original article Tailwind laid off 75% of engineering. AI didn't do that, shadcn did. How I built a browser-side background remover (and benchmarked Canvas vs WebAssembly) The AI Engineering Tools Landscape — Mid-2026

When MCP tools break, isolate the server before blaming the agent

Run your AI side-project on zahid.host