MCP Server Design: 3 Principles We Learned in Production

Exposing a tool to an agent over MCP takes ten minutes. Building an MCP server that survives a model you don't control, on a tight token budget with limited thinking time, is the part nobody warns you about.

We learned the difference shipping our own, consumed by third-party agents whose models we don't pick. Three principles came out of it, each one we only fully believed after it broke in production:

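TL;DR, three MCP server best practices from our trenches: size each tool to the workflow the agent is actually performing rather than to the smallest API operation, keep naming consistent across every input schema, output schema, and output value, and declare inputSchema and outputSchema as real contracts instead of describing them in the tool's description string.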

We've been iterating on Trent's MCP server, the single public-facing surface for the product, consumed by third-party agents whose models we don't control. Each iteration taught us something we'd half-believed going in but only fully internalized after it broke. These three principles have crystallized from that work, and they cut against the grain of how it feels to build a server when you're moving fast. None of them is subtle in hindsight.

The instinct from regular software design (small composable units, single responsibility) doesn't transfer cleanly to MCP. The consumer of the surface is an LLM with a finite attention budget, not another piece of software. The right-sized tool is the workflow the agent is actually performing, not the smallest atomic operation in the underlying API.

Two reasons we've been aggressive about consolidation:

Consolidating also tightens the loop for us as engineers. Fewer tools means a smaller surface to test, a smaller set of failure modes to observe, and a more direct path from a customer issue to the tool that caused it. The product gets simpler for the user, the workflow gets simpler for the model, and the codebase gets simpler for us. That alignment is rare; when you can find it, take it.

Concretely: we took our own MCP server from 17 tools down to 11, and the result was visibly better tool usage across the workflows that had been giving us trouble. The model spent fewer cycles on tool selection, and the failure modes we had been seeing under tighter constraints largely cleaned up. The current published version is trentai-mcp on PyPI.

The push to make this cut came from a pre-launch integration where Trent was exposed to end users through a third party's chat interface. During testing we kept hitting cases where the chat couldn't follow our instructions reliably, and tool overlap turned out to be a major contributor.
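As a rough illustration of sizing a tool to the workflow, the sketch below folds three steps (checking that a task exists, updating it, and reporting what is left) into a single call. Everything in it is hypothetical: FakeBackend, complete_tasks, and the field names are stand-ins chosen for the example, not part of trentai-mcp.

```python
from dataclasses import dataclass, field


# Hypothetical in-memory stand-in for a backend API (not trentai-mcp code).
@dataclass
class FakeBackend:
    tasks: dict[str, str] = field(
        default_factory=lambda: {"t1": "open", "t2": "open"}
    )

    def set_task_status(self, task_id: str, status: str) -> None:
        self.tasks[task_id] = status

    def count_open(self) -> int:
        return sum(1 for status in self.tasks.values() if status == "open")


def complete_tasks(backend: FakeBackend, task_ids: list[str]) -> dict:
    """One workflow-sized tool: check, update, and summarize in one call.

    The atomic alternative would expose a lookup, set_task_status, and
    count_open as separate tools and leave the sequencing to the model.
    """
    updated, unknown = [], []
    for task_id in task_ids:
        if task_id in backend.tasks:
            backend.set_task_status(task_id, "done")
            updated.append(task_id)
        else:
            unknown.append(task_id)
    return {
        "updated": updated,
        "unknown": unknown,
        "open_remaining": backend.count_open(),
    }


print(complete_tasks(FakeBackend(), ["t1", "t9"]))
# {'updated': ['t1'], 'unknown': ['t9'], 'open_remaining': 1}
```

The model makes one selection decision and gets back everything it needs for the next step, including which ids it got wrong, instead of chaining three calls and carrying state between them.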

MCP tool wording across the input schema, output schema, and the output values of every tool on a server needs to be consistent. If one tool calls a field user_id and another calls the same thing customer_id and a third returns accountId, the model has to reconcile that on every call. It mostly does, but reconciliation costs tokens, introduces ambiguity, and shows up as flaky tool calls in unpredictable conditions.

This matters more than it sounds because you don't always control the model on the other side of the wire. When the MCP server is consumed by a third party, the agent could be running on a small model with a tight token budget and limited thinking time. Inconsistent naming that a frontier model would reason past, a smaller model just fails on. The same surface that looks fine in development collapses in a deployment you can't see.

We ran into this during the same third-party pre-launch integration mentioned above. We exposed an update_tasks tool that let the chat write progress into a Trent security assessment, but the underlying API used control_id for the response field name and task_id for the input field name. The chat got confused between the two, the tool call failed repeatedly, and it couldn't debug its way out. We didn't catch this right away either; the 422s we kept seeing looked like a service-side bug, and we'd been debugging on the service end for a while before realizing the failure was upstream of the API, in the chat's tool call. Making the naming consistent across input, output, and value cleared it up.

The frame I've landed on is simple: the model on the other side of the wire is a variable you don't get to pick. So design the surface for the least capable consumer that matters. Capable models reason past inconsistent naming; smaller ones fail on it. Consistency costs you one round of cleanup before you ship; inconsistency gets paid by every consumer, on every call, forever.
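One way to do that round of cleanup mechanically is a small lint over the tool definitions. The sketch below is an assumption about how such a check could look, not something the server ships: the TOOLS list, the ALIASES table, and both function names are made up for illustration, and it only inspects top-level properties.

```python
import re
from collections import defaultdict

# Hypothetical tool definitions; field names echo the examples in the text.
TOOLS = [
    {
        "name": "get_tasks",
        "inputSchema": {"type": "object",
                        "properties": {"assessment_id": {"type": "string"}}},
        "outputSchema": {"type": "object",
                         "properties": {"control_id": {"type": "string"}}},
    },
    {
        "name": "update_tasks",
        "inputSchema": {"type": "object",
                        "properties": {"task_id": {"type": "string"},
                                       "assessmentId": {"type": "string"}}},
        "outputSchema": {"type": "object",
                         "properties": {"task_id": {"type": "string"}}},
    },
]

# Team-maintained synonym table (an assumption): alias -> canonical name.
ALIASES = {"control_id": "task_id"}


def canonical(name: str) -> str:
    """Map camelCase and known aliases onto one snake_case spelling."""
    snake = re.sub(r"(?<=[a-z0-9])([A-Z])", r"_\1", name).lower()
    return ALIASES.get(snake, snake)


def find_naming_conflicts(tools: list[dict]) -> dict[str, list[str]]:
    """Return canonical names that appear under more than one spelling."""
    spellings = defaultdict(set)
    for tool in tools:
        for key in ("inputSchema", "outputSchema"):
            for prop in tool.get(key, {}).get("properties", {}):
                spellings[canonical(prop)].add(prop)
    return {c: sorted(s) for c, s in spellings.items() if len(s) > 1}


print(find_naming_conflicts(TOOLS))
```

Run against these definitions, the check reports two conflicts: assessment_id is spelled two ways, and task_id also appears as control_id. Case-style variants are caught automatically; true synonyms are only caught if someone has recorded them in the alias table.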

This is the principle I'd most like to have learned sooner.

We built the MCP server with an agent. It worked. The tests the agent wrote alongside the implementation passed, our engineer-driven dogfooding ran cleanly, and the manual testing we did in the workflows we cared about all came back green. Yet beyond the tool selection and naming problems we covered earlier, we kept hitting a different class of failure that we couldn't reproduce locally: the consuming agent getting the input shape wrong and invoking the tool in ways that didn't match what we'd documented at all.

When we looked under the hood, the implementation hadn't actually defined input and output schemas in the JSON properties the MCP protocol specifies. The agent that wrote the server had instead stuffed the entire contract (input shape, output shape, and examples) into the tool's description string as a long, comment-like blob. Frontier models read that and inferred the right structure. Smaller models, with less budget for inference, couldn't. The fix is structural: MCP's inputSchema and outputSchema are contracts, not hints, and stuffing them into the description string opts you out of every guarantee the protocol gives you.
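To make the difference concrete, here is a minimal sketch of a tool entry that carries its contract in inputSchema and outputSchema rather than in the description. The tool name is borrowed from the earlier story, but the fields, the status values, and the check_flat_object helper are assumptions made for illustration; this is not the real trentai-mcp definition, and the helper is deliberately far smaller than a real JSON Schema validator.

```python
# Shape of a tool entry as listed by an MCP server. Fields are illustrative.
UPDATE_TASKS_TOOL = {
    "name": "update_tasks",
    "description": "Record progress on tasks in a security assessment.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string", "description": "Task to update."},
            "status": {"type": "string",
                       "enum": ["open", "in_progress", "done"]},
        },
        "required": ["task_id", "status"],
    },
    "outputSchema": {
        "type": "object",
        "properties": {
            "task_id": {"type": "string"},
            "status": {"type": "string"},
        },
        "required": ["task_id", "status"],
    },
}

_PY_TYPES = {"string": str, "boolean": bool, "object": dict, "array": list}


def check_flat_object(schema: dict, value: dict) -> list[str]:
    """Tiny check for flat object schemas: required keys, types, enums.

    Stricter than JSON Schema defaults in one respect: unknown fields are
    reported as errors, as if additionalProperties were false.
    """
    errors = []
    for key in schema.get("required", []):
        if key not in value:
            errors.append(f"missing required field: {key}")
    for key, val in value.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(val, _PY_TYPES[spec["type"]]):
            errors.append(f"{key}: expected {spec['type']}")
        elif "enum" in spec and val not in spec["enum"]:
            errors.append(f"{key}: must be one of {spec['enum']}")
    return errors


bad_call = {"control_id": "c-1", "status": "done"}
print(check_flat_object(UPDATE_TASKS_TOOL["inputSchema"], bad_call))
# ['missing required field: task_id', 'unexpected field: control_id']
```

With the contract in the schema, a malformed call can be rejected with a message that names the offending field, which gives a small model something specific to correct. With the contract buried in the description, the only check available is whatever the model manages to infer from prose.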

Two lessons from that, both worth saying out loud:

The server I've been describing — trentai-mcp — is how Trent shows up inside Claude Code. It runs the full Scan → Judge → Mitigate → Evaluate loop in your editor: surfacing threats relevant to your application's architecture, prioritizing them against the real risk profile, generating a remediation plan that becomes tasks Claude Code can implement, and tracking how your security posture changes session over session.

MCP is still young, and the patterns for designing servers well are still being worked out across the industry. The three principles above are real-world examples of what we've learned in production, and they are what I'd share with a new teammate on day one of building a new server.

Originally published on the Trent AI blog — the full piece includes the worked example of the four consolidated tools.

Source: original article on dev.to.
