I connected an MCP server last month and watched my token bill jump 37% on the first call. The actual work? A single git status
. The schema for that one server consumed 42,000 tokens before the model typed a single character.
That's not a typo. Forty-two thousand.
If you ship AI agents in 2026 and you're not measuring MCP overhead, you're leaving real money on the table. Here's what I found when I actually instrumented the tax — and four patterns that brought my bill back under control.
MCP (Model Context Protocol) defines a JSON-RPC handshake where every connected server pushes its full tool schema into the model's context window. The model needs those definitions to know what tools exist and how to call them. The protocol is clean. The economics are not.
When you connect N
servers with an average of M
tools each, you pay N × M × schema_size
tokens on every single request — including requests that use zero of those tools. The schemas don't shrink when the model ignores them. They don't get paged out. They sit there, eating context, until the conversation ends.
In a 200,000-token window, four "modest" servers can burn 21% of your budget before the agent says hello.
The standard MCP lifecycle looks like this:
1. Initialize → server returns protocol version, capabilities
2. ListTools → server returns full tool schema array
3. CallTool (per call) → model picks a tool, sends args, gets result
4. (repeat)
Steps 1 and 2 happen on every session. Step 2 is where the tax lives.
I instrumented a production agent over a week in June 2026. Same model (Claude Opus 4.5), same workloads, only the tool wiring changed. Here's what came out:
| Configuration | Avg tokens/request | Schema overhead | Cost per 1k req |
|---|---|---|---|
| No MCP, CLI tools only | 18,400 | 0 | $0.92 |
| 1 server (GitHub), 12 tools | 60,200 | 42,000 (70%) | $3.01 |
| 1 server + Playwright | 81,500 | 63,000 (77%) | $4.08 |
| 4 servers, 47 tools | 178,300 | 142,000 (80%) | $8.92 |
| After 4 fixes (below) | 56,100 | 21,000 (37%) | $2.81 |
The single-server case was the worst per-tool: GitHub MCP alone added 42,000 tokens of schema to a conversation that needed maybe 800 tokens of real GitHub work. Across 1,000 requests/day, that was an extra $2,000/month for nothing useful.
The 4-server case was even more dramatic — 80% of the context window was definitions the model barely used.
I tried eight different optimizations. Four moved the needle. The other four were theater.
Most MCP clients eagerly call ListTools
at session start. I patched the client to call it only when the model's first turn references the server's domain (filesystem paths, github.com
, etc.).
await mcp_client.initialize_all_servers()
async def get_schema(server):
if not context_hints.match(server.domain_keywords):
return None # skip ListTools
return await mcp_client.list_tools(server)
Result: server schemas load only when relevant. Idle requests now pay ~0 tokens of MCP overhead. Saved 38% of total tokens in my mixed workload.
MCP tool schemas ship with rich descriptions written for humans browsing the tool list. The model doesn't need three paragraphs of "Use this tool when you want to…" prose. It needs name, parameter types, and one sentence of intent.
I wrote a post-processor that:
Schema size per tool: ~3,500 chars → ~800 chars. Across 47 tools, that's a 67% schema shrink with zero measurable drop in tool-selection accuracy.
MCP servers usually expose more tools than any one agent needs. The GitHub MCP ships 70+ tools — I use 8. I added a server-side allowlist:
{
"mcpServers": {
"github": {
"command": "npx",
"args": ["-y", "@modelcontextprotocol/server-github"],
"env": { "GITHUB_TOOL_ALLOWLIST": "create_issue,list_issues,get_issue,add_issue_comment,search_code,get_file_contents,list_pull_requests,merge_pull_request" }
}
}
}
The server only returns the allowlisted tools on ListTools
. 8 tools × ~1,200 tokens each = 9,600 tokens vs 70 tools × 3,500 = 245,000 tokens.
This one hurt my pride. I built a beautiful MCP server for filesystem operations. It was elegant. It was also expensive.
For raw reads — cat
, grep
, ls
, head
— a CLI invocation through bash
costs ~200 tokens of tool definition vs ~2,800 tokens for the MCP equivalent. The model doesn't need structured output for grep -c "TODO" src/*.ts
.
I kept the MCP server for stateful operations (multi-step transactions, authenticated API calls) and routed everything else through bash. 32x token reduction on the high-frequency path.
Worth listing so you don't waste a weekend like I did:
MCP is the right protocol. The ecosystem is moving fast (14,000+ servers as of mid-2026, governance transferred to Linux Foundation's AAIF). The community is shipping fixes — MCP Gateway tools now do lazy schema server-side, and several large servers have shipped "minimal" modes.
But today, in production, the tax is real. Measure it before you optimize it. Run one of your agents with tiktoken
counting the schema payload vs the request payload, and you'll know your number.
Mine was 70% before. It's 37% now. The next 20 percentage points are going to come from server-side improvements I can't make alone — but until the ecosystem catches up, lazy- + description trimming + allowlists + selective CLI routing is the most defensible stack.
If you're shipping MCP today without measuring this, you're flying blind on cost. The protocol will improve. Your bill won't wait.