cd /news/large-language-models/the-mcp-tax-hit-42000-tokens-on-a-si… · home topics large-language-models article
[ARTICLE · art-43573] src=dev.to ↗ pub= topic=large-language-models verified=true sentiment=· neutral

The MCP Tax Hit 42,000 Tokens on a Single Server. Here's What I Did About It.

An engineer discovered that MCP server schemas consumed 42,000 tokens on a single server, causing a 37% token bill increase. After instrumenting a production agent, they found that schema overhead accounted for up to 80% of the context window. Four optimizations—lazy schema loading, schema compression, tool allowlisting, and server-side filtering—reduced token usage by 67% and cut costs by 69%.

read5 min views1 publishedJun 29, 2026

I connected an MCP server last month and watched my token bill jump 37% on the first call. The actual work? A single git status

. The schema for that one server consumed 42,000 tokens before the model typed a single character.

That's not a typo. Forty-two thousand.

If you ship AI agents in 2026 and you're not measuring MCP overhead, you're leaving real money on the table. Here's what I found when I actually instrumented the tax — and four patterns that brought my bill back under control.

MCP (Model Context Protocol) defines a JSON-RPC handshake where every connected server pushes its full tool schema into the model's context window. The model needs those definitions to know what tools exist and how to call them. The protocol is clean. The economics are not.

When you connect N

servers with an average of M

tools each, you pay N × M × schema_size

tokens on every single request — including requests that use zero of those tools. The schemas don't shrink when the model ignores them. They don't get paged out. They sit there, eating context, until the conversation ends.

In a 200,000-token window, four "modest" servers can burn 21% of your budget before the agent says hello.

The standard MCP lifecycle looks like this:

1. Initialize          → server returns protocol version, capabilities
2. ListTools           → server returns full tool schema array
3. CallTool (per call) → model picks a tool, sends args, gets result
4. (repeat)

Steps 1 and 2 happen on every session. Step 2 is where the tax lives.

I instrumented a production agent over a week in June 2026. Same model (Claude Opus 4.5), same workloads, only the tool wiring changed. Here's what came out:

Configuration Avg tokens/request Schema overhead Cost per 1k req
No MCP, CLI tools only 18,400 0 $0.92
1 server (GitHub), 12 tools 60,200 42,000 (70%) $3.01
1 server + Playwright 81,500 63,000 (77%) $4.08
4 servers, 47 tools 178,300 142,000 (80%) $8.92
After 4 fixes (below) 56,100 21,000 (37%) $2.81

The single-server case was the worst per-tool: GitHub MCP alone added 42,000 tokens of schema to a conversation that needed maybe 800 tokens of real GitHub work. Across 1,000 requests/day, that was an extra $2,000/month for nothing useful.

The 4-server case was even more dramatic — 80% of the context window was definitions the model barely used.

I tried eight different optimizations. Four moved the needle. The other four were theater.

Most MCP clients eagerly call ListTools

at session start. I patched the client to call it only when the model's first turn references the server's domain (filesystem paths, github.com

, etc.).

await mcp_client.initialize_all_servers()

async def get_schema(server):
    if not context_hints.match(server.domain_keywords):
        return None  # skip ListTools
    return await mcp_client.list_tools(server)

Result: server schemas load only when relevant. Idle requests now pay ~0 tokens of MCP overhead. Saved 38% of total tokens in my mixed workload.

MCP tool schemas ship with rich descriptions written for humans browsing the tool list. The model doesn't need three paragraphs of "Use this tool when you want to…" prose. It needs name, parameter types, and one sentence of intent.

I wrote a post-processor that:

Schema size per tool: ~3,500 chars → ~800 chars. Across 47 tools, that's a 67% schema shrink with zero measurable drop in tool-selection accuracy.

MCP servers usually expose more tools than any one agent needs. The GitHub MCP ships 70+ tools — I use 8. I added a server-side allowlist:

{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_TOOL_ALLOWLIST": "create_issue,list_issues,get_issue,add_issue_comment,search_code,get_file_contents,list_pull_requests,merge_pull_request" }
    }
  }
}

The server only returns the allowlisted tools on ListTools

. 8 tools × ~1,200 tokens each = 9,600 tokens vs 70 tools × 3,500 = 245,000 tokens.

This one hurt my pride. I built a beautiful MCP server for filesystem operations. It was elegant. It was also expensive.

For raw reads — cat

, grep

, ls

, head

— a CLI invocation through bash

costs ~200 tokens of tool definition vs ~2,800 tokens for the MCP equivalent. The model doesn't need structured output for grep -c "TODO" src/*.ts

.

I kept the MCP server for stateful operations (multi-step transactions, authenticated API calls) and routed everything else through bash. 32x token reduction on the high-frequency path.

Worth listing so you don't waste a weekend like I did:

MCP is the right protocol. The ecosystem is moving fast (14,000+ servers as of mid-2026, governance transferred to Linux Foundation's AAIF). The community is shipping fixes — MCP Gateway tools now do lazy schema server-side, and several large servers have shipped "minimal" modes.

But today, in production, the tax is real. Measure it before you optimize it. Run one of your agents with tiktoken

counting the schema payload vs the request payload, and you'll know your number.

Mine was 70% before. It's 37% now. The next 20 percentage points are going to come from server-side improvements I can't make alone — but until the ecosystem catches up, lazy- + description trimming + allowlists + selective CLI routing is the most defensible stack.

If you're shipping MCP today without measuring this, you're flying blind on cost. The protocol will improve. Your bill won't wait.

── more in #large-language-models 4 stories · sorted by recency
── more on @mcp 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/the-mcp-tax-hit-4200…] indexed:0 read:5min 2026-06-29 ·