Measure Your MCP Server's Token Tax in 60 Seconds

A developer has created a 60-second audit script to measure the "MCP server token tax": the context budget consumed by tool definitions before an agent performs any useful work. Running the script against the real filesystem MCP server revealed 14 tools consuming 2,638 tokens, approximately 1.3% of a 200K context window. The measurement, performed using `tiktoken`'s `o200k_base` encoding, aims to give developers actual per-tool costs rather than repeated industry figures.

The MCP server token tax is the context budget every tool definition eats before your agent does a single useful thing. To measure it, pull the server's `tools/list` JSON and tokenize each definition. Claude Code's Tool Search defers loading; it doesn't reduce the tax. Run the 60-second audit below and you'll see your real per-tool cost instead of repeating someone else's number.

In short: the MCP server token tax is the context budget every tool definition eats before your agent does anything. To measure it, pull the server's `tools/list` and tokenize each definition with `tiktoken`. My run of the real filesystem server: 14 tools, 2,638 tokens, ~1.3% of a 200K window.

AI disclosure: I wrote `mcp_token_tax.py` with AI assistance and ran it myself before publishing. Every number below is pasted from a real run of that script, or it's an external figure with a dated link next to it. I label which is which.

You've seen the figure quoted everywhere this spring: "the GitHub MCP server costs you tens of thousands of tokens before you ask anything." It gets repeated in threads, in newsletters, in conference hallway chatter. Here's a question almost nobody answers when they quote it: with which tokenizer, against which `tools/list`?

I didn't want to repeat a number. I wanted to measure one. So I drove the real, published filesystem MCP server, captured its actual `tools/list`, and counted.
The answer surprised me, and it's the reason this post exists.

TL;DR

`@modelcontextprotocol/server-filesystem` server: `tiktoken`. Copy it, run it, audit your own stack.

This is the first post in a small thread on MCP FinOps: measure before you cut. It sits next to the control side of my work: a hard spend-cap that stops a runaway agent loop (https://finops.spinov.online/blog/a-47k-agent-loop-spend-cap/) and the pre-execution gate that refuses a bad agent action before it runs (https://finops.spinov.online/blog/pre-execution-gate-for-ai-agents/). Those stop bad actions. This one just gives you a number, because you can't cut what you haven't measured.

A tool definition is text. Name, title, a human-readable description, the JSON Schema for its inputs (and now, often, an output schema and annotations). When you connect an MCP server, the host serializes all of that and injects it into the model's context so it knows the tool exists and how to call it.

That text doesn't get charged once. It rides along on turn after turn, because the model has to keep "seeing" the tools to use them. Ten tools you never call still sit in the window, quietly, on every request. That's the tax: rent on capability you've declared but may not be using.

Two costs come out of it. The obvious one is dollars: input tokens you pay for repeatedly. The sneakier one is room. Every token of definition is a token not available for the actual conversation, the retrieved docs, the file you pasted.

The MCP spec is moving toward a stateless core in the 2026-07-28 release candidate (https://blog.modelcontextprotocol.io/posts/2026-07-28-release-candidate/), which reshapes a lot, but it doesn't change the basic physics here. Definitions still have to reach the model somehow.
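To put rough numbers on those two costs, here is a minimal sketch. The 2,638-token and 200K-window figures are the ones from the filesystem-server run quoted earlier in this post. Everything else is an assumption for illustration: `PRICE_PER_MTOK` and `TURNS` are hypothetical placeholders, not any provider's real rate, and the arithmetic assumes the full definitions are re-sent on every request with no prompt caching.

```python
def context_share(definition_tokens: int, window_tokens: int) -> float:
    """Fraction of the context window occupied by tool definitions."""
    return definition_tokens / window_tokens


def definition_cost(definition_tokens: int, turns: int, price_per_mtok: float) -> float:
    """Dollars spent re-sending the same definitions on every turn.

    Assumes a flat input price and no prompt caching (both simplifications).
    """
    return definition_tokens * turns * price_per_mtok / 1_000_000


# Figures from the filesystem-server run quoted in this post.
DEFINITION_TOKENS = 2_638
WINDOW_TOKENS = 200_000

# Hypothetical inputs: substitute your model's real input price and your
# agent's typical number of turns per session.
PRICE_PER_MTOK = 3.00  # dollars per million input tokens (assumed)
TURNS = 50             # requests in one agent session (assumed)

share = context_share(DEFINITION_TOKENS, WINDOW_TOKENS)
cost = definition_cost(DEFINITION_TOKENS, TURNS, PRICE_PER_MTOK)

print(f"room:    {share:.1%} of the window on every request")
print(f"dollars: ${cost:.4f} over {TURNS} turns")
```

The room figure is fixed per request, while the dollar figure scales linearly with turns, which is why a small per-request share can still add up across a long agent session.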
Ken Alger named the downstream symptom plainly in his March 2026 piece on multi-agent MCP (https://www.kenwalger.com/blog/ai/mcp-multi-agent-orchestration-forensics/): "A single agent juggling too many tools often suffers from… Tool Confusion: choosing the wrong function when multiple tools are available," plus "Latency and Cost." Tokens are one face of that. Accuracy is the other. Anthropic's own testing in that same November post showed a model's tool-selection accuracy climbing from 49% to 74% once it stopped carrying every definition at once. Fewer tools in context, better choices. The tax isn't only financial.

Here's the whole thing. It does one job: read a `tools/list`, tokenize each tool with `tiktoken`'s `o200k_base` encoding (the gpt-4o family encoding; swap in `cl100k_base` for older models), and print a per-tool table with the share of your context window and a dollars-per-round estimate.

You feed it tools two ways. Point it at a published stdio server and it'll do the JSON-RPC handshake and capture the real `tools/list` live: keyless, read-only, and it never calls `tools/call`, so nothing executes. Or hand it a JSON fixture you saved earlier, for a deterministic run that reproduces byte-for-byte.

```python
#!/usr/bin/env python3
"""mcp_token_tax.py - measure the token tax of an MCP server's tool definitions."""
import argparse, json, subprocess, sys, threading

try:
    import tiktoken
except ImportError:
    sys.exit("tiktoken is required: pip install tiktoken")


def serialize_tool(tool: dict) -> str:
    """The text a host puts in context for one tool, as compact JSON.

    Hosts frame this differently, so it's a close approximation, not a
    provider's billing meter. Counted the same way for every tool, so the
    ranking and relative shares hold even where the absolute number drifts.
    """
    return json.dumps(tool, ensure_ascii=False, separators=(",", ":"))


def measure(tools, encoding_name="o200k_base"):
    enc = tiktoken.get_encoding(encoding_name)
    rows = []
    for t in tools:
        rows.append({
            "name": t.get("name", "
```