First-generation MCP servers were great. They gave AI agents access to a ton of external apps and data: Jira, Confluence, GitHub, Linear, you name it. But most of them just wrapped REST APIs, and that causes context bloat, hallucinations, and burned tokens.

By combining a few strategies from ultra-mcp-toolkit, you can cut that bloat dramatically and save money.

Generating a cost-efficient MCP server is easy: install the toolkit's Claude Code skill and off you go.
## Here's what "dramatically" looks like

Real benchmark, live Jira instance, reproducible:

### Per-call response size

| scenario | naive | with toolkit | savings |
|---|---|---|---|
| fetch 1 simple ticket | 20.3 KB | 1.2 KB | 17.5× |
| investigate rich ticket | 270.7 KB | 15.5 KB | 17.5× |
| JQL search (~10 tickets) | 20.5 KB | 3.5 KB | 5.8× |

That rich-ticket row is the one that hurts: 270 KB → 15.5 KB, or roughly 67k tokens down to 3.9k. Nothing is thrown away, either. The full payload still lands on disk, and the agent can fetch it via a `ref:` path only if it actually needs the detail.
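
The token counts are back-of-the-envelope arithmetic. A minimal sketch, assuming the common rule of thumb of about 4 bytes of JSON per token and KB = 1,000 bytes (both assumptions; real tokenizers vary by model and content). The same ratio also reproduces the 401 B ≈ 100 tokens figure in the tool-list table.

```ts
// Assumption: ~4 bytes of JSON per token. Real tokenizers vary; ballpark math only.
const BYTES_PER_TOKEN = 4;

// Approximate token count for a payload of `bytes` bytes.
function estimateTokens(bytes: number): number {
  return Math.round(bytes / BYTES_PER_TOKEN);
}

// How many times smaller the trimmed payload is, e.g. 17.5 for "17.5×".
function savingsFactor(naiveBytes: number, trimmedBytes: number): number {
  return naiveBytes / trimmedBytes;
}

// Rich-ticket row, taking KB = 1,000 bytes (an assumption about the table's units).
const richNaive = 270700;  // 270.7 KB
const richTrimmed = 15500; // 15.5 KB

console.log(estimateTokens(richNaive));                          // 67675 (~67k)
console.log(estimateTokens(richTrimmed));                        // 3875 (~3.9k)
console.log(savingsFactor(richNaive, richTrimmed).toFixed(1));   // 17.5
console.log(estimateTokens(401));                                // 100 (code-api tool list)
```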

### Tool-list cost (paid every conversation)

| approach | size | ~tokens | savings |
|---|---|---|---|
| naive (one tool per op) | 38.9 KB | 9,947 | 1× |
| consolidated tools | 25.1 KB | 6,427 | 1.5× |
| consolidated + filtered | ~6 KB | ~1,600 | 5× |
| code-api mode | 401 B | 100 | 99× |

You read that right: in code-api mode, the tool listing drops from ~10k tokens to ~100. On every. single. conversation.

## Why MCP servers leak tokens

Four anti-patterns show up almost everywhere:

- Returning raw API JSON. A Jira issue carries `iconUrl`s, nested `self` URLs, schema metadata, expand hints, and three different shapes of the same status field. The agent needs none of it.
- One MCP tool per endpoint. A typical CRM has ~80 endpoints → 80 tool descriptions in the listing → ~10k tokens before the user types anything.
- Asking the LLM to filter or paginate. The model can't reliably page through huge structures, and the chunking logic itself costs tokens. Filtering belongs server-side.
- No discipline on what gets kept. Denylist trimming (`delete result.iconUrl`) silently breaks the day the API adds a new noisy field, because anything not explicitly deleted passes straight through. Allowlists keep the contract stable.
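
The fourth anti-pattern is easy to demonstrate. Below is a small sketch comparing the two trimming styles when the API starts returning a field nobody planned for. The payload shape and the `renderedFields` addition are illustrative, not Jira's exact schema.

```ts
type Payload = Record<string, unknown>;

// Denylist: delete the noise you already know about; anything else passes through.
function denylistTrim(raw: Payload): Payload {
  const copy: Payload = { ...raw };
  delete copy.iconUrl;
  delete copy.self;
  return copy;
}

// Allowlist: copy only the requested fields that exist; anything else is dropped.
function allowlistTrim(raw: Payload, keep: readonly string[]): Payload {
  const out: Payload = {};
  for (const k of keep) {
    if (Object.prototype.hasOwnProperty.call(raw, k)) out[k] = raw[k];
  }
  return out;
}

const keep = ["key", "summary", "status"];

// Today's response (illustrative shape).
const today: Payload = {
  key: "PROJ-1",
  summary: "Login fails",
  status: "Open",
  iconUrl: "https://example.com/icon.png",
  self: "https://example.com/rest/issue/1",
};

// Later, the API starts returning a new, bulky field.
const later: Payload = { ...today, renderedFields: { description: "<p>...</p>" } };

console.log(Object.keys(denylistTrim(later)).join(","));       // key,summary,status,renderedFields
console.log(Object.keys(allowlistTrim(later, keep)).join(",")); // key,summary,status
```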

## The fix, in three strategies

### 1. Allowlist-style trim projections

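```ts
import { pick } from "ultra-mcp-toolkit/trim";

// Project a raw Jira issue down to the handful of fields the agent needs.
// Anything not listed (iconUrls, self links, brand-new API fields) is dropped.
const issueSummary = (raw: unknown) => {
  const r = raw as { key: string; fields: Record<string, unknown> };
  return {
    key: r.key,
    ...pick(r.fields, ["summary", "status", "priority", "assignee"]),
  };
};
```
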
Register the trim once and every response routes through it. New API fields default to dropped. The model sees what it needs; the full response lives on disk behind a `ref:` the agent can dereference on demand.
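
The toolkit's own registry API isn't shown here, so the following is a hypothetical, dependency-free sketch of the pattern: a local `pick`, plus `registerTrim`/`applyTrim` helpers (all invented names) that force every response through a registered projection. Unlike the toolkit's type-safe version, this `pick` takes plain string keys.

```ts
// Hypothetical stand-ins for the toolkit's trim helpers (names and signatures invented).
type Trim = (raw: unknown) => unknown;

// Allowlist projection: copy only the listed keys that exist on the source object.
function pick(obj: Record<string, unknown>, keys: readonly string[]): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const k of keys) {
    if (Object.prototype.hasOwnProperty.call(obj, k)) out[k] = obj[k];
  }
  return out;
}

// One trim per operation id, registered once at startup.
const trims = new Map<string, Trim>();

function registerTrim(op: string, trim: Trim): void {
  trims.set(op, trim);
}

// Every response goes through here; an unregistered op fails instead of leaking raw JSON.
function applyTrim(op: string, raw: unknown): unknown {
  const trim = trims.get(op);
  if (!trim) throw new Error(`no trim registered for ${op}`);
  return trim(raw);
}

registerTrim("issue.get", (raw) => {
  const r = raw as { key: string; fields: Record<string, unknown> };
  return { key: r.key, ...pick(r.fields, ["summary", "status", "priority", "assignee"]) };
});

// Illustrative raw payload with noise the allowlist should drop.
const rawIssue = {
  key: "PROJ-1",
  self: "https://example.atlassian.net/rest/api/3/issue/10001",
  fields: {
    summary: "Login fails",
    status: { name: "Open" },
    priority: { name: "High" },
    assignee: null,
    customfield_10042: "noise",
  },
};

console.log(JSON.stringify(applyTrim("issue.get", rawIssue)));
// {"key":"PROJ-1","summary":"Login fails","status":{"name":"Open"},"priority":{"name":"High"},"assignee":null}
```

Failing loudly on an unregistered operation is a design choice: it keeps a forgotten trim from quietly shipping raw JSON to the model.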

### 2. Consolidated tools (action-discriminated)

Instead of 80 tools, expose ~15, each taking an `action` argument:

```
{ action: "get", issueIdOrKey: "PROJ-1" }
{ action: "create", projectKey: "PROJ", summary: "..." }
{ action: "transition", issueIdOrKey: "PROJ-1", transition: "Done" }
```

Same operations, a fraction of the tool-list cost: consolidation alone is ~1.5× smaller, and roughly 5× once combined with filtering. The toolkit's dispatcher handles per-action Zod validation, manifest routing, and a `full: true` escape hatch for when the model genuinely needs the raw response.
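
Here is a minimal sketch of an action-discriminated tool, assuming a hypothetical `issue` tool with the three actions above. The toolkit does this validation with Zod; to stay dependency-free, this version hand-writes the checks and leans on TypeScript's discriminated-union narrowing. The strings it returns are placeholders, not real Jira calls.

```ts
// Hypothetical input shapes for a consolidated "issue" tool, one per action.
type IssueAction =
  | { action: "get"; issueIdOrKey: string; full: boolean }
  | { action: "create"; projectKey: string; summary: string }
  | { action: "transition"; issueIdOrKey: string; transition: string };

// Validate untrusted tool arguments into the union (the toolkit uses Zod for this step).
function parseIssueAction(input: unknown): IssueAction {
  if (typeof input !== "object" || input === null) throw new Error("arguments must be an object");
  const o = input as Record<string, unknown>;
  const str = (k: string): string => {
    const v = o[k];
    if (typeof v !== "string" || v === "") {
      throw new Error(`${String(o.action)}: "${k}" must be a non-empty string`);
    }
    return v;
  };
  switch (o.action) {
    case "get":
      // `full: true` is the escape hatch for the raw, untrimmed response.
      return { action: "get", issueIdOrKey: str("issueIdOrKey"), full: o.full === true };
    case "create":
      return { action: "create", projectKey: str("projectKey"), summary: str("summary") };
    case "transition":
      return { action: "transition", issueIdOrKey: str("issueIdOrKey"), transition: str("transition") };
    default:
      throw new Error(`unknown action: ${String(o.action)}`);
  }
}

// Route each action to its handler. The `never` check fails to compile if an action is unhandled.
function dispatch(a: IssueAction): string {
  switch (a.action) {
    case "get":
      return `GET issue ${a.issueIdOrKey} (${a.full ? "raw" : "trimmed"})`;
    case "create":
      return `CREATE issue in ${a.projectKey}: ${a.summary}`;
    case "transition":
      return `TRANSITION ${a.issueIdOrKey} -> ${a.transition}`;
    default: {
      const unhandled: never = a;
      throw new Error(`unhandled action: ${JSON.stringify(unhandled)}`);
    }
  }
}

console.log(dispatch(parseIssueAction({ action: "get", issueIdOrKey: "PROJ-1" })));
// GET issue PROJ-1 (trimmed)
```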

### 3. Code-api mode (the 99× lever)

Expose a single MCP tool that hands the agent a path to a bundled CLI plus a socket address:

```sh
node <cli-path> issue.get --issueIdOrKey=PROJ-1
```

The agent drives the whole API from its shell. The tool list stays at one tool forever, no matter how many operations exist. For shell-capable agents (Claude Code, Cursor, anything with bash), it's a pure win.
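
On the CLI side, the only contract visible here is `<op> --name=value`. Below is a hypothetical sketch of how a bundled CLI might parse that into an operation call. Forwarding the call over the socket to the running server is left out, and none of these names come from the toolkit.

```ts
// Hypothetical parsed form of one CLI invocation.
interface OpCall {
  op: string;                   // operation id, e.g. "issue.get"
  args: Record<string, string>; // flag values, e.g. { issueIdOrKey: "PROJ-1" }
}

// Parse `<op> --name=value ...` (the shape of the example above) into an OpCall.
function parseCliArgs(argv: readonly string[]): OpCall {
  const [op, ...rest] = argv;
  if (!op || op.startsWith("--")) {
    throw new Error("usage: <op> [--name=value ...]");
  }
  const args: Record<string, string> = {};
  for (const token of rest) {
    // Split on the first "=" only, so values like "status=Open" survive intact.
    const m = /^--([^=]+)=(.*)$/.exec(token);
    if (!m) throw new Error(`expected --name=value, got "${token}"`);
    args[m[1]] = m[2];
  }
  return { op, args };
}

// A real CLI would read process.argv.slice(2); hard-coded here to stay self-contained.
const call = parseCliArgs(["issue.get", "--issueIdOrKey=PROJ-1"]);
console.log(JSON.stringify(call)); // {"op":"issue.get","args":{"issueIdOrKey":"PROJ-1"}}
```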

## Quick start

```sh
npm install ultra-mcp-toolkit
```

The toolkit ships a Claude Code skill that auto-loads when you work on an MCP server. Install it:

```sh
npm run install-skill
```

That's it. The skill walks the agent through manifest design, trim projections, dispatcher wiring, and server boot: the patterns that produce the numbers above.

Working from a non-Claude agent (Codex CLI, Cursor, Aider, Continue, Zed)? Point it at the skill markdown directly; `AGENTS.md` shows you how.

## What's in the box

- Operation manifest: declare endpoints as pure data; one source of truth powers the MCP tools, the CLI, and the code-api bridge.
- Trim registry: type-safe allowlist projections.
- Content-addressed sandbox: full responses land on disk; the model sees only a `ref:`.
- Page cache: versioned-id disk cache for stable keys (PR diffs by SHA, Confluence pages by version).
- Pooled, retry-aware HTTP transport: built on `undici`, with 429-aware retry that honors `Retry-After`.
- Atomic streaming downloads: sha256-verified, path-traversal-safe.
- Consolidated tool dispatcher: Zod-validated, action-discriminated.
- CLI scaffolding: bridge mode + direct mode, free with `createCli`.
- Bundled Claude Code skill: installs in one command.
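
The content-addressed sandbox is the piece that keeps trimmed responses lossless. A minimal sketch of the idea, assuming Node's built-in `crypto`, `fs`, `os`, and `path` modules; the `Sandbox` class and the `ref:<sha256>` format are illustrative, not the toolkit's actual API.

```ts
import { createHash } from "node:crypto";
import { existsSync, mkdtempSync, readFileSync, writeFileSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical content-addressed store: full payloads live on disk under their sha256,
// and the model only ever sees a short "ref:<hash>" string.
class Sandbox {
  private readonly dir: string;

  constructor(dir: string) {
    this.dir = dir;
  }

  // Store a payload under its hash and return the ref; identical payloads map to one file.
  put(body: string): string {
    const hash = createHash("sha256").update(body).digest("hex");
    const file = join(this.dir, `${hash}.json`);
    if (!existsSync(file)) writeFileSync(file, body, "utf8");
    return `ref:${hash}`;
  }

  // Resolve a ref back to the full payload. Only 64-hex-char digests are accepted,
  // so a ref can't smuggle in "../" path segments.
  get(ref: string): string {
    const m = /^ref:([0-9a-f]{64})$/.exec(ref);
    if (!m) throw new Error(`not a sandbox ref: ${ref}`);
    return readFileSync(join(this.dir, `${m[1]}.json`), "utf8");
  }
}

const sandbox = new Sandbox(mkdtempSync(join(tmpdir(), "mcp-sandbox-")));

// Full upstream response (illustrative shape).
const full = JSON.stringify({
  key: "PROJ-1",
  fields: { summary: "Login fails", iconUrl: "https://example.com/icon.png" },
});

// What the model receives: the trimmed fields plus a pointer to everything else.
const reply = { key: "PROJ-1", summary: "Login fails", ref: sandbox.put(full) };

console.log(reply.ref.length);                // 68 ("ref:" + 64 hex chars)
console.log(sandbox.get(reply.ref) === full); // true
```

Hashing the content means identical payloads dedupe to one file, and because `get` only accepts a 64-character hex digest, a model-supplied ref can't be used to read arbitrary paths.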

## Production proof

Used in ultra-jira-mcp and ultra-bitbucket-mcp. The benchmark numbers above come from the Jira server running against a real Jira Cloud instance, so every byte measured is one a production agent would actually receive.

If you're building an MCP server for any enterprise API (Jira, Confluence, GitHub, Linear, Notion, ServiceNow, Salesforce, whatever) and your token bill or context window is starting to bite, give it a try.

→ **github.com/scottlepp/ultra-mcp-toolkit**: issues, PRs, and benchmark contributions welcome.

What's the most token-bloated MCP server you've shipped or seen? Drop it in the comments; I'm collecting horror stories.