Build MCP Servers that don't suck...tokens.

First-generation MCP servers, which wrapped REST APIs for tools like Jira and GitHub, caused excessive token usage and context bloat. This post introduces the ultra-mcp-toolkit, which trims response fields, consolidates many tools into a few, and offers a CLI-based mode to cut token consumption. In the Jira benchmark, a rich ticket payload drops from 270.7 KB to 15.5 KB, and the tool listing drops from ~10k tokens to ~100 tokens. The toolkit also ships a Claude Code skill that automates these optimizations, making MCP servers cheaper to run and less prone to hallucinations.

First-generation MCP servers were great. They gave AI agents access to a ton of external apps and data: Jira, Confluence, GitHub, Linear, you name it.

But most of them just wrapped REST APIs. And that causes a ton of context bloat, hallucinations, and token burning.

Combining a few strategies from the [ultra-mcp-toolkit](https://github.com/scottlepp/ultra-mcp-toolkit), you can reduce that bloat dramatically and save money.

Generating a cost-efficient MCP server is easy. Just install the skill and off you go.

## Here's what "dramatically" looks like

Real benchmark, live Jira instance, [reproducible](https://github.com/scottlepp/ultra-jira-mcp/blob/main/docs/BENCHMARK.md):

### Per-call response size

| scenario | naive | with toolkit | savings |
|---|---|---|---|
| fetch 1 simple ticket | 20.3KB | 1.2KB | 17.5× |
| investigate rich ticket | 270.7KB | 15.5KB | 17.5× |
| JQL search (~10 tickets) | 20.5KB | 3.5KB | 5.8× |

That rich-ticket row is the one that hurts. 270 KB → 15.5 KB. ~67k tokens down to ~3.9k tokens. Same content; the full payload still lands on disk and the agent can fetch it via a `ref:` path only if it actually needs the detail.

### Tool-list cost (paid every conversation)

| approach | bytes | ~tokens | savings |
|---|---|---|---|
| naive (one tool per op) | 38.9KB | 9,947 | 1× |
| consolidated tools | 25.1KB | 6,427 | 1.5× |
| consolidated + filtered | ~6 KB | ~1,600 | 5× |
| code-api mode | 401B | 100 | 99× |

You read that right. Tool listings drop from ~10k tokens to ~100 tokens. On every. single. conversation.

## Why MCP servers leak tokens

Four anti-patterns show up almost everywhere:

- Returning raw API JSON. A Jira issue carries `iconUrl`s, nested `self` URLs, schema metadata, expand hints, three different shapes of the same status field. The agent needs none of it.
- One MCP tool per endpoint. A typical CRM has ~80 endpoints → 80 tool descriptions in the listing → ~10k tokens before the user types anything.
- Asking the LLM to filter or paginate.
The model can't reliably page through huge structures, and the chunking logic itself costs tokens. Filtering belongs server-side.
- No discipline on what gets kept. Denylist trimming (`delete result.iconUrl`) silently breaks the day the API adds a new noisy field. Allowlists keep the contract stable.

## The fix, in three strategies

### 1. Allowlist-style trim projections

```ts
import { pick } from "ultra-mcp-toolkit/trim";

// Project a raw issue down to the handful of fields the agent needs.
const issueSummary = (raw: unknown) => {
  const r = raw as { key: string; fields: Record<string, unknown> };
  return {
    key: r.key,
    // Allowlist: anything not named here is dropped.
    ...pick(r.fields, "summary", "status", "priority", "assignee"),
  };
};
```

Register the trim once. Every response routes through it. New API fields default to dropped. The model sees what it needs; the full response lives on disk as a `ref:` the agent can dereference on demand.

### 2. Consolidated tools (action-discriminated)

Instead of 80 tools, expose ~15, each taking an `action` arg:

```
{ action: "get", issueIdOrKey: "PROJ-1" }
{ action: "create", projectKey: "PROJ", summary: "..." }
{ action: "transition", issueIdOrKey: "PROJ-1", transition: "Done" }
```

Same operations, 1/5th the tool-list cost. The toolkit's dispatcher handles per-action Zod validation, manifest routing, and a `full: true` escape hatch when the model genuinely needs the raw response.

### 3. Code-api mode (the 99× lever)

Expose a single MCP tool that hands the agent a path to a bundled CLI plus a socket address:

```
node <cli-path> issue.get --issueIdOrKey=PROJ-1
# stdout: trimmed summary as JSON
# final line: ref: /path/to/full-response.json
```

The agent drives the whole API from its shell. Tool list stays at one tool forever, no matter how many operations exist. For shell-capable agents (Claude Code, Cursor, anything with bash), it's pure win.

## Quick start

```
npm install ultra-mcp-toolkit
```

The toolkit ships a Claude Code skill that auto-loads when you work on an MCP server. Install it:

```
npm run install-skill
```

That's it.
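
If you want to see why the allowlist rule from strategy 1 matters before handing the work to the skill, here is a minimal, dependency-free sketch. The helpers `pickFields` and `omitFields` and both payloads are made up for illustration; they are not the toolkit's exports, and the real `pick` may behave differently. The sketch only shows what happens to each approach when an upstream API adds a field.

```typescript
// Hypothetical helpers for illustration; not the toolkit's own API.
type Fields = Record<string, unknown>;

// Allowlist: copy only the named keys; everything else is dropped by default.
function pickFields(source: Fields, ...keys: string[]): Fields {
  const out: Fields = {};
  for (const key of keys) {
    if (Object.prototype.hasOwnProperty.call(source, key)) out[key] = source[key];
  }
  return out;
}

// Denylist: copy everything, then delete the keys known to be noisy today.
function omitFields(source: Fields, ...keys: string[]): Fields {
  const out: Fields = { ...source };
  for (const key of keys) delete out[key];
  return out;
}

// Made-up payloads: the same issue before and after the API grows a new field.
const issueV1: Fields = {
  summary: "Login fails",
  status: "Open",
  iconUrl: "https://example.invalid/icon.png",
};
const issueV2: Fields = {
  ...issueV1,
  avatarUrls: { "48x48": "https://example.invalid/avatar.png" },
};

const allowV1 = Object.keys(pickFields(issueV1, "summary", "status"));
const allowV2 = Object.keys(pickFields(issueV2, "summary", "status"));
const denyV1 = Object.keys(omitFields(issueV1, "iconUrl"));
const denyV2 = Object.keys(omitFields(issueV2, "iconUrl"));

// The allowlist output has the same keys for both versions;
// the denylist output picks up the new field without anyone opting in.
console.log({ allowV1, allowV2, denyV1, denyV2 });
```

The allowlist result is identical for both payload versions, while the denylist result grows as soon as the upstream response does. That is the "stable contract" property the anti-pattern list is pointing at.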
The skill walks the agent through manifest design, trim projections, dispatcher wiring, and server boot: the patterns that produce the numbers above.

Working from a non-Claude agent (Codex CLI, Cursor, Aider, Continue, Zed)? Point it at the skill markdown directly; [AGENTS.md](https://github.com/scottlepp/ultra-mcp-toolkit/blob/main/AGENTS.md) shows you how.

## What's in the box

- Operation manifest: declare endpoints as pure data; powers MCP tools, CLI, and code-api bridge from one source of truth.
- Trim registry: type-safe allowlist projections.
- Content-addressed sandbox: full responses land on disk; the model sees a `ref:` only.
- Page cache: versioned-id disk cache for stable keys (PR diffs by SHA, Confluence pages by version).
- Pooled retry-aware HTTP transport: undici + 429-aware retry honoring `Retry-After`.
- Atomic streaming downloads: sha256-verified, path-traversal-safe.
- Consolidated tool dispatcher: Zod-validated, action-discriminated.
- CLI scaffolding: bridge mode + direct mode, free with `createCli`.
- Bundled Claude Code skill: installs in one command.

## Production proof

Used in [ultra-jira-mcp](https://github.com/scottlepp/ultra-jira-mcp) and [ultra-bitbucket-mcp](https://github.com/scottlepp/ultra-bitbucket-mcp). The benchmark numbers above come from the Jira server running against a real Jira Cloud instance; every byte measured is one a production agent would actually receive.

If you're building an MCP server for any enterprise API (Jira, Confluence, GitHub, Linear, Notion, ServiceNow, Salesforce, whatever) and your token bill or context window is starting to bite, give it a try.

⭐ github.com/scottlepp/ultra-mcp-toolkit. Issues, PRs, and benchmark contributions welcome.

What's the most token-bloated MCP server you've shipped or seen? Drop it in the comments; I'm collecting horror stories.
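
As a postscript, the action-discriminated idea from strategy 2 can be sketched in plain TypeScript. This is an illustration under assumptions, not the toolkit's dispatcher: the names `IssueArgs`, `parseIssueArgs`, and `dispatchIssue` are invented here, the request paths are placeholders, and a hand-rolled field check stands in for the Zod validation the toolkit is described as using.

```typescript
// Hypothetical argument shapes, modeled on the three example calls in strategy 2.
type IssueArgs =
  | { action: "get"; issueIdOrKey: string }
  | { action: "create"; projectKey: string; summary: string }
  | { action: "transition"; issueIdOrKey: string; transition: string };

// Required string fields per action (stand-in for per-action schema validation).
const REQUIRED: Record<IssueArgs["action"], string[]> = {
  get: ["issueIdOrKey"],
  create: ["projectKey", "summary"],
  transition: ["issueIdOrKey", "transition"],
};

// Validate untyped tool input before it reaches any handler.
function parseIssueArgs(input: Record<string, unknown>): IssueArgs {
  const action = input.action;
  if (typeof action !== "string" || !Object.prototype.hasOwnProperty.call(REQUIRED, action)) {
    throw new Error(`unknown action: ${String(action)}`);
  }
  for (const field of REQUIRED[action as IssueArgs["action"]]) {
    if (typeof input[field] !== "string") {
      throw new Error(`${action}: missing string field "${field}"`);
    }
  }
  return input as unknown as IssueArgs;
}

// One tool, many operations: route on the `action` discriminant.
// Returns a description of the request instead of calling a real API;
// the paths below are placeholders, not a documented endpoint list.
function dispatchIssue(args: IssueArgs): { method: string; path: string; body?: unknown } {
  switch (args.action) {
    case "get":
      return { method: "GET", path: `/issue/${args.issueIdOrKey}` };
    case "create":
      return {
        method: "POST",
        path: "/issue",
        body: { projectKey: args.projectKey, summary: args.summary },
      };
    case "transition":
      return {
        method: "POST",
        path: `/issue/${args.issueIdOrKey}/transitions`,
        body: { transition: args.transition },
      };
  }
}

const request = dispatchIssue(
  parseIssueArgs({ action: "transition", issueIdOrKey: "PROJ-1", transition: "Done" }),
);
console.log(request);
```

The tool listing only has to describe one tool and its `action` values, which is where the listing savings come from; the per-action checks keep a malformed call from reaching a handler.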