{"slug": "build-mcp-servers-that-don-t-suck-tokens", "title": "Build MCP Servers that don't suck...tokens.", "summary": "First-generation MCP servers, which wrapped REST APIs for tools like Jira and GitHub, caused excessive token usage and context bloat. The article introduces the \"ultra-mcp-toolkit,\" which uses strategies like trimming response fields, consolidating multiple tools into one, and using a CLI-based approach to dramatically reduce token consumption—for example, dropping a Jira ticket payload from 270 KB to 15.5 KB and tool listings from ~10k tokens to ~100 tokens. The toolkit also provides a Claude Code skill to automate these optimizations, making MCP servers more cost-efficient and reducing hallucinations.", "body_md": "First-generation MCP servers were great. They gave AI agents access to a ton of external apps and data — Jira, Confluence, GitHub, Linear, you name it. But most of them just wrapped REST APIs. And that causes a ton of context bloat, hallucinations, and token burning.\n\nCombining a few strategies from the [ultra-mcp-toolkit](https://github.com/scottlepp/ultra-mcp-toolkit), you can reduce that bloat dramatically — and save money.\n\nGenerating a cost-efficient MCP server is easy. Just install the skill and off you go.\n\n## Here's what \"dramatically\" looks like\n\nReal benchmark, live Jira instance, [reproducible](https://github.com/scottlepp/ultra-jira-mcp/blob/main/docs/BENCHMARK.md):\n\n### Per-call response size\n\n| scenario | naive | with toolkit | savings |\n|---|---|---|---|\n| fetch 1 simple ticket | 20.3KB | 1.2KB | 17.5× |\n| investigate rich ticket | 270.7KB | 15.5KB | 17.5× |\n| JQL search ~10 tickets | 20.5KB | 3.5KB | 5.8× |\n\nThat rich-ticket row is the one that hurts. **270 KB → 15.5 KB.** ~67k tokens down to ~3.9k tokens. 
Same content; the full payload still lands on disk and the agent can fetch it via a `ref:` path only if it actually needs the detail.\n\n### Tool-list cost (paid every conversation)\n\n| approach | bytes | ~tokens | savings |\n|---|---|---|---|\n| naive (one tool per op) | 38.9KB | 9,947 | 1× |\n| consolidated tools | 25.1KB | 6,427 | 1.5× |\n| consolidated + filtered | ~6 KB | ~1,600 | 5× |\n| code-api mode | 401B | 100 | 99× |\n\nYou read that right. Tool listings drop from ~10k tokens to ~100 tokens. On every. single. conversation.\n\n## Why MCP servers leak tokens\n\nFour anti-patterns show up almost everywhere:\n\n- **Returning raw API JSON.** A Jira issue carries `iconUrl`s, nested `self` URLs, schema metadata, expand hints, three different shapes of the same status field. The agent needs none of it.\n- **One MCP tool per endpoint.** A typical CRM has ~80 endpoints → 80 tool descriptions in the listing → ~10k tokens before the user types anything.\n- **Asking the LLM to filter or paginate.** The model can't reliably page through huge structures, and the chunking logic itself costs tokens. Filtering belongs server-side.\n- **No discipline on what gets kept.** Denylist trimming (`delete result.iconUrl`) silently breaks the day the API adds a new noisy field. Allowlists keep the contract stable.\n\n## The fix, in three strategies\n\n### 1. Allowlist-style trim projections\n\n```ts\nimport { pick } from \"ultra-mcp-toolkit/trim\";\n\n// Allowlist projection: keep only the fields the agent needs.\nconst issueSummary = (raw: unknown) => {\n  const r = raw as { key: string; fields: Record<string, unknown> };\n  return {\n    key: r.key,\n    ...pick(r.fields, [\"summary\", \"status\", \"priority\", \"assignee\"]),\n  };\n};\n```\n\nRegister the trim once. Every response routes through it. New API fields default to *dropped*. The model sees what it needs; the full response lives on disk as a `ref:` the agent can dereference on demand.\n\n### 2. 
Consolidated tools (action-discriminated)\n\nInstead of 80 tools, expose ~15 — each taking an `action` arg:\n\n```\n{ action: \"get\", issueIdOrKey: \"PROJ-1\" }\n{ action: \"create\", projectKey: \"PROJ\", summary: \"...\" }\n{ action: \"transition\", issueIdOrKey: \"PROJ-1\", transition: \"Done\" }\n```\n\nSame operations, 1/5th the tool-list cost. The toolkit's dispatcher handles per-action Zod validation, manifest routing, and a `full: true` escape hatch when the model genuinely needs the raw response.\n\n### 3. Code-api mode (the 99× lever)\n\nExpose a *single* MCP tool that hands the agent a path to a bundled CLI plus a socket address:\n\n```\nnode <cli-path> issue.get --issueIdOrKey=PROJ-1\n# stdout: trimmed summary as JSON\n# final line: `ref: /path/to/full-response.json`\n```\n\nThe agent drives the whole API from its shell. The tool list stays at one tool forever, no matter how many operations exist. For shell-capable agents (Claude Code, Cursor, anything with bash), it's a pure win.\n\n## Quick start\n\n```\nnpm install ultra-mcp-toolkit\n```\n\nThe toolkit ships a Claude Code skill that auto-loads when you work on an MCP server. Install it:\n\n```\nnpm run install-skill\n```\n\nThat's it. The skill walks the agent through manifest design, trim projections, dispatcher wiring, and server boot — the patterns that produce the numbers above.\n\nWorking from a non-Claude agent (Codex CLI, Cursor, Aider, Continue, Zed)? Point it at the skill markdown directly — [AGENTS.md](https://github.com/scottlepp/ultra-mcp-toolkit/blob/main/AGENTS.md) shows you how.\n\n## What's in the box\n\n- **Operation manifest** — declare endpoints as pure data; powers MCP tools, CLI, and code-api bridge from one source of truth.\n- **Trim registry** — type-safe allowlist projections.\n- **Content-addressed sandbox** — full responses land on disk; the model sees a `ref:` only.\n- **Page cache** — versioned-id disk cache for stable keys (PR diffs by SHA, Confluence pages by version).\n
- **Pooled retry-aware HTTP transport** — `undici` + 429-aware retry honoring `Retry-After`.\n- **Atomic streaming downloads** — sha256-verified, path-traversal-safe.\n- **Consolidated tool dispatcher** — Zod-validated, action-discriminated.\n- **CLI scaffolding** — bridge mode + direct mode, free with `createCli`.\n- **Bundled Claude Code skill** — installs in one command.\n\n## Production proof\n\nUsed in [ultra-jira-mcp](https://github.com/scottlepp/ultra-jira-mcp) and [ultra-bitbucket-mcp](https://github.com/scottlepp/ultra-bitbucket-mcp). The benchmark numbers above come from the Jira server running against a real Jira Cloud instance — every byte measured is one a production agent would actually receive.\n\nIf you're building an MCP server for any enterprise API — Jira, Confluence, GitHub, Linear, Notion, ServiceNow, Salesforce, whatever — and your token bill or context window is starting to bite, give it a try.\n\n⭐ **[github.com/scottlepp/ultra-mcp-toolkit](https://github.com/scottlepp/ultra-mcp-toolkit)** — issues, PRs, and benchmark contributions welcome.\n\nWhat's the most token-bloated MCP server *you've* shipped or seen? Drop it in the comments — I'm collecting horror stories.", "url": "https://wpnews.pro/news/build-mcp-servers-that-don-t-suck-tokens", "canonical_source": "https://dev.to/scottlepp/build-mcp-servers-that-dont-sucktokens-im2", "published_at": "2026-05-19 00:27:45+00:00", "updated_at": "2026-05-19 01:03:52.080162+00:00", "lang": "en", "topics": ["developer-tools", "artificial-intelligence", "large-language-models", "open-source"], "entities": ["MCP", "Jira", "Confluence", "GitHub", "Linear", "ultra-mcp-toolkit"], "alternates": {"html": "https://wpnews.pro/news/build-mcp-servers-that-don-t-suck-tokens", "markdown": "https://wpnews.pro/news/build-mcp-servers-that-don-t-suck-tokens.md", "text": "https://wpnews.pro/news/build-mcp-servers-that-don-t-suck-tokens.txt", "jsonld": "https://wpnews.pro/news/build-mcp-servers-that-don-t-suck-tokens.jsonld"}}