# Build MCP Servers that don't suck... tokens

First-generation MCP servers, which wrapped REST APIs for tools like Jira and GitHub, caused excessive token usage and context bloat. This post introduces the "ultra-mcp-toolkit," which uses strategies like trimming response fields, consolidating multiple tools into one, and using a CLI-based approach to dramatically reduce token consumption: for example, dropping a Jira ticket payload from 270 KB to 15.5 KB and tool listings from ~10k tokens to ~100 tokens. The toolkit also provides a Claude Code skill to automate these optimizations, making MCP servers more cost-efficient and reducing hallucinations.

First-generation MCP servers were great. They gave AI agents access to a ton of external apps and data: Jira, Confluence, GitHub, Linear, you name it.

But most of them just wrapped REST APIs. And that causes a ton of context bloat, hallucinations, and token burning.

Combining a few strategies from the [ultra-mcp-toolkit](https://github.com/scottlepp/ultra-mcp-toolkit), you can reduce that bloat dramatically and save money.

Generating a cost-efficient MCP server is easy. Just install the skill and off you go.

## Here's what "dramatically" looks like

Real benchmark, live Jira instance, [reproducible](https://github.com/scottlepp/ultra-jira-mcp/blob/main/docs/BENCHMARK.md):

### Per-call response size

| scenario | naive | with toolkit | savings |
|---|---|---|---|
| fetch 1 simple ticket | 20.3 KB | 1.2 KB | 17.5× |
| investigate rich ticket | 270.7 KB | 15.5 KB | 17.5× |
| JQL search ~10 tickets | 20.5 KB | 3.5 KB | 5.8× |

That rich-ticket row is the one that hurts. 270 KB → 15.5 KB. ~67k tokens down to ~3.9k tokens. Same content; the full payload still lands on disk, and the agent can fetch it via a `ref:` path only if it actually needs the detail.
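To make the "trimmed summary plus `ref:` path" idea concrete, here is a minimal sketch. It is not the toolkit's API: `respondWithRef`, `fetchRef`, the `ref:<id>.json` format, and the in-memory `Map` (standing in for the on-disk store the post describes) are all assumptions made for illustration.

```typescript
// Hypothetical sketch: none of these names come from ultra-mcp-toolkit.
type Json = Record<string, unknown>;
type ToolResult = { summary: Json; ref: string };

// In-memory stand-in for the on-disk store described in the post.
const fullPayloads = new Map<string, string>();

// Keep the full payload out of the model's context; return only the
// allowlisted fields plus a reference the agent can follow later.
function respondWithRef(id: string, raw: Json, keep: readonly string[]): ToolResult {
  const ref = `ref:${id}.json`; // assumed ref format, for illustration only
  fullPayloads.set(ref, JSON.stringify(raw));
  const summary: Json = {};
  for (const key of keep) {
    if (key in raw) summary[key] = raw[key];
  }
  return { summary, ref };
}

// Called only when the agent decides it needs the full detail.
function fetchRef(ref: string): unknown {
  const body = fullPayloads.get(ref);
  if (body === undefined) throw new Error(`unknown ref: ${ref}`);
  return JSON.parse(body);
}

const rawIssue: Json = {
  key: "PROJ-1",
  summary: "Login fails",
  iconUrl: "https://example.invalid/icon.png",
  self: "https://example.invalid/rest/issue/1",
};
const result = respondWithRef("PROJ-1", rawIssue, ["key", "summary"]);
```

Here `result.summary` carries only `key` and `summary`, while `fetchRef(result.ref)` returns the untouched payload, so the large response costs tokens only when the agent asks for it.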
### Tool-list cost (paid every conversation)

| approach | bytes | ~tokens | savings |
|---|---|---|---|
| naive one tool per op | 38.9 KB | 9,947 | 1× |
| consolidated tools | 25.1 KB | 6,427 | 1.5× |
| consolidated + filtered | ~6 KB | ~1,600 | 5× |
| code-api mode | 401 B | 100 | 99× |

You read that right. Tool listings drop from ~10k tokens to ~100 tokens. On every. single. conversation.

## Why MCP servers leak tokens

Four anti-patterns show up almost everywhere:

- Returning raw API JSON. A Jira issue carries `iconUrl`s, nested `self` URLs, schema metadata, expand hints, and three different shapes of the same status field. The agent needs none of it.
- One MCP tool per endpoint. A typical CRM has ~80 endpoints → 80 tool descriptions in the listing → ~10k tokens before the user types anything.
- Asking the LLM to filter or paginate. The model can't reliably page through huge structures, and the chunking logic itself costs tokens. Filtering belongs server-side.
- No discipline on what gets kept. Denylist trimming (`delete result.iconUrl`) silently breaks the day the API adds a new noisy field. Allowlists keep the contract stable.

## The fix, in three strategies

### 1. Allowlist-style trim projections

```js
import { pick } from "ultra-mcp-toolkit/trim";

const issueSummary = (raw) => {
  const r = raw as { key: string; fields: Record
```
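As a self-contained illustration of why allowlists hold up better than denylists, here is a rough sketch. It does not use the toolkit's `pick` (whose exact signature is not shown here); `pickFields`, `omitFields`, and the sample issue objects are hypothetical.

```typescript
type Json = Record<string, unknown>;

// Allowlist: only the named fields survive.
// Hypothetical helper, not the toolkit's `pick`.
function pickFields(raw: Json, allowed: readonly string[]): Json {
  const out: Json = {};
  for (const key of allowed) {
    if (key in raw) out[key] = raw[key];
  }
  return out;
}

// Denylist: everything survives except the named fields.
function omitFields(raw: Json, denied: readonly string[]): Json {
  const out: Json = { ...raw };
  for (const key of denied) delete out[key];
  return out;
}

// Today's response shape (made-up sample data).
const v1: Json = { key: "PROJ-1", status: "Open", iconUrl: "https://example.invalid/a.png" };
// Later the upstream API adds a noisy field nobody asked for.
const v2: Json = { ...v1, avatarUrls: { "48x48": "https://example.invalid/b.png" } };

const allowV1 = pickFields(v1, ["key", "status"]);
const allowV2 = pickFields(v2, ["key", "status"]);
const denyV2 = omitFields(v2, ["iconUrl"]);
```

The allowlisted output is identical for `v1` and `v2`, while the denylisted output for `v2` now carries `avatarUrls` into the agent's context without anyone having changed the server code.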