{"slug": "token-optimization-in-the-postman-plugin-for-claude-code", "title": "Token optimization in the Postman plugin for Claude Code", "summary": "The Postman plugin for Claude Code reduced its largest skill's token cost by 60% and overall session overhead by 65% through progressive disclosure and removal of redundant skills, saving roughly 3,600 tokens per session. The optimization addresses context-window degradation in AI coding agents by deferring detailed instructions until needed.", "body_md": "# Token optimization in the Postman plugin for Claude Code\n\nEvery AI coding agent has the same hidden tax: the context window. Anthropic’s guide to [effective context engineering](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents) calls context “a critical yet limited resource,” and the research behind it is blunt: as the window fills, model accuracy degrades. The team calls this context rot. Token optimization isn’t only about cost. Every token your tooling injects is a token the model can’t spend reasoning about the user’s actual work.\n\nThat tax lands on plugin authors too. The [Postman plugin for Claude Code](https://github.com/Postman-Devrel/postman-claude-code-plugin) is pure instructional Markdown: commands, skills, and agents that teach Claude the full Postman API lifecycle. There’s no runtime to profile and no binary to shrink. Its entire footprint *is* context-window tokens, paid inside every user’s session.\n\nI recently ran a token-usage review on the plugin and shipped an optimization pass. The headline numbers: the plugin’s largest skill is now **60% lighter** per trigger, the always-on overhead every session pays dropped by **20%**, and a typical “explore an API and generate a client” session starts roughly **3,600 tokens lighter** — about 65% less plugin overhead before any work happens. 
Here’s where the savings came from.\n\n## Where a Claude Code plugin spends tokens\n\nA plugin built from Markdown spends tokens in three distinct ways, and they’re not equally expensive:\n\n- **Always-on cost.** Every skill, command, and agent `description` in the YAML front matter is injected into *every session’s* system prompt, whether or not the user touches Postman that day. This is the most expensive token in the plugin: every user pays it, every session.\n- **Per-trigger cost.** When Claude decides a skill is relevant, the entire `SKILL.md` body loads into context. A 19 KB skill costs roughly 4,800 tokens every time it fires, even if the user only needed a third of it. This layered loading model is documented in the [Claude Code skills docs](https://code.claude.com/docs/en/skills).\n- **Runtime cost.** Tool output, async polling loops, and verbose narration while a command runs. Tool schemas add up fast here too: a [discussion on the MCP specification repo](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/2808) measured roughly 1,000 tokens per complex tool definition.\n\n## 60% lighter skills with progressive disclosure\n\nThe biggest saving came from the per-trigger cost. Anthropic’s engineering team describes [progressive disclosure](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills) as the foundational pattern of the [Agent Skills standard](https://agentskills.io/): metadata loads at startup, the `SKILL.md` body loads when relevant, and bundled reference files load only when a specific step needs them.\n\nA skill doesn’t need to front-load every rule it might ever apply. It needs the workflow, plus pointers to detailed rules that Claude reads at the step that needs them:\n\n```\n## Step 4: Generate the client code\n\nBefore writing any code, read `references/code-generation.md` in this\nskill's directory. 
It contains the full rule catalog for idiomatic\nclient generation.\n```\n\nWe applied this split to the plugin’s two largest skills:\n\n| Skill | Before | After | Saving per trigger |\n|---|---|---|---|\n| `postman-context` | ~4,760 tokens | ~1,930 tokens | ~2,800 tokens (60%) |\n| `generate-spec` | ~2,640 tokens | ~1,800 tokens | ~840 tokens (32%) |\n\nNo content was deleted. The detailed rules and templates are intact, deferred rather than removed. A user who asks “find me an email API” no longer pays ~2,800 tokens for code-generation rules they aren’t using. A user who does generate code pays the same total as before.\n\n## 100% of the routing skill, gone\n\nThe plugin shipped a `postman-routing` skill (roughly 835 tokens) whose trigger was “use when user mentions APIs.” That’s broad enough to fire in nearly any backend coding session, Postman-related or not. Its body was a routing table that restated what every command’s `description` already tells Claude.\n\nModern Claude Code [routes natively](https://code.claude.com/docs/en/skills) by matching user intent against component descriptions, so the skill was pure duplicate state. We deleted it.\n\n**Saving: ~835 tokens in every session where it fired**, and given that trigger, it fired in most sessions in API codebases.\n\n## 20% off the always-on overhead\n\nSeveral command descriptions enumerated long quoted trigger-phrase lists:\n\n```\ndescription: Run Postman collection tests using Postman CLI - use when\n  user says \"run tests\", \"run collection\", \"run my postman tests\",\n  \"verify changes\", \"check if tests pass\", or wants to execute API\n  test suites after code changes\n```\n\nClaude’s router doesn’t need a phrasebook. It needs the capability and when to use it:\n\n```\ndescription: Run Postman Collection tests with the Postman CLI and\n  report failures. 
Use after code changes or when the user asks to\n  run API tests.\n```\n\nRewriting these, combined with the routing-skill removal, shrank the always-on description block from 3,182 to 2,562 bytes, a **20% reduction worth ~155 tokens in every session of every plugin user**. That makes them the highest-leverage bytes in the repo. Anthropic’s [Agent Skills announcement](https://www.anthropic.com/news/skills) makes the same point: discovery metadata works best as one or two tight sentences.\n\n## 90% fewer tool schemas for subagents\n\nEvery MCP-backed command and the plugin’s readiness-analyzer agent previously declared a wildcard, granting access to all 100+ tools on the [Postman MCP Server](https://github.com/postmanlabs/postman-mcp-server). Each now lists exactly what it uses. The readiness analyzer went from 111 tools to 11 (**90% fewer**), and the `setup` command declares six.\n\nFor subagents and clients that resolve tool schemas from the allowlist, that’s an order of magnitude fewer schemas loaded into context. It also keeps the model from wandering into unrelated tools mid-command, which matches Anthropic’s guidance that [overlapping tool sets create ambiguity](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents). As a bonus, the audit surfaced three latent permission bugs where commands were instructed to write or edit files without the permissions to do so. Wildcards hide that class of bug; explicit lists make it visible in review.\n\n## Less chatter from async workflows\n\nSome Postman MCP Server tools return HTTP 202 and require polling for completion. Left to its own devices, the model will happily narrate every poll, and all of it accumulates in context. The affected commands now carry two lines of instruction: poll with increasing waits (2s, then 4s, then 8s), and report only the final outcome. 
Fewer round-trips, less narration, same result.\n\n## What it adds up to\n\n| Change | Who pays today | Saving |\n|---|---|---|\n| Routing skill removed | Nearly every session in an API codebase | ~835 tokens/session (100% of the skill) |\n| Description trims | Every session, every user | ~155 tokens/session (20% of always-on overhead) |\n| `postman-context` split | Every session that triggers the skill | up to ~2,800 tokens/trigger (60%) |\n| `generate-spec` split | Every session that triggers the skill | up to ~840 tokens/trigger (32%) |\n| Scoped `allowed-tools` | Subagent spawns; eager-loading clients | 90% fewer schemas loaded |\n| Polling guidance | Long-running async commands | variable; fewer round-trips and less narration |\n\nA typical “explore an API and generate a client” session that previously loaded the routing skill plus the full `postman-context` skill now starts roughly **3,600 tokens (about 65%) lighter**. A session that never touches Postman saves ~990 tokens it used to spend on routing overhead. We also updated the plugin’s contributor docs so the conventions stick: descriptions stay short, bulky skill content goes in `references/`, and `allowed-tools` lists explicit tool names.\n\n## Real-world test: API spec drift detection end-to-end\n\nNumbers in a table are one thing. Watching them play out on a real task is more useful.\n\nTo validate the optimization work, I ran the same prompt against a live GitHub repository using both the pre-optimization and post-optimization versions of the plugin. The task was a non-trivial agentic workflow: scan a workspace for API spec drift, validate the contract, and open a pull request with any code fixes.\n\n```\nlook at my workspace. Identify all API spec drift. 
Ensure the API contract is valid.\nOpen a PR in https://github.com/buildwithtalia/enterprise-resource-planning\nwith any code fixes.\n```\n\n**Pre-optimization (v1.1.x)**\n\n```\nTotal cost:               $3.60\nTotal duration (API):     15m 52s\nTotal duration (wall):    17m 10s\nTotal code changes:       1159 lines added, 13 lines removed\nUsage by model:\n  claude-haiku-4-5:       104.3k input, 22.8k output, 0 cache read, 0 cache write ($0.2181)\n  claude-sonnet-4-6:      1.8k input, 60.5k output, 4.5m cache read, 298.7k cache write ($3.38)\n```\n\nActual tokens of new work (input + cache write + output, excluding re-reads): **~361k tokens** for Sonnet alone, plus ~127k for Haiku background tasks.\n\n**Post-optimization (v1.2.0)**\n\n```\nTotal cost:               $2.64\nTotal duration (API):     7m 47s\nTotal duration (wall):    9m 12s\nTotal code changes:       119 lines added, 12 lines removed\nUsage by model:\n  claude-haiku-4-5:       482 input, 18 output, 0 cache read, 0 cache write ($0.0006)\n  claude-sonnet-4-6:      395 input, 31.4k output, 5.5m cache read, 137.3k cache write ($2.64)\n```\n\nActual tokens of new work: **~169k tokens** (395 input + 137.3k cache write + 31.4k output). Haiku dropped to ~500 tokens — effectively background housekeeping.\n\n**What the numbers say**\n\n| Metric | Pre-optimization | Post-optimization | Change |\n|---|---|---|---|\n| Session cost | $3.60 | $2.64 | 27% cheaper |\n| Wall time | 17m 10s | 9m 12s | ~46% faster |\n| Actual new tokens (Sonnet) | ~361k | ~169k | ~53% fewer |\n| Cache efficiency (read:write ratio) | ~15:1 | ~40:1 | 2.7× better |\n\nThe 5.5 million cache reads in the post-optimization session might look alarming until you remember that cache reads cost about 10% of input tokens. The 40:1 read-to-write ratio means the context window was stable across turns — almost nothing had to be re-cached. Of the $2.64, cache reads account for roughly $1.65, cache writes ~$0.51, and output tokens ~$0.47. 
That’s exactly what a well-cached session should look like.\n\nThere’s also a more interesting data point buried in the code-changes column: the pre-optimization session added 1,159 lines; the post-optimization session added 119. The optimized plugin didn’t just use fewer tokens; it used them more precisely, producing a tighter, more surgical diff for the same task. Less context noise, more signal.\n\n## Takeaways for plugin authors\n\nIf you maintain or are about to publish a [Claude Code plugin](https://www.anthropic.com/news/claude-code-plugins), these are the rules I’d start with:\n\n- **Treat front-matter descriptions as the most expensive real estate you own.** They’re injected into every session. One or two sentences: what it does, when to use it.\n- **Progressive disclosure beats monolithic skills.** Keep `SKILL.md` to the workflow, around 6 KB or less, and move templates, rule catalogs, and edge-case handling to `references/*.md` files the skill reads on demand.\n- **Don’t build what the harness already does.** A routing skill that duplicates Claude’s native description-based routing costs tokens twice and creates a maintenance hazard.\n- **Scope `allowed-tools` to what each component calls.** It’s least-privilege hygiene, it loads fewer schemas where that matters, and the audit itself tends to find permission bugs.\n- **Make polling cheap.** Any async workflow needs explicit backoff and final-result-only reporting, or the model narrates every poll.\n\nWhat I like about this kind of token optimization work is that none of it required clever engineering. It required measuring where the tokens go and being honest about which ones earn their place. 
The same context-engineering discipline Anthropic recommends for agents applies one level down, to the tooling we hand them.\n\nAll of these savings are live today: install the latest version of the [Postman plugin from the Claude Plugin Marketplace](https://claude.com/plugins/postman) and your next session picks them up automatically. If you’re building your own plugin, clone the [plugin repo](https://github.com/Postman-Devrel/postman-claude-code-plugin) to see the patterns in place, then check what your own skills cost per trigger. If your largest `SKILL.md` is over 10 KB, try the `references/` split and compare your `/context` output before and after.\n\n## Resources\n\n- [Postman plugin on the Claude Plugin Marketplace](https://claude.com/plugins/postman)\n- [Postman plugin for Claude Code on GitHub](https://github.com/Postman-Devrel/postman-claude-code-plugin)\n- [Postman MCP Server on GitHub](https://github.com/postmanlabs/postman-mcp-server)\n- [Announcing the Postman plugin for Claude Code](https://blog.postman.com/announcing-the-postman-plugin-for-claude-code/)\n- [Effective context engineering for AI agents (Anthropic)](https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents)\n- [Equipping agents for the real world with Agent Skills (Anthropic)](https://www.anthropic.com/engineering/equipping-agents-for-the-real-world-with-agent-skills)\n- [Agent Skills open standard](https://agentskills.io/)\n- [Claude Code skills documentation](https://code.claude.com/docs/en/skills)\n- [MCP tool schema token overhead discussion](https://github.com/modelcontextprotocol/modelcontextprotocol/issues/2808)", "url": "https://wpnews.pro/news/token-optimization-in-the-postman-plugin-for-claude-code", "canonical_source": "https://blog.postman.com/token-optimization-in-the-postman-plugin-for-claude-code/", "published_at": "2026-06-29 12:05:00+00:00", "updated_at": "2026-06-29 12:33:22.736954+00:00", "lang": "en", "topics": ["large-language-models", "developer-tools", 
"ai-agents"], "entities": ["Postman", "Claude Code", "Anthropic", "Agent Skills"], "alternates": {"html": "https://wpnews.pro/news/token-optimization-in-the-postman-plugin-for-claude-code", "markdown": "https://wpnews.pro/news/token-optimization-in-the-postman-plugin-for-claude-code.md", "text": "https://wpnews.pro/news/token-optimization-in-the-postman-plugin-for-claude-code.txt", "jsonld": "https://wpnews.pro/news/token-optimization-in-the-postman-plugin-for-claude-code.jsonld"}}