{"slug": "github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp", "title": "GitHub Slashes Agent Workflow Token Spend up to 62% with Daily Audits and MCP Pruning", "summary": "GitHub reduced token usage in its agentic workflows by up to 62% after pruning unused Model Context Protocol tools, replacing MCP calls with GitHub CLI commands, and deploying daily audit and optimization agents. The company's Auto-Triage Issues workflow showed a sustained 62% reduction in effective tokens over 109 post-fix runs, while Security Guard dropped 43% and Smoke Claude fell 59% across a dozen production workflows. The audit-and-optimize loop, which combines proxy-level observability with agents that file issues, ships in GitHub's gh-aw CLI and targets the hidden costs of scheduled LLM agent jobs in continuous integration.", "body_md": "GitHub has [published results](https://github.blog/ai-and-ml/github-copilot/improving-token-efficiency-in-github-agentic-workflows/) from work to cut token usage in the agentic workflows it runs in its own repositories. The company recorded reductions of up to 62% after pruning unused [Model Context Protocol](https://modelcontextprotocol.io) (MCP) tools, replacing MCP calls with GitHub CLI invocations, and adding daily audit and optimisation agents.\n\nThe work matters for any team running large language model (LLM) agents inside continuous integration (CI), where scheduled jobs accumulate cost out of view. GitHub routes every agent call through an API proxy and now writes a token-usage.jsonl artefact for each run that captures input, output and cache tokens in one normalised format across Claude CLI, Copilot CLI and Codex CLI.\n\nTo compare across model tiers, the team uses an Effective Tokens (ET) metric that weights output tokens by 4× and cache reads by 0.1×, then applies a model multiplier (Haiku at 0.25×, Sonnet at 1.0×, Opus at 5.0×). A 10% drop in ET maps to a 10% cost reduction regardless of the model in use.\n\nTwo agentic workflows drive the optimisation loop. A Daily Token Usage Auditor aggregates consumption by workflow, flags anomalous runs and surfaces the most expensive jobs. When the auditor highlights a workflow, a Daily Token Optimiser reads the source and recent logs, opens a GitHub issue, and proposes specific fixes. Both agents themselves appear in the same daily reports.\n\nThe most common inefficiency the optimiser finds is unused MCP tools. Because LLM APIs are stateless, agent runtimes include tool schemas with every request, so a GitHub MCP server with 40 tools can add 10 to 15 KB of schema per turn. Removing unused entries cuts per-call context by 8 to 12 KB in GitHub's smoke-test workflows. The team also replaced MCP calls for fetching pull request diffs and file contents with [gh CLI](https://cli.github.com/manual/gh_pr_diff) commands, either pre-downloaded into workspace files before the agent starts or proxied at runtime through a transparent HTTP proxy that keeps authentication tokens away from the agent.\n\nAcross a dozen production workflows, Auto-Triage Issues shows a sustained 62% ET reduction over 109 post-fix runs, Security Guard 43%, and Smoke Claude 59%. Daily Community Attribution improved 37%. One workflow, Contribution Check, recorded a 5% ET increase that GitHub attributes to a workload shift toward larger pull requests rather than a regression.\n\nThe team also notes the limits of MCP pruning. Daily Community Attribution carried eight unused GitHub MCP tools and made zero calls to them across an entire run, yet removing them did not reduce ET. \"Tool manifests were a small fraction of this workflow's overall context,\" GitHub wrote.\n\n[Anthropic](https://platform.claude.com/docs/en/build-with-claude/prompt-caching) and [OpenAI](https://platform.openai.com/docs/guides/prompt-caching) both offer prompt caching, and [LangChain](https://python.langchain.com/docs/how_to/chat_token_usage_tracking/) offers callback-based token tracking for agent runs. GitHub's contribution is the audit-and-optimise loop, which combines proxy-level observability with optimiser agents that file issues. The Auditor and Optimiser ship in the [gh-aw CLI](https://github.com/github/gh-aw) today.\n\n\"The cheapest LLM call is the one you don't make,\" GitHub wrote, framing the next step as portfolio-level analysis that targets duplicated reads and shared intermediate artefacts across the fleet of workflows in a repository.", "url": "https://wpnews.pro/news/github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp", "canonical_source": "https://www.infoq.com/news/2026/05/github-agentic-token-savings/?utm_campaign=infoq_content&utm_source=infoq&utm_medium=feed&utm_term=global", "published_at": "2026-05-29 08:30:00+00:00", "updated_at": "2026-05-29 09:13:20.360837+00:00", "lang": "en", "topics": ["ai-agents", "large-language-models", "ai-tools", "ai-infrastructure", "mlops"], "entities": ["GitHub", "Model Context Protocol", "Claude CLI", "Copilot CLI", "Codex CLI", "Haiku", "Sonnet", "Opus"], "alternates": {"html": "https://wpnews.pro/news/github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp", "markdown": "https://wpnews.pro/news/github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp.md", "text": "https://wpnews.pro/news/github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp.txt", "jsonld": "https://wpnews.pro/news/github-slashes-agent-workflow-token-spend-up-to-62-with-daily-audits-and-mcp.jsonld"}}