GitHub Slashes Agent Workflow Token Spend up to 62% with Daily Audits and MCP Pruning

wpnews.pro


Source: infoq.com · Published: 2026-05-29T08:30Z · Topic: ai-agents


GitHub reduced token usage in its agentic workflows by up to 62% after pruning unused Model Context Protocol tools, replacing MCP calls with GitHub CLI commands, and deploying daily audit and optimisation agents. Among a dozen production workflows, Auto-Triage Issues showed a sustained 62% reduction in effective tokens over 109 post-fix runs, Security Guard dropped 43%, and Smoke Claude fell 59%. The audit-and-optimise loop, which combines proxy-level observability with agents that file issues, ships in GitHub's gh-aw CLI and targets the hidden costs of scheduled LLM agent jobs in continuous integration.

Published May 29, 2026 · 3 min read

GitHub has published results from work to cut token usage in the agentic workflows it runs in its own repositories. The company recorded reductions of up to 62% after pruning unused Model Context Protocol (MCP) tools, replacing MCP calls with GitHub CLI invocations, and adding daily audit and optimisation agents.

The work matters for any team running large language model (LLM) agents inside continuous integration (CI), where scheduled jobs accumulate cost out of view. GitHub routes every agent call through an API proxy and now writes a token-usage.jsonl artefact for each run that captures input, output and cache tokens in one normalised format across Claude CLI, Copilot CLI and Codex CLI.
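A per-run artefact of this kind can be summed with a few lines of Python. The sketch below is illustrative only: the article does not give the schema of token-usage.jsonl, so the field names (`model`, `input_tokens`, `output_tokens`, `cache_read_tokens`) are assumptions, as are the helper names.

```python
import json
from dataclasses import dataclass


@dataclass
class TokenRecord:
    # Field names are hypothetical; the real artefact's schema is not given in the article.
    model: str
    input_tokens: int
    output_tokens: int
    cache_read_tokens: int


def parse_token_usage(jsonl_text: str) -> list[TokenRecord]:
    """Parse one JSON object per line, skipping blank lines."""
    records = []
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue
        row = json.loads(line)
        records.append(
            TokenRecord(
                model=row["model"],
                input_tokens=int(row.get("input_tokens", 0)),
                output_tokens=int(row.get("output_tokens", 0)),
                cache_read_tokens=int(row.get("cache_read_tokens", 0)),
            )
        )
    return records


def totals(records: list[TokenRecord]) -> dict[str, int]:
    """Sum each token category across all calls in a run."""
    return {
        "input_tokens": sum(r.input_tokens for r in records),
        "output_tokens": sum(r.output_tokens for r in records),
        "cache_read_tokens": sum(r.cache_read_tokens for r in records),
    }
```

Keeping one record per API call, rather than one per run, leaves room to attribute spend to individual turns later.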

To compare across model tiers, the team uses an Effective Tokens (ET) metric that weights output tokens by 4× and cache reads by 0.1×, then applies a model multiplier (Haiku at 0.25×, Sonnet at 1.0×, Opus at 5.0×). A 10% drop in ET maps to a 10% cost reduction regardless of the model in use.
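The weighting described above can be written as a small function. This is a minimal sketch, not GitHub's implementation: the output weight, cache-read weight and model multipliers come from the article, while the 1× weight on uncached input tokens and the function name are assumptions.

```python
# Multipliers and weights as reported in the article.
MODEL_MULTIPLIER = {"haiku": 0.25, "sonnet": 1.0, "opus": 5.0}
OUTPUT_WEIGHT = 4.0
CACHE_READ_WEIGHT = 0.1
# Assumption: uncached input tokens count at 1x (the article does not state this weight).
INPUT_WEIGHT = 1.0


def effective_tokens(
    model: str, input_tokens: int, output_tokens: int, cache_read_tokens: int
) -> float:
    """Weight each token category, then scale by the model tier."""
    weighted = (
        INPUT_WEIGHT * input_tokens
        + OUTPUT_WEIGHT * output_tokens
        + CACHE_READ_WEIGHT * cache_read_tokens
    )
    return MODEL_MULTIPLIER[model] * weighted
```

Under these assumptions, a Sonnet run with 1,000 input, 200 output and 5,000 cache-read tokens scores 1,000 + 800 + 500 = 2,300 ET, and the same token counts on Opus score five times that.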

Two agentic workflows drive the optimisation loop. A Daily Token Usage Auditor aggregates consumption by workflow, flags anomalous runs and surfaces the most expensive jobs. When the auditor highlights a workflow, a Daily Token Optimiser reads the source and recent logs, opens a GitHub issue, and proposes specific fixes. Both agents themselves appear in the same daily reports.
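The auditor's three jobs (aggregate by workflow, flag anomalous runs, rank the most expensive jobs) can be sketched as follows. The article does not say how GitHub defines an anomaly, so the median-based threshold, the `anomaly_factor` parameter and the function name are all assumptions chosen for illustration.

```python
from collections import defaultdict
from statistics import median


def audit(runs, anomaly_factor=2.0, top_n=3):
    """Summarise runs given as (workflow_name, effective_tokens) pairs.

    Returns (totals per workflow, anomalous runs, most expensive workflows).
    """
    by_workflow = defaultdict(list)
    for workflow, et in runs:
        by_workflow[workflow].append(et)

    totals = {w: sum(values) for w, values in by_workflow.items()}

    # Hypothetical rule: a run is anomalous if it exceeds
    # anomaly_factor times the median for its own workflow.
    anomalies = [
        (w, et)
        for w, values in by_workflow.items()
        for et in values
        if et > anomaly_factor * median(values)
    ]

    most_expensive = sorted(totals, key=totals.get, reverse=True)[:top_n]
    return totals, anomalies, most_expensive
```

Comparing each run against its own workflow's median, rather than a global threshold, keeps a naturally heavy workflow from being flagged on every run.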

The most common inefficiency the optimiser finds is unused MCP tools. Because LLM APIs are stateless, agent runtimes include tool schemas with every request, so a GitHub MCP server with 40 tools can add 10 to 15 KB of schema per turn. Removing unused entries cuts per-call context by 8 to 12 KB in GitHub's smoke-test workflows. The team also replaced MCP calls for fetching pull request diffs and file contents with gh CLI commands, either pre-downloaded into workspace files before the agent starts or proxied at runtime through a transparent HTTP proxy that keeps authentication tokens away from the agent.
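The per-turn cost of a tool manifest, and the saving from pruning it, can be estimated by serialising the tool definitions and measuring the result. This is a rough sketch under stated assumptions: the tool dictionary shape (`name`, `description`, `input_schema`) is illustrative, the helper names are hypothetical, and byte counts only approximate token counts.

```python
import json


def schema_bytes(tools: list[dict]) -> int:
    """Size of the serialised tool definitions sent with each request."""
    return len(json.dumps(tools, separators=(",", ":")).encode("utf-8"))


def prune_tools(tools: list[dict], used_names: set[str]) -> list[dict]:
    """Keep only the tools the workflow is known to call."""
    return [tool for tool in tools if tool["name"] in used_names]


def pruning_saving(tools: list[dict], used_names: set[str]) -> int:
    """Bytes removed from every turn's context by pruning unused tools."""
    return schema_bytes(tools) - schema_bytes(prune_tools(tools, used_names))
```

Because the manifest is resent on every turn, the saving from `pruning_saving` multiplies by the number of turns in a run, which is why long multi-turn workflows tend to benefit most.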

Among a dozen production workflows, Auto-Triage Issues shows a sustained 62% ET reduction over 109 post-fix runs, Security Guard shows 43%, and Smoke Claude 59%. Daily Community Attribution improved 37%. One workflow, Contribution Check, recorded a 5% ET increase, which GitHub attributes to a workload shift toward larger pull requests rather than to a regression.

The team also notes the limits of MCP pruning. Daily Community Attribution carried eight unused GitHub MCP tools and made zero calls to them across an entire run, yet removing them did not reduce ET. "Tool manifests were a small fraction of this workflow's overall context," GitHub wrote.

Anthropic and OpenAI both offer prompt caching, and LangChain offers callback-based token tracking for agent runs. GitHub's contribution is the audit-and-optimise loop, which combines proxy-level observability with optimiser agents that file issues. The Auditor and Optimiser ship in the gh-aw CLI today.

"The cheapest LLM call is the one you don't make," GitHub wrote, framing the next step as portfolio-level analysis that targets duplicated reads and shared intermediate artefacts across the fleet of workflows in a repository.

source & further reading

infoq.com — original article


Read original on infoq.com → www.infoq.com/news/2026/05/github-agentic-token-…
