{"slug": "runcap-i-built-a-local-cost-cap-for-coding-agents", "title": "Runcap, I built a local cost cap for coding agents", "summary": "Runcap, a new open-source developer tool, estimates the cost of coding agent runs before they begin and enforces a hard spending ceiling that physically stops the run when the limit is reached. The tool, which operates entirely locally without sending code or tokens to a server, provides a cost range and a circuit breaker to prevent surprise bills from multi-agent runs that can burn 15x more tokens than a single chat. Unlike observability tools that show costs after they are incurred, Runcap estimates the bill upfront and hands users a rescue prompt when the agent gets stuck.", "body_md": "**Know what your coding agent will cost before you build it, and set a hard ceiling so it never surprises you.**\n\nRuncap estimates the cost of an agent run as a range, enforces a hard spend ceiling that physically stops the run, and when the agent gets stuck it hands you the exact rescue prompt. Free, MIT, 100% local. Your code and tokens never touch a server.\n\nEvery other tool here is a rear-view mirror - it shows you the bill *after* you paid it. Runcap estimates the bill *before* you start and caps it. It is a circuit breaker, not a dashboard.\n\nMulti-agent coding runs burn roughly **15x more tokens** than a single chat ([Anthropic engineering](https://www.anthropic.com/engineering/built-multi-agent-research-system)). Agents loop on the same error, rewrite plans, and hand you a confident summary while the task is not actually done. You find out what it cost when the invoice - or the subscription limit - arrives.\n\nObservability tools (Langfuse, Helicone, LangSmith, AgentOps) measure the past. Gateways (LiteLLM, Portkey, OpenRouter) route the present. None of them stop the spend *before* it happens. 
Runcap does the one thing the rear-view mirror can't:\n\n```\nestimate before build  →  cap during run  →  compress every call  →  rescue when stuck\n```\n\nRuncap does **not** promise an exact cost oracle. Agent trajectories are stochastic - nobody, including the model labs, can predict the exact token count of a run. So Runcap gives you a **range plus a hard cap**:\n\n\"This build is roughly $3-7. Cap it at $10.\" - then it kills the run the second it hits the ceiling.\n\nThe range is the headline. The hard cap is the product.\n\nRuncap is a developer tool. It works by running a local gateway that your agent's API calls pass through, so it can price and cap them before they reach the paid provider. That means you need three things already in place:\n\n- **Your own provider API key** (OpenAI or Anthropic). Runcap does not sell or supply model access.\n- **Your own agent** - Claude Code, Codex, or any script that calls the OpenAI/Anthropic API.\n- **Comfort running a CLI** and a local process on your machine.\n\nIf you have those, Runcap caps your spend in one command. If you are looking for a no-account web app that runs the AI for you, this is not that - it is a circuit breaker for a setup you already own.\n\nNo API key required.\n\n```\ngit clone https://github.com/kirder24-code/ai-agent-manager.git\ncd ai-agent-manager\nnpm run setup\nnpm run demo\n```\n\n**1. Catch a too-broad request before it spends anything:**\n\n``` bash\n$ runcap preflight -- claude \"build the full mobile app with auth payments and production deploy\"\n\nPreflight: claude build the full mobile app with auth payments and production deploy\nScope risk: high\nFuel: 24% (medium confidence)\nRecommendation: Do not launch as one broad mission. Split into one vertical slice with a verification command.\n```\n\n**2. 
Wrap a run - and get a rescue prompt the moment it gets stuck:**\n\n``` bash\n$ runcap run --label demo -- npm run build\n\nError [ERR_MODULE_NOT_FOUND]: Cannot find package '@/components' ...\n\nRuncap mission: 20260601T221531-demo-ff42c0a\nStatus: stuck (medium confidence)\nExit code: 1\nChanged files: 0\nParsed errors: 1\nPrimary recommendation: Resolve missing import before continuing feature work\n```\n\nThe rescue report hands back a copyable prompt:\n\n```\nDo not continue broad implementation. Diagnose this missing module first:\nCannot find package '@/components'. Check package.json, tsconfig paths, and\nthe latest git diff. Make the smallest change that resolves the import,\nthen run the failing command again.\n```\n\n```\nnpm install -g runcap     # exposes `runcap` (and `aim` as a legacy alias)\n```\n\nOr run from source with `node ./bin/runcap.mjs <command>`.\n\n```\nruncap plan --fuel 24 -- \"build a small auth feature and verify it\"   # range + recommended cap, before you spend\nruncap preflight -- claude \"build a full SaaS app\"                     # is this prompt too broad?\nruncap run --label fix -- claude \"fix one failing check. stop if blocked.\"  # wrap any agent/command\nruncap report                                                          # human-readable rescue report\nruncap export                                                          # evidence JSON with truth labels\nruncap dashboard                                                       # local cockpit at :8791\nruncap gateway                                                         # cost-tracking proxy with hard budget cap\nruncap fuel set 24                                                     # calibrate a %-only subscription\n```\n\nPoint any OpenAI- or Anthropic-compatible tool at the local gateway. It records real token usage, prices it from a sourced table, and **blocks calls the moment your daily ceiling is hit**.\n\n```\n# OpenAI-compatible agents\nOPENAI_API_KEY=sk-... 
AIM_DAILY_BUDGET_USD=5 runcap gateway\n#   then: OPENAI_BASE_URL=http://127.0.0.1:8792/v1\n\n# Anthropic-native (Claude Code, /v1/messages)\nANTHROPIC_API_KEY=sk-ant-... AIM_DAILY_BUDGET_USD=5 runcap gateway\n#   then: ANTHROPIC_BASE_URL=http://127.0.0.1:8792/v1\n```\n\nWhen spend crosses the ceiling, the next call returns `429 budget_guard` instead of money leaving your account. Try it with no key: `runcap gateway --mock`.\n\nEvery request that passes through the gateway is compressed before it's forwarded: embedded JSON is re-serialized compactly, long log/stack-trace dumps are collapsed to head + tail, and trailing whitespace is squeezed. This is **lossless by construction** - your prose instructions and code semantics are never altered, only machine \"garbage\" is trimmed. It's pure Node with **zero ML or native dependencies**, so it installs everywhere without the build pain heavier compressors have.\n\nThe dashboard shows the result as one number: **\"You saved $X · N tokens compressed · would have spent $Y.\"** Disable it with `AIM_COMPRESS=off` if you ever want raw passthrough.\n\nCosts are calculated from a sourced multi-provider table - Anthropic (Opus / Sonnet / Haiku) and OpenAI (GPT-5 family + legacy GPT-4), with cache-read and batch discounts handled - labeled with source and verification date. When a model is unknown, Runcap says `unknown_price` rather than guessing.\n\nRuncap is built not to fake certainty. 
Every important output carries a truth label:\n\n- `observed` - git diff, exit code, file changes, terminal output;\n- `calculated` - parsed errors, diff hashes, stuck score, cost from the sourced price table;\n- `provider_usage` - token usage returned by the upstream provider;\n- `manual_calibration` - subscription % you entered before/after a run;\n- `unknown` - Runcap cannot honestly know.\n\nIf it cannot prove something, it says so.\n\n| Tier | Price | What you get |\n|---|---|---|\n| OSS (MIT, local) | $0 forever | All local runs, cost estimation, hard cap, run wrapping, stuck detection, rescue prompts, local dashboard. Never crippleware. |\n| Founding Pro (limited) | $49 once | Lifetime Pro at the founder price - pay once, keep Pro forever, before it moves to $19/mo. |\n| Pro | $19/mo | Cloud sync across machines, hosted dashboard, estimate-vs-actual trends, shareable reports, alerts on cap breach |\n| Team | $49/seat/mo | Shared budget pools, org-wide ceilings, per-project rollups, role-based caps |\n\nThe local core is free forever. Only persistence, collaboration, and aggregation are paid - the things that only matter once data leaves your laptop.\n\nA working local tool, not a hosted SaaS. Ready for: wrapping real Codex / Claude / Cursor sessions, catching stuck agents, and proving rescue prompts save time. Not yet: a hosted cloud platform or a universal observability standard. 
It is not trying to replace Langfuse or LiteLLM - it does the thing they don't.\n\nThe thesis: **AI agents need managers.**", "url": "https://wpnews.pro/news/runcap-i-built-a-local-cost-cap-for-coding-agents", "canonical_source": "https://github.com/kirder24-code/ai-agent-manager", "published_at": "2026-06-05 17:27:11+00:00", "updated_at": "2026-06-05 17:50:20.148951+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "ai-startups", "ai-products"], "entities": ["Runcap", "Anthropic", "Langfuse", "Helicone", "LangSmith", "AgentOps", "LiteLLM", "Portkey"], "alternates": {"html": "https://wpnews.pro/news/runcap-i-built-a-local-cost-cap-for-coding-agents", "markdown": "https://wpnews.pro/news/runcap-i-built-a-local-cost-cap-for-coding-agents.md", "text": "https://wpnews.pro/news/runcap-i-built-a-local-cost-cap-for-coding-agents.txt", "jsonld": "https://wpnews.pro/news/runcap-i-built-a-local-cost-cap-for-coding-agents.jsonld"}}