{"slug": "puppetmaster-crushes-token-costs-increases-speed-and-context", "title": "Puppetmaster crushes token costs, increases speed, and context", "summary": "Puppetmaster, a new open-source orchestration layer, routes tasks from agent CLIs like Cursor and Claude Code to the cheapest capable model, reducing token costs by up to 99% while increasing processing speed and context retention. The system stores worker outputs as typed SQLite artifacts, enabling zero-cost follow-up reads, and automatically reroutes tasks when a provider fails or runs out of credits.", "body_md": "**Turn Cursor, Claude Code, the OpenAI API, or the Codex CLI into an orchestrator that routes every task to the cheapest model that can handle it, runs workers as independent processes, and stores their output as typed SQLite artifacts so follow-ups cost zero tokens.**\n\n💸 Reproduce the live A/B for about $0.01 of spend: `OPENAI_API_KEY=... python -m bench.router_live_ab`. Pinned `gpt-5.5` cost $0.0132; Puppetmaster routed the same task to `gpt-5.4-nano` for $0.00016 (same prompt, equivalent answer). The 35.1% figure is a 6-task mixed-workload dry-run where the router correctly kept the frontier model on the 2 hard tasks; full method in [docs/CLAIMS.md](/professorpalmer/Puppetmaster/blob/main/docs/CLAIMS.md).\n\n🔁 Self-healing: a dead provider doesn't kill the swarm (proven live, job `job_d82715bebc5d`): a `claude-code` worker hit a real $0 Anthropic balance → classified `billing_or_quota` → marked FAILED → auto-rerouted to `cursor/gpt-5.5` (plan-billed, `$0`) → the funded adapter completed the task. No silent degraded run.\n\n```\npipx install puppetmaster-ai     # or: pip install puppetmaster-ai\npuppetmaster setup               # doctor + models init + MCP installers + agent rules, idempotent\n```\n\nThat's the whole install. `setup` runs every step idempotently, skips any tool that isn't present, and prints what it did. 
Restart Cursor (or open a fresh Codex / Claude session) and the agent sees 32+ `puppetmaster_*` tools plus a rule nudging it to reach for them on multi-file work.\n\nTo run benchmarks or hack on it, clone instead; see [Contributing](/professorpalmer/Puppetmaster/blob/main/docs/CONTRIBUTING.md). (`pipx` keeps the CLI in its own isolated environment, which is the recommended way to install a command-line app.)\n\n**New here?** Watch the GIF above, run `pipx install puppetmaster-ai && puppetmaster setup`, then skim [What it does](#what-it-does).\n\n| Want to… | Go to |\n|---|---|\n| Understand the design & what it fixes | [docs/WHY.md](/professorpalmer/Puppetmaster/blob/main/docs/WHY.md) |\n\n[docs/COMPARISON.md](/professorpalmer/Puppetmaster/blob/main/docs/COMPARISON.md) · [docs/SECURITY.md](/professorpalmer/Puppetmaster/blob/main/docs/SECURITY.md) · [docs/CLAIMS.md](/professorpalmer/Puppetmaster/blob/main/docs/CLAIMS.md) (receipts in `bench/`) · [docs/FEATURES.md](/professorpalmer/Puppetmaster/blob/main/docs/FEATURES.md) · [Quickstart](#quickstart) · [docs/DAILY_DRIVER.md](/professorpalmer/Puppetmaster/blob/main/docs/DAILY_DRIVER.md) · [docs/README.md](/professorpalmer/Puppetmaster/blob/main/docs/README.md) · [`puppetmaster/`](/professorpalmer/Puppetmaster/blob/main/puppetmaster/README.md) · [`bench/`](/professorpalmer/Puppetmaster/blob/main/bench/README.md) · [`examples/`](/professorpalmer/Puppetmaster/blob/main/examples/README.md) · [`scripts/`](/professorpalmer/Puppetmaster/blob/main/scripts/README.md) · [`clients/typescript/`](/professorpalmer/Puppetmaster/blob/main/clients/typescript/README.md) · `cursor-extension/`\n\nThink **Redis/Gunicorn for agentic engineering**:\n\n```\nCursor Agent / Claude Code / OpenAI / Codex CLI / shell\n        |\n        v\nPuppetmaster supervisor  ──>  task-aware model router (auto-routes by cost)\n        |\n        v\nindependent worker processes  ──>  SQLite (typed artifacts, events, memory)\n        |\n        v\nlive artifact board  ──>  stitched summary  ──>  0-token follow-up 
reads\n```\n\nPuppetmaster isn't trying to beat native IDE subagents at every tiny task. It's for the work that gets messy: long repo investigations, conflicting hypotheses, repeated handoffs, flaky memory, and code changes that need evidence, replay, and approval gates. The rationale and failure modes it fixes are in [docs/WHY.md](/professorpalmer/Puppetmaster/blob/main/docs/WHY.md).\n\n**How it's different:** LangGraph, CrewAI, and the Claude Agent SDK are libraries you write code against to *build* an agent. Puppetmaster sits one layer up: it **orchestrates the agent CLIs you already pay for** (Cursor, Claude Code, Codex, OpenAI), routes each task to the cheapest sufficient model, keeps the spend inside your subscription, and self-heals when a provider is down. Full side-by-side + \"pick X instead if…\" in [docs/COMPARISON.md](/professorpalmer/Puppetmaster/blob/main/docs/COMPARISON.md).\n\nThe whole story in one command (local + shell adapters, nothing to configure):\n\n```\n./scripts/demo.sh                  # the 60-second tour (clean machine, no keys)\npython -m puppetmaster dashboard   # live, zero-dependency web board for any job\n```\n\nIt routes a task mix by cost, fans out a 6-role swarm as independent processes, reads the stitched summary, then proves follow-up reads cost **$0.00**. Script + GIF source: [scripts/](/professorpalmer/Puppetmaster/blob/main/scripts/README.md).\n\nEvery number is reproducible from a script in [bench/](/professorpalmer/Puppetmaster/blob/main/bench). Full detail + caveats: [docs/CLAIMS.md](/professorpalmer/Puppetmaster/blob/main/docs/CLAIMS.md).\n\n**Cost is fixed on two axes.** New work auto-routes to the cheapest sufficient model (**35% cheaper** on a fixture; **98.8% cheaper** in a live OpenAI A/B). 
Follow-ups are SQLite reads, not new agent runs (**40 queries, $0.00, 0.5 ms each**).\n\n**Workers don't share a transcript.** They lease tasks and emit **typed artifacts** (payload + `evidence` + `confidence` + `sha256`); the stitcher reads JSON, not stdout. Inspect with `puppetmaster artifacts <job_id>`.\n\n**Graphing is** Workers auto-inject task-relevant graph context before the model call; fall back to grep/read without it. ([CodeGraph](https://github.com/colbymchenry/codegraph)'s win, wired in cleanly. [docs/CODEGRAPH.md](/professorpalmer/Puppetmaster/blob/main/docs/CODEGRAPH.md))\n\n**A dead provider doesn't kill the swarm (v0.9.0+).** Billing/quota/auth/missing-CLI failures are marked `FAILED` and **auto-rerouted to the next funded adapter**, preferring plan-billed models. Validated live; surfaced loudly in the summary's Alerts section.\n\nAfter install, try one of these inside Cursor Agent or Codex:\n\n```\nUse Puppetmaster to run doctor in this repo and summarize what is missing.\nUse Puppetmaster to start a cursor swarm for this repo and return the job id immediately.\nProblem: users get logged out after refresh and token-refresh tests are flaky.\nConstraints: keep the patch focused, preserve public API behavior, run relevant tests.\nDo review/plan first. Poll status/logs by job id. 
Do not edit until you summarize findings and ask for approval.\n```\n\nOr from the shell:\n\n```\npuppetmaster doctor\npuppetmaster route \"Security audit every endpoint\" --role audit   # dry-run routing decision\npuppetmaster cursor \"Review this repo for release blockers\" --review --dry-run\npuppetmaster claude \"Implement the approved change and run focused tests\" --permission-mode acceptEdits\npuppetmaster show $(puppetmaster last)\n```\n\nMore recipes in [docs/DAILY_DRIVER.md](/professorpalmer/Puppetmaster/blob/main/docs/DAILY_DRIVER.md).\n\n**Daily-driver beta.** Real runtime contract, automated tests, SQLite default backend, fail-closed jobs, live Cursor Agent MCP, installable Cursor extension, validated full-edit adapters. Credible for supervised local engineering; not yet a hosted multi-user service. Full feature matrix: [docs/FEATURES.md](/professorpalmer/Puppetmaster/blob/main/docs/FEATURES.md).\n\n**Pip name:** PyPI lists this as [puppetmaster-ai](https://pypi.org/project/puppetmaster-ai/) because [PEP-503 normalization](https://peps.python.org/pep-0503/#normalized-names) collides `puppetmaster` with an [abandoned 2019 `puppet-master`](https://pypi.org/project/puppet-master/). The import name, CLI, repo, and brand stay `puppetmaster`. 
([tracking](/professorpalmer/Puppetmaster/blob/main/docs/PYPI_NAME_REQUEST.md))\n\nMIT", "url": "https://wpnews.pro/news/puppetmaster-crushes-token-costs-increases-speed-and-context", "canonical_source": "https://github.com/professorpalmer/Puppetmaster", "published_at": "2026-05-30 17:28:48+00:00", "updated_at": "2026-05-30 17:46:18.408623+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-infrastructure", "large-language-models", "ai-products"], "entities": ["Puppetmaster", "Cursor", "Claude Code", "OpenAI", "Codex CLI", "Anthropic", "GPT-5.5", "GPT-5.4-nano"], "alternates": {"html": "https://wpnews.pro/news/puppetmaster-crushes-token-costs-increases-speed-and-context", "markdown": "https://wpnews.pro/news/puppetmaster-crushes-token-costs-increases-speed-and-context.md", "text": "https://wpnews.pro/news/puppetmaster-crushes-token-costs-increases-speed-and-context.txt", "jsonld": "https://wpnews.pro/news/puppetmaster-crushes-token-costs-increases-speed-and-context.jsonld"}}