Puppetmaster crushes token costs and increases speed and context retention

Puppetmaster, a new open-source orchestration layer, routes tasks from agent CLIs like Cursor and Claude Code to the cheapest capable model, reducing token costs by up to 99% while increasing processing speed and context retention. The system stores worker outputs as typed SQLite artifacts, enabling zero-cost follow-up reads, and automatically reroutes tasks when a provider fails or runs out of credits.

Turn Cursor, Claude Code, the OpenAI API, or the Codex CLI into an orchestrator that routes every task to the cheapest model that can handle it, runs workers as independent processes, and stores their output as typed SQLite artifacts so follow-ups cost zero tokens.

💸 Reproduce the live A/B in ~$0.01 of spend: `OPENAI_API_KEY=... python -m bench.router_live_ab`. Pinned gpt-5.5 cost $0.0132; Puppetmaster routed the same task to gpt-5.4-nano for $0.00016 (same prompt, equivalent answer). The 35.1% figure is a 6-task mixed-workload dry run where the router correctly kept the frontier model on the 2 hard tasks; full method in docs/CLAIMS.md.

🔁 Self-healing: a dead provider doesn't kill the swarm (proven live, job d82715bebc5d). A claude-code worker hit a real $0 Anthropic balance → classified as billing or quota → marked FAILED → auto-rerouted to the plan-billed cursor/gpt-5.5 ($0) → the funded adapter completed the task. No silent degraded run.

    pipx install puppetmaster-ai   # or: pip install puppetmaster-ai
    puppetmaster setup             # doctor + models init + MCP installers + agent rules, idempotent

That's the whole install. `setup` runs every step idempotently, skips any tool that isn't present, and prints what it did. Restart Cursor or open a fresh Codex / Claude session and the agent sees 32+ puppetmaster tools plus a rule nudging it to reach for them on multi-file work. To run benchmarks or hack on it, clone instead; see [Contributing](/professorpalmer/Puppetmaster/blob/main/docs/CONTRIBUTING.md).
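The routing and self-healing behaviour described above can be pictured with a small sketch. This is not Puppetmaster's actual router: the `Model` type, the integer capability tiers, `ProviderError`, and the function names are all hypothetical, chosen only to illustrate "cheapest model that can handle it, with failover". The two prices are the per-task figures quoted for the live A/B.

```python
from dataclasses import dataclass


class ProviderError(Exception):
    """Hypothetical error for a provider that is down or out of credits."""


@dataclass(frozen=True)
class Model:
    name: str
    capability: int       # hypothetical tier: higher handles harder tasks
    cost_per_task: float  # USD, taken from the live A/B figures above


def route(models, required_capability):
    """Return the cheapest model whose capability tier is sufficient."""
    candidates = [m for m in models if m.capability >= required_capability]
    if not candidates:
        raise RuntimeError("no remaining model can handle this task")
    return min(candidates, key=lambda m: m.cost_per_task)


def run_with_failover(models, required_capability, execute):
    """Try the cheapest sufficient model; on provider failure, reroute."""
    remaining = list(models)
    while True:
        choice = route(remaining, required_capability)
        try:
            return choice.name, execute(choice)
        except ProviderError:
            # Drop the failed provider and pick the next cheapest one.
            remaining = [m for m in remaining if m is not choice]


def savings_pct(baseline, routed):
    """Percentage saved by the routed price relative to the baseline."""
    return round(100 * (1 - routed / baseline), 1)


MODELS = [
    Model("gpt-5.5", capability=3, cost_per_task=0.0132),
    Model("gpt-5.4-nano", capability=1, cost_per_task=0.00016),
]

easy = route(MODELS, required_capability=1)
hard = route(MODELS, required_capability=3)
print(easy.name, hard.name, savings_pct(0.0132, 0.00016))
# prints: gpt-5.4-nano gpt-5.5 98.8
```

The arithmetic is where the 98.8% figure comes from: 1 − 0.00016 / 0.0132 ≈ 0.988. In the sketch a hard task still lands on the frontier model, which mirrors the dry run where the router kept gpt-5.5 on the 2 hard tasks.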
pipx keeps the CLI in its own isolated environment, which is the recommended way to install a command-line app.

New here? Watch the GIF above, run `pipx install puppetmaster-ai && puppetmaster setup`, then skim [What it does](#what-it-does).

| Want to… | Go to |
|---|---|
| Understand the design & what it fixes | |
| | [docs/COMPARISON.md](/professorpalmer/Puppetmaster/blob/main/docs/COMPARISON.md) |
| | [docs/SECURITY.md](/professorpalmer/Puppetmaster/blob/main/docs/SECURITY.md) |
| | [docs/CLAIMS.md](/professorpalmer/Puppetmaster/blob/main/docs/CLAIMS.md) · receipts in bench/ |
| | [docs/FEATURES.md](/professorpalmer/Puppetmaster/blob/main/docs/FEATURES.md) |
| | [Quickstart](#quickstart) · [docs/DAILY_DRIVER.md](/professorpalmer/Puppetmaster/blob/main/docs/DAILY_DRIVER.md) |
| | [docs/README.md](/professorpalmer/Puppetmaster/blob/main/docs/README.md) · [puppetmaster/](/professorpalmer/Puppetmaster/blob/main/puppetmaster/README.md) · [bench/](/professorpalmer/Puppetmaster/blob/main/bench/README.md) · [examples/](/professorpalmer/Puppetmaster/blob/main/examples/README.md) · [scripts/](/professorpalmer/Puppetmaster/blob/main/scripts/README.md) · [clients/typescript/](/professorpalmer/Puppetmaster/blob/main/clients/typescript/README.md) |
| | cursor-extension/ |

Think Redis/Gunicorn for agentic engineering:

    Cursor Agent / Claude Code / OpenAI / Codex CLI / shell
            |
            v
    Puppetmaster supervisor ── task-aware model router (auto-routes by cost)
            |
            v
    independent worker processes ── SQLite typed artifacts, events, memory
            |
            v
    live artifact board ── stitched summary ── 0-token follow-up reads

Puppetmaster isn't trying to beat native IDE subagents at every tiny task. It's for the work that gets messy: long repo investigations, conflicting hypotheses, repeated handoffs, flaky memory, and code changes that need evidence, replay, and approval gates. The rationale and the failure modes it fixes are in [docs/WHY.md](/professorpalmer/Puppetmaster/blob/main/docs/WHY.md).
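To make the "SQLite typed artifacts" and "0-token follow-up reads" stages of the diagram concrete, here is a minimal sketch using only the Python standard library. The table layout, column names, and the `open_store` / `emit_artifact` / `read_artifacts` helpers are assumptions made for illustration; Puppetmaster's real schema may differ. The fields (payload, evidence, confidence, sha256) follow the README's own description of an artifact.

```python
import hashlib
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) a hypothetical artifact store."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS artifacts (
               id INTEGER PRIMARY KEY,
               job_id TEXT NOT NULL,
               kind TEXT NOT NULL,
               payload TEXT NOT NULL,
               evidence TEXT NOT NULL,
               confidence REAL NOT NULL,
               sha256 TEXT NOT NULL
           )"""
    )
    return conn


def emit_artifact(conn, job_id, kind, payload, evidence, confidence):
    """A worker writes one typed artifact; returns the payload digest."""
    body = json.dumps(payload, sort_keys=True)
    digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
    conn.execute(
        "INSERT INTO artifacts (job_id, kind, payload, evidence, confidence, sha256) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (job_id, kind, body, json.dumps(evidence), confidence, digest),
    )
    conn.commit()
    return digest


def read_artifacts(conn, job_id, kind=None):
    """A follow-up question is a local query: no model call, no tokens."""
    sql = "SELECT kind, payload, confidence, sha256 FROM artifacts WHERE job_id = ?"
    params = [job_id]
    if kind is not None:
        sql += " AND kind = ?"
        params.append(kind)
    rows = conn.execute(sql + " ORDER BY id", params).fetchall()
    return [
        {"kind": k, "payload": json.loads(p), "confidence": c, "sha256": s}
        for k, p, c, s in rows
    ]


conn = open_store()
emit_artifact(
    conn, "job-demo", "finding",
    {"file": "app.py", "issue": "unused import"}, ["app.py:3"], 0.9,
)
for row in read_artifacts(conn, "job-demo", kind="finding"):
    print(row["kind"], row["payload"]["issue"], row["confidence"])
```

Because the stitcher and any later reader work from structured rows rather than a shared transcript, re-asking "what did the workers find?" costs a database read instead of a new agent run.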
How it's different: LangGraph, CrewAI, and the Claude Agent SDK are libraries you write code against to build an agent. Puppetmaster sits one layer up: it orchestrates the agent CLIs you already pay for (Cursor, Claude Code, Codex, OpenAI), routes each task to the cheapest sufficient model, keeps the spend inside your subscription, and self-heals when a provider is down. Full side-by-side + "pick X instead if…" in [docs/COMPARISON.md](/professorpalmer/Puppetmaster/blob/main/docs/COMPARISON.md).

The whole story in one command (local + shell adapters, nothing to configure):

    ./scripts/demo.sh                  # the 60-second tour (clean machine, no keys)
    python -m puppetmaster dashboard   # live, zero-dependency web board for any job

It routes a task mix by cost, fans out a 6-role swarm as independent processes, reads the stitched summary, then proves follow-up reads cost $0.00. Script + GIF source: [scripts/](/professorpalmer/Puppetmaster/blob/main/scripts/README.md).

Every number is reproducible from a script in [bench/](/professorpalmer/Puppetmaster/blob/main/bench). Full detail + caveats: [docs/CLAIMS.md](/professorpalmer/Puppetmaster/blob/main/docs/CLAIMS.md).

Cost is fixed on two axes. New work auto-routes to the cheapest sufficient model (35% cheaper on a fixture; 98.8% cheaper in a live OpenAI A/B). Follow-ups are SQLite reads, not new agent runs (40 queries, $0.00, 0.5 ms each).

Workers don't share a transcript. They lease tasks and emit typed artifacts (payload + evidence + confidence + sha256); the stitcher reads JSON, not stdout. Inspect with puppetmaster artifacts