{"slug": "three-checks-that-separate-an-agent-demo-from-a-production-agent", "title": "Three checks that separate an agent demo from a production agent", "summary": "An open-source Agentic Product Standard v2.0 has been released, turning three critical production safeguards into enforceable code rather than advisory principles. The standard addresses the \"lethal trifecta\" of data exfiltration risk by requiring a CI gate that breaks at least one leg of the access-to-private-data, exposure-to-untrusted-content, and external-communication triad. It also introduces MCP supply chain security through tool-definition hashing, hard per-run cost ceilings enforced in code, and a binary maturity scorecard that replaces subjective readiness assessments with a pass/fail checklist.", "body_md": "Shipping an agent demo takes an afternoon. Shipping one that survives a quarter in production is a different job — and the gap is almost never the model. It's three boring things that are usually missing entirely.\n\nI maintain an open, MIT-licensed Agentic Product Standard, and v2.0 was mostly about turning those three things from advice into code you can run. Here they are, with the actual code.\n\nReal safety comes from architecture. The check I reach for first is Simon Willison's lethal trifecta: an agent becomes an exfiltration tool the moment it has all three of —\n\naccess to private data,\n\nexposure to untrusted content, and\n\nthe ability to communicate externally.\n\nAny one is fine. All three together is a data-exfiltration channel waiting for a payload hidden in a retrieved document. The fix is never \"better filter\" — it's break a leg: gate egress, quarantine untrusted input, or scope the data.\n\nHere's the gate, as a CI step. 
You declare what your agent can touch; the build fails if the trifecta is unmitigated:\n\n    LEGS = (\"private_data\", \"untrusted_content\", \"external_comms\")\n\n    def evaluate(spec):\n        \"\"\"Return (exit_code, message); a non-zero exit code fails the build.\"\"\"\n        present = [leg for leg in LEGS if spec.get(leg) is True]\n        if len(present) < 3:\n            return 0, f\"OK: only {len(present)}/3 legs present\"\n        # A leg counts as broken only when its mitigation names a control.\n        broken = {m.get(\"leg\") for m in spec.get(\"mitigations\", []) if m.get(\"control\")}\n        effective = sorted(broken & set(LEGS))\n        if effective:\n            return 0, f\"OK: trifecta present but broken at {', '.join(effective)}\"\n        return 1, \"FAIL: lethal trifecta, no leg broken\"\n\nThe second structural control is the MCP supply chain. Community MCP servers are untrusted code, and a server can hand you a benign tool description at approval time, then mutate it later (a \"rug pull,\" or tool-definition poisoning). So: pin tool definitions by hash and alert on change. A few lines of bash in CI catch it:\n\n    jq -cS '(.tools)|sort_by(.name)[]|{name,description,inputSchema}' tools.json \\\n      | while read -r t; do printf '%s %s\\n' \"$(printf '%s' \"$t\"|sha256sum|cut -d' ' -f1)\" \\\n          \"$(printf '%s' \"$t\"|jq -r .name)\"; done > current.lock\n    diff -u tools.lock current.lock || { echo \"::error::MCP tool def changed: possible rug pull\"; exit 1; }\n\nv2.0 makes cost a hard control:\n\na per-run token/cost ceiling enforced in code (a breaker, checked before each model call), not a bill you read later;\n\nprompt/KV caching on stable prefixes (system prompt, tool schemas);\n\nmodel routing: small model for classification, flagship for reasoning;\n\nand the economics rule people learn the expensive way: only pay the 15× for multi-agent when the task value justifies it. 
If one agent clears the bar, the orchestra is waste.\n\nThe point is that \"cost\" stops being a surprise the moment it's a number your code enforces and your traces record per task.\n\nSo the standard adds a maintenance doctrine (adapted, with credit, from Daniel Miessler's PAI): audit every instruction on a cadence with one question —\n\nWould a smarter model make this rule unnecessary?\n\nIf yes, it's scaffolding, not architecture — cut it. Tag rules anti-fragile (eval sets, verification harnesses, tool contracts, real failure gotchas → keep) vs fragile (chain-of-thought orchestrators, output parsers, retry cascades → cut or re-test on the next model upgrade). A standard that only grows is one that rots.\n\nMaking it checkable\n\nPrinciples are easy to agree with and impossible to audit. So v2.0 ships a self-assessment scorecard — a binary Yes/No maturity check (M0 Prototype → M1 Shippable → M2 Production → M3 Autonomous-ready), mapped to an autonomy ladder. Your level is the highest band where every gate passes; the first \"No\" is your next task. \"Is it production-ready?\" becomes a checklist instead of a vibe.\n\nAlongside it: the red-team kit above plus a CI workflow template that blocks merges when the eval pass-rate slips, and a 2026 refresh (MCP + OAuth 2.1, A2A at the Linux Foundation, OpenTelemetry GenAI tracing, trajectory + pass^k eval metrics).\n\nThe one rule under all of it\n\nThe model is the variable. The harness is the constant. Invest proportionally.\n\n…with the v2.0 twist: the harness isn't something you accumulate forever. You curate it — growing the parts that compound, deleting the parts that only propped up a weaker model.\n\nIt's MIT and vendor-neutral (deliberately not a framework), with an optional Claude Code skill set. 
I'd rather be told what's wrong with it than collect stars — the deferred list is where I'm least sure.\n\nRepo: [https://github.com/Moai-Team-LLC/agentic-product-standard](https://github.com/Moai-Team-LLC/agentic-product-standard)", "url": "https://wpnews.pro/news/three-checks-that-separate-an-agent-demo-from-a-production-agent", "canonical_source": "https://dev.to/alex_duch/three-checks-that-separate-an-agent-demo-from-a-production-agent-5a8b", "published_at": "2026-06-06 09:05:58+00:00", "updated_at": "2026-06-06 09:11:17.986540+00:00", "lang": "en", "topics": ["ai-agents", "ai-safety", "ai-products", "ai-tools", "ai-infrastructure"], "entities": ["Simon Willison", "Agentic Product Standard"], "alternates": {"html": "https://wpnews.pro/news/three-checks-that-separate-an-agent-demo-from-a-production-agent", "markdown": "https://wpnews.pro/news/three-checks-that-separate-an-agent-demo-from-a-production-agent.md", "text": "https://wpnews.pro/news/three-checks-that-separate-an-agent-demo-from-a-production-agent.txt", "jsonld": "https://wpnews.pro/news/three-checks-that-separate-an-agent-demo-from-a-production-agent.jsonld"}}