{"slug": "an-open-standard-for-production-agents-with-runnable-security-checks", "title": "An open standard for production agents – with runnable security checks", "summary": "Moai Team LLC has published an open standard for building production-grade agentic products, distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangChain, and leading practitioners and codified as a canonical specification with runnable security checks. The standard establishes six principles — including determinism by default, architecture over framework, and security as a structural property — along with a Claude Code skill set that operationalizes the rules directly in the editor. The project aims to close the gap between agent demos and production-ready systems by providing a field-tested architecture, harness, and eval discipline rather than relying on model improvements alone.", "body_md": "### A canonical standard for building production-grade agentic products — plus a Claude Code skill set that operationalizes it.\n\n*Distilled from the production practices of Anthropic, OpenAI, Cognition, Sierra, LangChain, and leading practitioners — 2024–2026.*\n\n**[Product Standard →](/Moai-Team-LLC/agentic-product-standard/blob/main/STANDARD.md)** · **[Agent Standard →](/Moai-Team-LLC/agentic-product-standard/blob/main/AGENT_STANDARD.md)** · **[Install the Skills →](#-install-the-skills)** · **[Decision Checklist →](#-the-10-question-checklist)**\n\nAn agentic product is not \"a product with AI.\" It is a product where part of the process is dynamically directed by an LLM within a deterministic architecture with explicit trust boundaries.\n\nMost teams ship agent demos. Few ship agents that survive contact with production. The difference is almost never the model — it's the **architecture, the harness, and the eval discipline** around it. 
This repo is the field-tested standard for that work, plus a set of [Claude Code skills](/Moai-Team-LLC/agentic-product-standard/blob/main/skills) that put it into your editor.\n\n- [Why this exists](#why-this-exists)\n- [The six principles](#the-six-principles)\n- [What's in this repo](#whats-in-this-repo)\n- [Install the skills](#-install-the-skills)\n- [The reference implementation](#-the-reference-implementation)\n- [The Autonomy Ladder](#the-autonomy-ladder)\n- [The five composition patterns](#the-five-composition-patterns)\n- [The 8-layer harness](#the-8-layer-harness)\n- [The 10-question checklist](#-the-10-question-checklist)\n- [Score yourself](#-score-yourself)\n- [Production readiness — Definition of Done](#production-readiness--definition-of-done)\n- [Anti-patterns](#anti-patterns)\n- [Reading list](#reading-list)\n- [Contributing](#contributing)\n- [License](#license)\n\nSix principles converged *independently* across the production practices of the labs and the leading practitioners. They are the spine of every decision in this standard:\n\n| # | Principle | What it means |\n|---|---|---|\n| 1 | Determinism by default, agency by necessity | Every degree of autonomy must be earned, not granted upfront. |\n| 2 | Architecture beats framework | Patterns outlive libraries. |\n| 3 | Harness > model | 98% of reliability lives in the code around the LLM. |\n| 4 | Context engineering is the core discipline | What enters the context window determines everything. |\n| 5 | Eval-driven development is non-negotiable | No measurement → no improvement. No trace review → no understanding. |\n| 6 | Security is a structural property, not a guardrail | Safety comes from architecture — identity, least privilege, isolation, pinned tools — not filters bolted onto the edges. |\n\nThe single most important rule: Architecture is what remains when the model improves. The model is the variable, the harness is the constant. 
Invest proportionally.\n\n```\nagentic-product-standard/\n├── STANDARD.md                          ← the canonical standard (product level)\n├── AGENT_STANDARD.md                    ← the single-agent operational standard (mirrored in agent-builder)\n├── SCORECARD.md                         ← M0–M3 self-assessment, mapped to the Autonomy Ladder\n├── CONTEXT.md                           ← shared vocabulary every skill speaks\n├── setup.sh                             ← quick setup: skills + (optional) AgenticMind, one run\n├── templates/security/                  ← red-team kit: lethal-trifecta gate, injection suite, MCP pin\n├── templates/ci/eval-gate.yml           ← CI workflow that blocks merges on eval regression\n├── examples/agenticmind-case-study.md   ← reference implementation, audited against the canon\n├── docs/adr/                            ← architecture decision records (why the repo is shaped this way)\n└── skills/                              ← Claude Code skill set (operationalizes the standard)\n    ├── agent-builder/                    ← single-agent track (bundles AGENT_STANDARD.md + templates/)\n    └── agentic-product-architect/        ← multi-agent track: master router + sub-skills\n        ├── SKILL.md                      ← master: router + philosophy\n        ├── architecture-design/          ← autonomy ladder, 5 patterns, single vs multi\n        ├── context-engineering/          ← write/select/compress/isolate, the 40% rule\n        ├── harness-engineering/          ← the 8 layers around the LLM loop\n        ├── tool-design-mcp/              ← MCP-first, <20 tools, RAG-MCP, sandboxing\n        ├── memory-architecture/          ← Mem0 / Zep / Letta / LangMem / files\n        ├── tenant-isolation/             ← multi-tenant: pooled/silo, leakage paths, leakage eval\n        ├── durable-execution/            ← Temporal Workflow + Activity pattern\n        ├── eval-driven-dev/              ← Husain/Shankar pyramid + judge calibration\n        
├── framework-selection/          ← LangGraph / Claude SDK / OpenAI SDK / others\n        ├── production-readiness/         ← 12-point Definition of Done audit\n        └── antipatterns-review/          ← code review through 12 known failure modes\n```\n\nTwo standards, one practice:\n\n- `STANDARD.md` is the *product-level* reference — read it once, return to it often.\n- `AGENT_STANDARD.md` is the *single-agent* operational standard — contract, schemas, permission tiers, durable state, evals (also bundled into the `agent-builder` skill so it ships self-contained).\n- `skills/` is the *practice* — two Claude Code skills (`agent-builder` for one agent, `agentic-product-architect` for multi-agent products) that auto-load the right guidance while you design, build, and review.\n\nThe skill set works with [Claude Code](https://claude.com/claude-code). Two tracks share the same sub-skills: **`agent-builder`** for building one production-grade agent (it bundles `AGENT_STANDARD.md` + copy-paste `templates/`), and **`agentic-product-architect`** — a master skill that routes to ten specialized sub-skills for multi-agent products. Each is independently triggerable.\n\nIf you have [skills](https://github.com/mattpocock/skills) (the community skill installer), pull the skill straight from this repo — it scans `skills/`, lets you pick `agent-builder` and/or `agentic-product-architect`, and installs them into the agents you choose:\n\n```\nnpx skills@latest add Moai-Team-LLC/agentic-product-standard\n```\n\nThis installs the skills only. To also stand up the runnable memory layer (AgenticMind), run the `setup.sh --with-agenticmind` flow below — or, after adding the skills, point your agent's MCP client at a self-hosted AgenticMind (see its Quickstart).\n\n`setup.sh` installs the skills and, in the same run, can stand up **AgenticMind** — the reference implementation — as a runnable **knowledge & memory layer** your agent calls over MCP. 
Design guidance *and* a working substrate, end to end:\n\n```\ngit clone https://github.com/Moai-Team-LLC/agentic-product-standard.git\ncd agentic-product-standard\n./setup.sh --with-agenticmind        # skills → clone AgenticMind → its setup.sh (deps + Postgres + migrations)\n```\n\nRun bare (`./setup.sh`) and it installs the skills, then *asks* whether to set AgenticMind up next. AgenticMind's leg needs [Bun](https://bun.sh) ≥1.3 + Docker; the skills themselves need neither.\n\n```\n./setup.sh                  # install skills here, then prompt about AgenticMind\n./setup.sh --user           # install skills into ~/.claude/skills (every project)\n./setup.sh --skills-only    # skills only, no prompt\n```\n\n**User-level (available in every project):**\n\n```\nmkdir -p ~/.claude/skills   # create the target directory if it does not exist yet\ncp -R agentic-product-standard/skills/* ~/.claude/skills/   # both tracks; they share sub-skills\n```\n\n**Project-level (scoped to one repo):**\n\n```\nmkdir -p .claude/skills\ncp -R /path/to/agentic-product-standard/skills/* .claude/skills/   # both tracks\n```\n\nClaude Code discovers skills via each `SKILL.md` and its YAML frontmatter. Once installed, `agent-builder` triggers when you set out to build, implement, or review **one** agent, while `agentic-product-architect` triggers for multi-agent products, an agent loop, or any major agentic framework (LangGraph, CrewAI, OpenAI Agents SDK, Claude Agent SDK, Pydantic AI, AutoGen). Ask a focused question — *\"Mem0 or Zep?\"*, *\"how should I structure context?\"*, *\"review my agent code\"* — and the relevant sub-skill loads directly.\n\nThe standard tells you *how*; **AgenticMind** is a repo you can *run*. It's the flagship reference implementation — an auditable, self-improving **knowledge & memory layer** that agentic products plug into over MCP (the OSS pick for the memory slot in [`memory-architecture`](/Moai-Team-LLC/agentic-product-standard/blob/main/skills/agentic-product-architect/memory-architecture)). 
The [`./setup.sh --with-agenticmind` flow above](#quick-setup-one-train) stands it up in the same run.\n\n| | Repo | Use it when |\n|---|---|---|\n| 📐 | agentic-product-standard (this repo) | You're designing or building an agent / agentic product — the standard + skills tell you how. |\n| 🧠 | AgenticMind | You need a knowledge & memory layer for your agent — a working implementation you can run. |\n\nSee the [AgenticMind case study](/Moai-Team-LLC/agentic-product-standard/blob/main/examples/agenticmind-case-study.md) for a layer-by-layer map of how that repo implements this canon.\n\nNever start with \"build an agent.\" Start with *\"what is the minimum autonomy this task requires?\"* The cost of getting this wrong is asymmetric.\n\n| Level | What it is | Use when |\n|---|---|---|\n| L0 · Single LLM call | One prompt → one response | Classification, extraction, summarization |\n| L1 · Augmented LLM | + retrieval, + tools, + memory | Q&A over docs, simple assistants |\n| L2 · Workflow | Deterministic code orchestrates LLM steps | Path is known; predictability matters |\n| L3 · Orchestrator-Worker | LLM decomposes within a bounded graph | Parallelizable, breadth-first work |\n| L4 · Autonomous Agent Loop | LLM chooses the next step until termination | Path cannot be enumerated; cost is acceptable |\n\nEscalation rule: do not climb to L+1 until L delivers a ≥90% pass rate on a curated eval set.\n\nCompose agentic products from these primitives *like Lego* — before reaching for a framework.\n\n- **Prompt Chaining** — sequential decomposition (outline → draft → polish)\n- **Routing** — classifier + dispatcher to a specialist\n- **Parallelization** — fan-out of independent subtasks + aggregation\n- **Orchestrator-Workers** — central planner + dynamic workers\n- **Evaluator-Optimizer** — generator + critic in a loop until acceptance\n\n**Meta-principle:** first try to solve 
the task by composing these patterns in deterministic code. A full agent loop is the *last* resort.\n\nIn a production agent, the harness — everything *around* the LLM loop — is **98% of the code**.\n\n```\n╔═════════════════════════════════════════════╗\n║  8. Security & Identity  (CROSS-CUTTING)    ║ ← threat model · injection defense · agent identity · least-privilege tokens · pinned tool defs\n╠═════════════════════════════════════════════╣\n║   7. Observability & Tracing                ║ ← log EVERYTHING\n║   6. Evaluation Layer (CI gates)            ║ ← block regressions\n║   5. Human-in-the-Loop (notify/ask/review)  ║ ← approval gates\n║   4. Guardrails (input/output validation)   ║ ← defense in depth\n║   3. Durable Execution (Workflow + Activity)║ ← pause/resume/retry\n║   2. Context & Memory Management            ║ ← write/select/compress/isolate\n║   1. Agent Loop (gather → act → verify)     ║ ← the \"agent\" proper\n╚═════════════════════════════════════════════╝\n              ↕ MCP / function calling\n```\n\nPermission boundaries are enforced by code, never by prompt. The Replit incident of 2025 — an agent wiped a production database holding records on roughly 1,200 companies despite an explicit \"code freeze\" in its prompt — is the canonical proof. The model will ignore prompt-level restrictions under enough pressure. Code won't.\n\nLayer 8 is cross-cutting (v2.0). Identity, least privilege, and isolation constrain every layer; injection defense spans input and output. Run the lethal-trifecta check (private data × untrusted content × external comms) on every deployment, and pin MCP tool definitions by hash so a server can't rug-pull you. A guardrail is one tactic, not the discipline — see `STANDARD.md` · Layer 8 and the red-team kit (`templates/security/`).\n\nRun this before drafting any architecture. 
It unblocks 80% of design debates.\n\n```\n□ What is the minimum autonomy level (L0–L4) that solves this?\n□ Can it be solved by composing the 5 patterns without a full agent loop?\n□ Is the task breadth-first (parallelizable) or depth-first (coherent)?\n□ What are the 3 failure modes that would lose user trust first?\n□ Where are the permission boundaries? What MUST the agent NOT do?\n□ Which constraint dominates framework choice?\n□ Where does state live? (in-context = anti-pattern for long-running)\n□ Who validates outputs at each stage? (assertion / LLM judge / human review)\n□ Where do traces live, with what retention?\n□ Eval set: how many examples, who labels, how does it grow?\n```\n\nIf you can't answer half of these, **slow down and answer them together — don't write code yet.**\n\nPrinciples are easy to nod along to; `SCORECARD.md` makes you prove it. It turns the Definition of Done into a binary Yes/No maturity self-assessment with four bands mapped to the Autonomy Ladder:\n\n| Band | Autonomy | Means |\n|---|---|---|\n| M0 · Prototype | L0–L1 | Works on a demo. No production claim. |\n| M1 · Shippable | L2 | Contracts, schemas, guardrails, an eval set, permissions in code. |\n| M2 · Production | L3 | Durable, observable, tenant-isolated, security-checked, cost-bounded, CI-gated. |\n| M3 · Autonomous-ready | L4 | Online evals, `pass^k` reliability, red-team kit run, full OTel trajectory traces. |\n\nRun it with the team against a real deployment each release — the first **No** you hit is your next piece of work.\n\nAn agentic product is **not production-ready** until all **15** are satisfied. 
Full detail in [STANDARD.md](/Moai-Team-LLC/agentic-product-standard/blob/main/STANDARD.md#part-iii-production-readiness--definition-of-done).\n\n| Context & state | Tools & security | Reliability | Evals & observability |\n|---|---|---|---|\n| Context < 40% | Destructive actions need approval | Durable pause/resume/retry | ≥50 evals per failure mode |\n| State externalized | Permissions in code, not prompt | Schema-validated outputs | Judges calibrated (TPR/TNR) |\n| Compaction tested | Sandboxed tool execution | Input/output guardrails | CI blocks regression; 100% traced |\n| — | Lethal-trifecta check; MCP tool defs pinned | Per-run cost ceiling in code | Trajectory + online evals |\n\nThe fastest way to recognize a doomed agent project — the skill set's `antipatterns-review` flags each with a diagnostic and a fix.\n\n- Multi-agent before a single-agent baseline\n- Framework abstractions before understanding the raw API\n- LLM judges without calibration against human labels\n- Permissions enforced through prompts\n- Memory as an afterthought\n- Generic evals (\"helpfulness,\" \"correctness\")\n- Likert scales in an LLM judge (binary only)\n- More than 100 tools per agent\n- One agent for both breadth and depth\n- Deploying without trace monitoring\n- Hardcoded prompts without version control\n- Treating single-vendor benchmarks as ground truth\n- Trusting community MCP servers without pinning or scanning (rug pulls)\n- Deploying the lethal trifecta with no mitigation\n- Token passthrough / over-scoped OAuth (confused deputy)\n- No budget ceiling on autonomous sessions\n- Peer-to-peer multi-agent buses instead of an orchestrator\n\nThe operational base — not reference docs. 
Read in order:\n\n- Anthropic — *Building Effective Agents* (Schluntz & Zhang)\n- OpenAI — *A Practical Guide to Building Agents*\n- HumanLayer — *12 Factor Agents* (Dex Horthy)\n- Anthropic — *How we built our multi-agent research system*\n- Cognition — *Don't Build Multi-Agents* (Walden Yan)\n- LangChain — *Context Engineering for Agents* (Lance Martin)\n- Hamel Husain — *A Field Guide to Rapidly Improving AI Products* + *Your AI Product Needs Evals*\n- Anthropic — *Building agents with the Claude Agent SDK*\n- Anthropic — *Effective Context Engineering for AI Agents* (just-in-time retrieval)\n- OWASP — *Top 10 for Agentic Applications (2026)* + Simon Willison — *The lethal trifecta*\n- OpenTelemetry — *GenAI semantic conventions* (the observability standard)\n\nThis standard is meant to evolve — the field moves fast. Corrections, new exemplars, framework updates, and translations are all welcome. See [CONTRIBUTING.md](/Moai-Team-LLC/agentic-product-standard/blob/main/CONTRIBUTING.md) and the [Code of Conduct](/Moai-Team-LLC/agentic-product-standard/blob/main/CODE_OF_CONDUCT.md).\n\nThe architectural canons (the autonomy ladder, the 5 patterns, single-vs-multi, the harness) are stable. 
Specific vendors and framework rankings will shift — those are exactly the kind of PRs we want.\n\n[MIT](/Moai-Team-LLC/agentic-product-standard/blob/main/LICENSE) — use it, fork it, ship with it.\n\n**If this saved you a week of architecture debates, star the repo ⭐ so others find it.**\n\n*v2.0 · assembled from production practices as of June 2026*", "url": "https://wpnews.pro/news/an-open-standard-for-production-agents-with-runnable-security-checks", "canonical_source": "https://github.com/Moai-Team-LLC/agentic-product-standard", "published_at": "2026-06-06 09:02:56+00:00", "updated_at": "2026-06-06 09:17:47.646041+00:00", "lang": "en", "topics": ["ai-agents", "ai-products", "ai-tools", "ai-safety", "ai-infrastructure"], "entities": ["Anthropic", "OpenAI", "Cognition", "Sierra", "LangChain", "Claude Code", "Moai Team LLC"], "alternates": {"html": "https://wpnews.pro/news/an-open-standard-for-production-agents-with-runnable-security-checks", "markdown": "https://wpnews.pro/news/an-open-standard-for-production-agents-with-runnable-security-checks.md", "text": "https://wpnews.pro/news/an-open-standard-for-production-agents-with-runnable-security-checks.txt", "jsonld": "https://wpnews.pro/news/an-open-standard-for-production-agents-with-runnable-security-checks.jsonld"}}