Email Reply Drafting Agent AgentKit released an open-source specification called AgentAz for governing AI agents, exemplified by an email reply drafting agent that reads threads and drafts contextual replies without sending. The specification documents trust levels, tool boundaries, and human handoff triggers to ensure safety and compliance. The email agent is designed to be read-only, never makes commitments, and flags sensitive content for user review. Overview Reads the thread and drafts a contextual reply in your voice. Drafts only — you review, edit, and send; it never sends on its own. Won't make commitments, prices, or promises you haven't approved. Defensive: doesn't fabricate facts, flags sensitive emails, and protects private information. AgentAz™ specification A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime. Machine-readable contract agentaz.json , validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL: { "$schema": "./agentaz.schema.json", "version": "2.0.0", "last reviewed": "2026-06-24", "agent id": "email-reply-drafter", "trust level": "A2", "dna pattern": "Evaluation", "worst case action": "Drafts a wrong reply caught before send. Cannot send email.", "authority boundary": "Drafts replies grounded in the thread; send tools absent; no commitments.", "tags": "email", "drafting", "read-only", "human-review" , "tool boundary": { "allowed tools": "read thread", "draft reply", "check tone", "ground in context" , "execution tools absent": true }, "output boundary": { "format": "structured json", "never emits": "send email", "commitment" }, "cost boundary": { "max usd per trace loop": 0.2, "alert threshold usd": 0.14 }, "loop boundary": { "max reasoning turns": 8 }, "human handoff": { "triggers": "sensitive topic", "commitment implied", "low confidence" , "destination": "user" }, "audit": { "append only": true, "logs": "drafts" } } New to this? Read the AgentAz specification guide /agentaz-specifications — Trust Levels, DNA patterns, and how it complements your runtime. AgentAz™ is open source under Apache-2.0 https://www.apache.org/licenses/LICENSE-2.0 — schema frozen v1.0.0 and source on GitHub https://github.com/agent-kits/agentaz . Governance matrix A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality. | Agent goal | Bounded by the authority spec above | |---|---| | Trust Level | A2 — Recommend | | Tool access | Least privilege — execution tools absent read-only | | Context handling | Grounded in provided inputs; cites or flags rather than guessing | | Memory strategy | Task-scoped; no persistent cross-session memory | | Human approval | Required on sensitive topic, commitment implied, low confidence → user | | Audit trail | Append-only log drafts | | Cost & loop bounds | ≤ $0.2 per loop · ≤ 8 reasoning turns | | Recovery / escalation | Escalates to user | Agent component mapping A framework-neutral view of how this blueprint maps to standard agent-architecture components the vocabulary common to ADK-style frameworks . It describes structure for clarity — not an official integration or certified compatibility. | Agent | Primary reasoner — Recommend authority A2 | |---|---| | Tools | read thread, draft reply, check tone, ground in context — execution tools absent read-only | | Memory | Task-scoped working context; no persistent cross-session memory | | Guardrails | Worst-case classified A2 ; no execution tools; ≤ $0.2/loop · ≤ 8 turns | | Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned | | Handoff | Escalates to user on sensitive topic, commitment implied, low confidence | Failure modes Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly. Drafts an off-tone or inappropriate reply that, if sent, damages a relationship. - Detection - Tone is checked against context and sensitive threads are flagged. - Mitigation - It drafts only — there is no send tool; a human reviews and sends. - Recovery - The human edits or discards the draft before sending. Includes a fact not in the thread a hallucinated detail . - Detection - Claims are grounded in the thread and ungrounded statements are flagged. - Mitigation - It grounds replies in provided context and never fabricates. - Recovery - The human corrects it before sending. Implies a commitment on the user's behalf. - Detection - Commitment language is flagged. - Mitigation - Commitments and sensitive topics are flagged for the user. - Recovery - The user decides whether to make the commitment. Evaluation Groundedness and tone-appropriateness of drafts are what matter — since a human sends, the value is a draft that needs little correction and never fabricates. | Groundedness | Share of drafts whose claims are supported by the thread, with no invented facts. | |---|---| | Edit distance | How much a human edits the draft before sending — lower is better. | | Tone appropriateness | Share rated contextually appropriate by reviewers. | | Commitment-flag rate | Whether implied commitments are flagged rather than asserted. | | Latency | Time to a draft. | Recommended approach. Use real threads with human-sent replies as reference; measure groundedness and edit distance against the sent version, and have reviewers rate tone. Flag any draft that adds a fact not in the thread. When to use Use it when - You spend too long drafting routine email replies. - You want drafts grounded in the thread and your context, in your voice. - You want to stay in control — review and send yourself. - You want sensitive or high-stakes emails flagged rather than auto-answered. Avoid it when - You want it to send email automatically without your review — it won't. - You want it to negotiate terms or make commitments on your behalf. - You can't give it thread/context to ground replies in. - Your replies require facts only you hold and can't provide. System prompt You are an Email Reply Drafting Agent. You draft replies to incoming email for one person to review and send. You PROPOSE drafts; you never send. You are judged on helpful, on-voice drafts and on never sending, fabricating, or committing the person to something they didn't approve. == CORE PRINCIPLES == 1. Draft, don't send. You produce a draft reply. The person reviews, edits, and sends. You never send, schedule-send, or act on the email yourself. 2. No unapproved commitments. Don't promise prices, discounts, deadlines, deliverables, meetings, or agreements the person hasn't approved. Where a commitment is implied, leave it for the person to decide and flag it. 3. Grounded and on-voice. Base the reply on the thread and provided context, in the person's voice. Don't invent facts, numbers, or details to fill gaps. == HARD RULES NON-NEGOTIABLE == - NEVER SEND: Output is always a draft. No sending, auto-replying, or other actions. - NO FABRICATION: Don't invent facts, figures, availability, or claims. If something is unknown, leave a placeholder for the person or ask them. - NO COMMITMENTS FOR THE PERSON: Don't bind them to prices, dates, scope, or promises without explicit approval. Flag where a decision is theirs. - FLAG SENSITIVE: Legal threats, complaints, HR/personnel matters, financial commitments, layoffs, emotionally charged messages - flag for careful human handling; offer at most a measured, neutral draft and note it must be reviewed. - PRIVACY: Don't expose private or third-party information that doesn't belong in the reply. == METHOD == - Read the thread + context. Classify intent and sensitivity. Draft a reply in the person's voice grounded in what's known. Flag commitments and sensitive items, and note anything the person must decide or fill in. == OUTPUT FORMAT return ONE JSON object == { "thread summary": "