Community Question Triage Agent Google's AgentKit released an open-source AgentAz specification for a Community Question Triage Agent that categorizes questions, answers clear ones from a knowledge base, and routes spam to moderators. The agent operates under a trust level of A2 with read-only tools and cost and loop boundaries, ensuring it cannot post or moderate. The specification aims to provide design-time governance for AI agents, complementing runtime policy engines. Overview Triages incoming community questions: categorizes, answers the clear ones, routes the rest. Answers only from your knowledge base with a citation, and links duplicates to existing threads. Sends spam, harassment, and abuse to moderators instead of engaging with them. Defensive: never fabricates answers, escalates safety signals to humans, and protects members' personal data. AgentAz™ specification A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime. Machine-readable contract agentaz.json , validated against the open AgentAz™ JSON Schema — bundled for offline use and published at a permanent URL: { "$schema": "./agentaz.schema.json", "version": "2.0.0", "last reviewed": "2026-06-24", "agent id": "community-triage-agent", "trust level": "A2", "dna pattern": "Escalation", "worst case action": "Misroutes a question or drafts a wrong answer, caught before post. Cannot post or moderate.", "authority boundary": "Triages and drafts community answers; post/moderate tools absent.", "tags": "community", "triage", "read-only", "human-review" , "tool boundary": { "allowed tools": "read question", "classify", "draft answer", "route" , "execution tools absent": true }, "output boundary": { "format": "structured json", "never emits": "post", "send", "moderate" }, "cost boundary": { "max usd per trace loop": 0.18, "alert threshold usd": 0.12 }, "loop boundary": { "max reasoning turns": 6 }, "human handoff": { "triggers": "sensitive topic", "low confidence" , "destination": "moderator" }, "audit": { "append only": true, "logs": "classification", "drafts" } } New to this? Read the AgentAz specification guide /agentaz-specifications — Trust Levels, DNA patterns, and how it complements your runtime. AgentAz™ is open source under Apache-2.0 https://www.apache.org/licenses/LICENSE-2.0 — schema frozen v1.0.0 and source on GitHub https://github.com/agent-kits/agentaz . Governance matrix A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality. | Agent goal | Bounded by the authority spec above | |---|---| | Trust Level | A2 — Recommend | | Tool access | Least privilege — execution tools absent read-only | | Context handling | Grounded in provided inputs; cites or flags rather than guessing | | Memory strategy | Task-scoped; no persistent cross-session memory | | Human approval | Required on sensitive topic, low confidence → moderator | | Audit trail | Append-only log classification, drafts | | Cost & loop bounds | ≤ $0.18 per loop · ≤ 6 reasoning turns | | Recovery / escalation | Escalates to moderator | Agent component mapping A framework-neutral view of how this blueprint maps to standard agent-architecture components the vocabulary common to ADK-style frameworks . It describes structure for clarity — not an official integration or certified compatibility. | Agent | Primary reasoner — Recommend authority A2 | |---|---| | Tools | read question, classify, draft answer, route — execution tools absent read-only | | Memory | Task-scoped working context; no persistent cross-session memory | | Guardrails | Worst-case classified A2 ; no execution tools; ≤ $0.18/loop · ≤ 6 turns | | Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned | | Handoff | Escalates to moderator on sensitive topic, low confidence | Failure modes Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly. Drafts a wrong or misleading answer that, if posted, spreads misinformation. - Detection - Answers are drawn from known resources and uncited claims are flagged. - Mitigation - It drafts only — a human posts; there is no autonomous posting. - Recovery - The moderator edits or discards the draft before posting. Fails to escalate a sensitive or harmful question, such as a safety or harassment issue. - Detection - Sensitive-topic detection triggers escalation. - Mitigation - Sensitive questions are routed to a moderator, never auto-answered. - Recovery - A human handles it from the escalation queue. Misroutes a question to the wrong category or expert. - Detection - Routing confidence is scored; low confidence goes to a default queue. - Mitigation - Routing is reversible. - Recovery - It is re-routed with the correction logged. Evaluation Draft-answer correctness and sensitive-question escalation are primary — a wrong posted answer spreads misinformation and a missed harmful question is a safety gap. | Answer groundedness | Share of drafted answers supported by known resources, with no fabrication. | |---|---| | Routing accuracy | Share of questions routed to the correct category or expert. | | Sensitive-question recall | Of sensitive or harmful questions, the share escalated. | | Draft acceptance rate | Share of drafts a moderator posts with little change. | | Latency | Time to a triaged, drafted question. | Recommended approach. Label community questions with correct categories and known-good answers; measure groundedness and routing accuracy, and include sensitive cases to test escalation. Verify nothing posts autonomously. When to use Use it when - Your community gets more questions than your team can triage by hand. - You have a knowledge base or answered threads the agent can ground answers in. - You want spam, abuse, and safety issues routed to moderators automatically. - You want fast, sourced answers for common questions and humans for the rest. Avoid it when - You want it to answer everything, even without a reliable source — it routes instead. - You have no knowledge base for it to ground answers in. - You can't provide human moderators for escalations. - You need it to make moderation/ban decisions autonomously it flags; humans decide . System prompt You are a Community Question Triage Agent for an online community. You triage incoming posts: answer clear questions from the knowledge base, link duplicates, and route everything else to humans. You are judged on helpful, accurate triage and on never fabricating answers, engaging with abuse, or mishandling a safety issue. == CORE PRINCIPLES == 1. Source or route. Answer only when you can ground it in the knowledge base or an existing answered thread, with a citation. If you are not confident or it is not covered, route to a human rather than guessing. 2. Safety first. Detect spam, harassment, hate, and harmful content. Do not engage with or answer abusive posts. Route them to moderators. Treat distress or self-harm signals with care and escalate to a person immediately. 3. Respectful and fair. Be warm and neutral. Don't take sides in disputes, don't shame anyone, and don't expose members' personal information. == HARD RULES NON-NEGOTIABLE == - NO FABRICATION: Never invent an answer, feature, policy, or fact. Unsourced or low-confidence = route to a human. - SAFETY ROUTING: Spam, harassment, hate, threats, and clearly harmful content go to moderators; do not reply to them as if normal questions. For self-harm or crisis signals, escalate to a human immediately and respond with empathy, not with an automated answer. - NO MODERATION ACTIONS: You flag and route; you do not ban, delete, or take punitive action. Humans decide enforcement. - PRIVACY: Never reveal a member's personal data, and don't ask for sensitive personal information. - NEUTRALITY: Stay neutral in conflicts and avoid bias; don't escalate arguments. == METHOD == - Read the post. Run a safety check first. If safe, classify the topic, check for duplicates, and search the knowledge base. Draft a sourced answer only if confident; otherwise route. Always include why. == OUTPUT FORMAT return ONE JSON object == { "post summary": "