Community Question Triage Agent

The AgentKits registry publishes an open-source AgentAz specification for a Community Question Triage Agent that categorizes questions, answers clear ones from a knowledge base, and routes spam and abuse to moderators. The agent operates at trust level A2 with read-only tools and explicit cost and loop boundaries; it has no tools for posting or moderating. The specification provides design-time governance for AI agents and complements runtime policy engines rather than replacing them.

Overview

Triages incoming community questions: categorizes them, answers the clear ones, and routes the rest. Answers only from your knowledge base with a citation, and links duplicates to existing threads. Sends spam, harassment, and abuse to moderators instead of engaging with them. Defensive by design: never fabricates answers, escalates safety signals to humans, and protects members' personal data.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do, and why, and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Machine-readable contract

agentaz.json, validated against the open AgentAz™ JSON Schema, which is bundled for offline use and published at a permanent URL:

    {
      "$schema": "./agentaz.schema.json",
      "version": "2.0.0",
      "last_reviewed": "2026-06-24",
      "agent_id": "community-triage-agent",
      "trust_level": "A2",
      "dna_pattern": "Escalation",
      "worst_case_action": "Misroutes a question or drafts a wrong answer, caught before post. Cannot post or moderate.",
      "authority_boundary": "Triages and drafts community answers; post/moderate tools absent.",
      "tags": ["community", "triage", "read-only", "human-review"],
      "tool_boundary": {
        "allowed_tools": ["read_question", "classify", "draft_answer", "route"],
        "execution_tools_absent": true
      },
      "output_boundary": {
        "format": "structured_json",
        "never_emits": ["post", "send", "moderate"]
      },
      "cost_boundary": {
        "max_usd_per_trace_loop": 0.18,
        "alert_threshold_usd": 0.12
      },
      "loop_boundary": {
        "max_reasoning_turns": 6
      },
      "human_handoff": {
        "triggers": ["sensitive_topic", "low_confidence"],
        "destination": "moderator"
      },
      "audit": {
        "append_only": true,
        "logs": ["classification", "drafts"]
      }
    }

New to this? Read the AgentAz specification guide (/agentaz-specifications): Trust Levels, DNA patterns, and how it complements your runtime.
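Because the spec is design-time only, something else has to apply its boundaries while the agent runs. The following is a minimal sketch of how a wrapper might read the tool, cost, and loop boundaries from the contract before each step. It is not part of AgentAz: `check_step`, `LoopState`, and the three return values are hypothetical names, and the key spellings assume underscores in agentaz.json.

```python
from dataclasses import dataclass

# Hypothetical in-memory copy of the relevant contract fields.
# Key names assume the underscore spelling (e.g. "tool_boundary").
CONTRACT = {
    "tool_boundary": {
        "allowed_tools": ["read_question", "classify", "draft_answer", "route"],
        "execution_tools_absent": True,
    },
    "cost_boundary": {"max_usd_per_trace_loop": 0.18, "alert_threshold_usd": 0.12},
    "loop_boundary": {"max_reasoning_turns": 6},
}


@dataclass
class LoopState:
    """Running totals for one trace loop (hypothetical bookkeeping)."""
    turns: int = 0
    spent_usd: float = 0.0


def check_step(contract: dict, state: LoopState, tool: str, step_cost_usd: float) -> str:
    """Return "deny", "alert", or "allow" for a proposed tool call."""
    # Tool boundary: anything outside the allow-list is refused outright.
    if tool not in contract["tool_boundary"]["allowed_tools"]:
        return "deny"
    # Loop boundary: the next turn must not exceed the turn cap.
    if state.turns + 1 > contract["loop_boundary"]["max_reasoning_turns"]:
        return "deny"
    # Cost boundary: refuse above the hard cap, warn at the alert threshold.
    projected = state.spent_usd + step_cost_usd
    cost = contract["cost_boundary"]
    if projected > cost["max_usd_per_trace_loop"]:
        return "deny"
    if projected >= cost["alert_threshold_usd"]:
        return "alert"
    return "allow"
```

In a real deployment these checks would more likely live in the policy engine the spec is paired with; the sketch only illustrates which contract fields map to which decision.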
AgentAz™ is open source under Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0), with the schema (frozen at v1.0.0) and source on GitHub (https://github.com/agent-kits/agentaz).

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship, not new functionality.

| Agent goal | Bounded by the authority spec above |
|---|---|
| Trust Level | A2 (Recommend) |
| Tool access | Least privilege; execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on sensitive topic, low confidence → moderator |
| Audit trail | Append-only log (classification, drafts) |
| Cost & loop bounds | ≤ $0.18 per loop · ≤ 6 reasoning turns |
| Recovery / escalation | Escalates to moderator |

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity and is not an official integration or certified compatibility.

| Agent | Primary reasoner; Recommend authority (A2) |
|---|---|
| Tools | read question, classify, draft answer, route; execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.18/loop · ≤ 6 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to moderator on sensitive topic, low confidence |

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each. These are the boundaries that make it safe to run, stated plainly.

Drafts a wrong or misleading answer that, if posted, spreads misinformation.
- Detection: Answers are drawn from known resources and uncited claims are flagged.
- Mitigation: It drafts only; a human posts, and there is no autonomous posting.
- Recovery: The moderator edits or discards the draft before posting.

Fails to escalate a sensitive or harmful question, such as a safety or harassment issue.
- Detection: Sensitive-topic detection triggers escalation.
- Mitigation: Sensitive questions are routed to a moderator, never auto-answered.
- Recovery: A human handles it from the escalation queue.

Misroutes a question to the wrong category or expert.
- Detection: Routing confidence is scored; low confidence goes to a default queue.
- Mitigation: Routing is reversible.
- Recovery: It is re-routed with the correction logged.

Evaluation

Draft-answer correctness and sensitive-question escalation are the primary metrics: a wrong posted answer spreads misinformation, and a missed harmful question is a safety gap.

| Answer groundedness | Share of drafted answers supported by known resources, with no fabrication. |
|---|---|
| Routing accuracy | Share of questions routed to the correct category or expert. |
| Sensitive-question recall | Of sensitive or harmful questions, the share escalated. |
| Draft acceptance rate | Share of drafts a moderator posts with little change. |
| Latency | Time to a triaged, drafted question. |

Recommended approach. Label community questions with correct categories and known-good answers; measure groundedness and routing accuracy, and include sensitive cases to test escalation. Verify nothing posts autonomously.

When to use

Use it when
- Your community gets more questions than your team can triage by hand.
- You have a knowledge base or answered threads the agent can ground answers in.
- You want spam, abuse, and safety issues routed to moderators automatically.
- You want fast, sourced answers for common questions and humans for the rest.
Avoid it when
- You want it to answer everything, even without a reliable source (it routes instead).
- You have no knowledge base for it to ground answers in.
- You can't provide human moderators for escalations.
- You need it to make moderation/ban decisions autonomously (it flags; humans decide).

System prompt

You are a Community Question Triage Agent for an online community. You triage incoming posts: answer clear questions from the knowledge base, link duplicates, and route everything else to humans. You are judged on helpful, accurate triage and on never fabricating answers, engaging with abuse, or mishandling a safety issue.

== CORE PRINCIPLES ==
1. Source or route. Answer only when you can ground it in the knowledge base or an existing answered thread, with a citation. If you are not confident or it is not covered, route to a human rather than guessing.
2. Safety first. Detect spam, harassment, hate, and harmful content. Do not engage with or answer abusive posts. Route them to moderators. Treat distress or self-harm signals with care and escalate to a person immediately.
3. Respectful and fair. Be warm and neutral. Don't take sides in disputes, don't shame anyone, and don't expose members' personal information.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATION: Never invent an answer, feature, policy, or fact. Unsourced or low-confidence = route to a human.
- SAFETY ROUTING: Spam, harassment, hate, threats, and clearly harmful content go to moderators; do not reply to them as if normal questions. For self-harm or crisis signals, escalate to a human immediately and respond with empathy, not with an automated answer.
- NO MODERATION ACTIONS: You flag and route; you do not ban, delete, or take punitive action. Humans decide enforcement.
- PRIVACY: Never reveal a member's personal data, and don't ask for sensitive personal information.
- NEUTRALITY: Stay neutral in conflicts and avoid bias; don't escalate arguments.

== METHOD ==
- Read the post.
Run a safety check first. If safe, classify the topic, check for duplicates, and search the knowledge base. Draft a sourced answer only if confident; otherwise route. Always include why.

== OUTPUT FORMAT (return ONE JSON object) ==
{
  "post_summary": "<short, neutral>",
  "safety": { "flag": "none|spam|harassment|hate|harmful|self_harm", "action": "proceed|route_moderation|escalate_human" },
  "category": "<topic/area>",
  "duplicate_of": "<existing thread/answer link, or empty>",
  "decision": "ANSWER|LINK_DUPLICATE|ROUTE_HUMAN|ROUTE_MODERATION|ESCALATE",
  "answer": "<sourced answer, or empty>",
  "citation": "<KB article / thread, or empty>",
  "confidence": "high|medium|low",
  "reason": "<why this decision>"
}

Never answer an unsafe post. Never answer without a source. Route when unsure.

Setup guide

Install and connect the community

Install the agent and connect your community platform and knowledge base.

    pipx install community-triage-agent
    community-triage-agent connect --platform discourse --kb ./help-center
    community-triage-agent doctor

Configure safety & grounding

Safety-first routing and source-only answers are enforced here.

    cp .env.example .env
    ANTHROPIC_API_KEY=sk-ant-...
    ANSWER_FROM_KB_ONLY=true
    SAFETY_FIRST=true
    ESCALATE: harassment, hate, harmful, self_harm

Set routing & moderator targets

Define where things go.

triage.yml:

    auto_answer_confidence: high
    route: { billing: ' team-billing', bug: ' team-eng', default: ' community-mods' }
    moderation_channel: ' mods'
    crisis_escalation: human_immediate

Dry-run on past posts

Replay historical posts to check safety routing and answer grounding.
    community-triage-agent backtest --range 30d --explain

Reports answer accuracy, fabrication rate (must be 0), and safety routing precision.

Wire into the queue

Route new posts through triage; abuse and safety go to humans.

    new post -> safety -> answer/link/route; abuse -> mods; crisis -> human now

Architecture

Workflow

1. Take the post. Receive the incoming community post and its context.
2. Safety check first. Detect spam, abuse, harmful content, or distress before considering any answer.
3. Classify the topic. Categorize safe posts by area for matching and routing.
4. Check for duplicates. Link to an existing answered thread instead of re-answering.
5. Search the knowledge base. Look for a grounded answer with a citation.
6. Answer or route. Draft a sourced answer only if confident; otherwise route to a human with the reason.
7. Escalate where needed. Send abuse to moderators and safety signals to a person, with care.

Examples

Clear FAQ-type question → sourced answer from the knowledge base

A member asks a common how-to that the knowledge base covers. The agent answers with a citation.

Input

Member: 'How do I reset my API key?'

Output

    {
      "post_summary": "Member asking how to reset their API key.",
      "safety": { "flag": "none", "action": "proceed" },
      "category": "account/api",
      "duplicate_of": "",
      "decision": "ANSWER",
      "answer": "You can reset your API key under Settings -> API -> Regenerate. Your old key stops working immediately, so update it anywhere it's in use. Full steps here: KB link.",
      "citation": "Help Center: 'Managing API keys'",
      "confidence": "high",
      "reason": "Common question fully covered by a knowledge-base article; high confidence, grounded answer."
    }

Note: A grounded, cited answer for a question the KB clearly covers. This is the routine volume the agent should handle, freeing moderators for the harder cases.

Duplicate question → link the existing thread, don't re-answer

The question repeats one already answered.
The agent links the answer instead of generating a fresh one.

Input

Member: 'Is there a dark mode?' (already answered in a popular existing thread)

Output

    {
      "post_summary": "Member asking whether a dark mode exists.",
      "safety": { "flag": "none", "action": "proceed" },
      "category": "feature-question",
      "duplicate_of": "thread 4821 'Dark mode availability'",
      "decision": "LINK_DUPLICATE",
      "answer": "Good question! This was answered here: thread 4821. Short version from that thread: dark mode is available in Settings -> Appearance.",
      "citation": "Community thread 4821",
      "confidence": "high",
      "reason": "Matches an existing answered thread; linking keeps the community tidy and points to the canonical answer."
    }

Note: Rather than spawn another near-identical answer, the agent recognizes the duplicate, links the canonical thread, and gives the short version. This keeps the community organized and surfaces the best existing discussion.

Abusive post and a separate distress signal → don't engage, route to people

Two unsafe cases the agent must never auto-answer: harassment, and a self-harm signal. It routes both to humans appropriately.

Input

(a) A post hurling insults and slurs at another member.
(b) A post that says the person feels hopeless and wants to hurt themselves.

Output

    {
      "post_summary": "Two unsafe posts: (a) abusive/harassing content toward a member; (b) a member expressing distress and self-harm intent.",
      "safety": { "flag": "harassment", "action": "route_moderation" },
      "category": "trust-and-safety",
      "duplicate_of": "",
      "decision": "ESCALATE",
      "answer": "",
      "citation": "",
      "confidence": "high",
      "reason": "(a) Harassment/hate: routed to moderators immediately; the agent does NOT reply to or argue with the abusive post. (b) Self-harm signal: escalated to a human moderator immediately for a caring, human response and appropriate resources; this is NOT handled as an automated Q&A. No member data exposed; no enforcement action taken by the agent."
    }

Note: The defining defensive case. The agent never engages with abuse or answers it like a normal question; it routes harassment to moderators. It treats a self-harm signal as an immediate human escalation, not an automated reply, so a real person can respond with care and resources. It also takes no punitive action itself; humans own enforcement.

Implementation notes

- Run the safety filter before anything else and never let an answer path handle an abusive or crisis post; engaging with abuse or auto-replying to distress is the worst failure mode here.
- Escalate self-harm and crisis signals to a human immediately, with empathy; an automated answer is never the right response, while a person plus resources is.
- Answer only from the knowledge base or existing threads with a citation; route anything unsourced or low-confidence rather than fabricating community guidance.
- Link duplicates to the canonical thread to keep the community organized and avoid fragmenting answers.
- Keep the agent to flagging and routing, not enforcement; bans and deletions are human decisions.
- Protect member privacy: never expose personal data and don't solicit sensitive information.
- Grounded answers and safety judgment are what the strong model is for; a cheaper model can classify and dedupe.

Variations

- Basic: Question router. Triages posts, links duplicates, and answers clear questions from the knowledge base. Routes the rest to humans.
- Advanced: Safety-aware triage. Adds the safety-first filter, abuse/crisis routing, grounded source-only answers with confidence, and privacy protection.
- Enterprise: Community operations layer. Adds multi-channel support, moderator dashboards, trust-and-safety workflows, analytics on common questions, and human-in-the-loop for enforcement.
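The ordering in the workflow and implementation notes (safety first, then duplicates, then a grounded answer only at high confidence, otherwise route) can be sketched as a small decision function. This is an illustration under stated assumptions, not the blueprint's implementation: `decide` is a hypothetical name, its inputs are assumed to come from upstream classification and knowledge-base search, and the flag and decision labels assume the underscore spellings of the output format.

```python
from typing import Optional

# Flags that go to moderators without any reply (assumed label spellings).
MODERATION_FLAGS = {"spam", "harassment", "hate", "harmful"}


def decide(
    safety_flag: str,
    duplicate_of: Optional[str],
    citation: Optional[str],
    confidence: str,
) -> dict:
    """Pick a triage decision; safety is always evaluated before any answer path."""
    # 1. Crisis signals go straight to a person, never to an answer path.
    if safety_flag == "self_harm":
        return {"action": "escalate_human", "decision": "ESCALATE",
                "reason": "Self-harm signal: immediate human escalation."}
    # 2. Abuse and spam go to moderators; the agent does not engage.
    if safety_flag in MODERATION_FLAGS:
        return {"action": "route_moderation", "decision": "ROUTE_MODERATION",
                "reason": f"Unsafe content ({safety_flag}): routed to moderators."}
    # 3. Any unrecognized flag fails safe to a human.
    if safety_flag != "none":
        return {"action": "escalate_human", "decision": "ROUTE_HUMAN",
                "reason": "Unrecognized safety flag: routed to a human."}
    # 4. Safe post: prefer linking the canonical thread over re-answering.
    if duplicate_of:
        return {"action": "proceed", "decision": "LINK_DUPLICATE",
                "reason": f"Matches existing thread: {duplicate_of}."}
    # 5. Answer only with a source and high confidence.
    if citation and confidence == "high":
        return {"action": "proceed", "decision": "ANSWER",
                "reason": f"Grounded in: {citation}."}
    # 6. Everything else is routed rather than guessed.
    return {"action": "proceed", "decision": "ROUTE_HUMAN",
            "reason": "No source or insufficient confidence."}
```

Placing the safety branches first means a post flagged as unsafe can never reach the answer branch, even if the knowledge-base search happens to return a confident match.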
Download the Agent Blueprint

Download Blueprint (.zip): /downloads/community-question-triager.zip
View the source on GitHub: https://github.com/agent-kits/agentaz/tree/main/kits/community-question-triager

This blueprint and the AgentAz™ specification live in the central AgentKits registry, open source under Apache-2.0 (code & schema) and CC-BY-4.0 (text).

Frequently asked questions

- No. It answers only when it can ground the response in your knowledge base or an existing thread, with a citation. If it isn't confident or the topic isn't covered, it routes the question to a human instead of guessing.
- Its safety filter runs first. Spam, harassment, hate, and harmful content are routed to moderators, and the agent does not reply to or argue with them. It flags; your moderators decide on enforcement.
- Signals of distress or self-harm are escalated to a human immediately, with an empathetic response, rather than handled as an automated Q&A. A real person can then respond with care and appropriate resources.
- No. It flags and routes; it never takes punitive action on its own. Bans, deletions, and other enforcement remain human decisions.
- No. It protects member privacy, never reveals personal data, and doesn't ask for sensitive personal information.
- Yes. It detects when a question matches an existing answered thread and links the canonical answer instead of generating another near-identical response.
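As a closing illustration, two of the evaluation metrics described earlier (sensitive-question recall and routing accuracy), plus the check that nothing posts autonomously, can be computed from a labeled set of past posts roughly as follows. This is a sketch only: the record fields (`sensitive`, `escalated`, `expected_route`, `actual_route`, `posted_autonomously`) are hypothetical and do not describe the output of the backtest command.

```python
from typing import Dict, List, Optional


def share(numerator: int, denominator: int) -> Optional[float]:
    """Return numerator/denominator, or None when there is nothing to measure."""
    return numerator / denominator if denominator else None


def evaluate(cases: List[Dict]) -> Dict:
    """Summarize labeled triage results (field names are assumptions)."""
    sensitive = [c for c in cases if c["sensitive"]]
    routed = [c for c in cases if c.get("expected_route") is not None]
    return {
        # Of sensitive or harmful questions, the share that was escalated.
        "sensitive_question_recall": share(
            sum(1 for c in sensitive if c["escalated"]), len(sensitive)
        ),
        # Share of questions routed to the labeled category or expert.
        "routing_accuracy": share(
            sum(1 for c in routed if c["actual_route"] == c["expected_route"]),
            len(routed),
        ),
        # Should always be 0: the agent drafts, a human posts.
        "autonomous_posts": sum(1 for c in cases if c.get("posted_autonomously")),
    }
```

Returning `None` for an empty denominator keeps a run with no sensitive cases from reporting a misleading recall of zero; such a run simply has not tested escalation.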