Compliance Control Monitoring Agent

AgentKit released a Compliance Control Monitoring Agent that tests internal controls against supporting evidence and flags failures without auto-attesting. The agent is governed by an open-source AgentAz specification defining trust level, tool boundaries, and human handoff triggers. It aims to provide honest, evidence-cited compliance statuses while preventing unauthorized status changes.

Overview

- Tests internal controls against the evidence that should support them.
- Produces an honest, evidence-cited status for each control.
- Flags failing controls and missing evidence instead of marking them compliant.
- Defensive: never fabricates a pass, never auto-attests, and escalates exceptions to a human.

AgentAz™ specification

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do, and why, and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

Machine-readable contract

agentaz.json, validated against the open AgentAz™ JSON Schema, which is bundled for offline use and published at a permanent URL:

    {
      "$schema": "./agentaz.schema.json",
      "version": "2.0.0",
      "last reviewed": "2026-06-24",
      "agent id": "control-monitoring-agent",
      "trust level": "A2",
      "dna pattern": "Evaluation",
      "worst case action": "Misses or misflags a control for human review. Cannot mark compliant or close findings.",
      "authority boundary": "Monitors controls and flags gaps; status-change/close tools absent.",
      "tags": ["compliance", "controls", "monitoring", "read-only", "human-review"],
      "tool boundary": {
        "allowed tools": ["read evidence", "check control", "flag gap", "summarize status"],
        "execution tools absent": true
      },
      "output boundary": {
        "format": "structured json",
        "never emits": ["mark compliant", "change status", "close finding"]
      },
      "cost boundary": {
        "max usd per trace loop": 0.25,
        "alert threshold usd": 0.16
      },
      "loop boundary": {
        "max reasoning turns": 8
      },
      "human handoff": {
        "triggers": ["failing control", "ambiguous evidence"],
        "destination": "compliance owner"
      },
      "audit": {
        "append only": true,
        "logs": ["control checks", "evidence"]
      }
    }

New to this? Read the AgentAz specification guide (/agentaz-specifications): Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under Apache-2.0 (https://www.apache.org/licenses/LICENSE-2.0), with the schema frozen at v1.0.0 and source on GitHub (https://github.com/agent-kits/agentaz).

Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship, not new functionality.
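The matrix is described as derived from the specification. As a minimal sketch of what such a derivation could look like, the Python below reads a trimmed copy of the contract and builds a few of the summary rows. The helper names `governance_rows` and `to_markdown` and the column labels are hypothetical, not part of AgentAz or AgentKit; the key names are copied as they appear in the contract in this document.

```python
import json

# Trimmed copy of the agentaz.json contract; key names are kept exactly as
# they appear in this document (an assumption about the real file).
CONTRACT = json.loads("""
{
  "trust level": "A2",
  "tool boundary": {
    "allowed tools": ["read evidence", "check control", "flag gap", "summarize status"],
    "execution tools absent": true
  },
  "cost boundary": {"max usd per trace loop": 0.25, "alert threshold usd": 0.16},
  "loop boundary": {"max reasoning turns": 8},
  "human handoff": {
    "triggers": ["failing control", "ambiguous evidence"],
    "destination": "compliance owner"
  },
  "audit": {"append only": true, "logs": ["control checks", "evidence"]}
}
""")


def governance_rows(contract: dict) -> list[tuple[str, str]]:
    """Hypothetical helper: map contract fields to (dimension, summary) rows."""
    tools = contract["tool boundary"]
    handoff = contract["human handoff"]
    audit = contract["audit"]
    cost = contract["cost boundary"]["max usd per trace loop"]
    turns = contract["loop boundary"]["max reasoning turns"]

    if tools["execution tools absent"]:
        access = "Least privilege; execution tools absent (read-only)"
    else:
        access = "Execution tools present"
    log_kind = "Append-only log" if audit["append only"] else "Mutable log"
    triggers = ", ".join(handoff["triggers"])

    return [
        ("Trust Level", contract["trust level"]),
        ("Tool access", access),
        ("Human approval", f"Required on {triggers} -> {handoff['destination']}"),
        ("Audit trail", f"{log_kind} ({', '.join(audit['logs'])})"),
        ("Cost & loop bounds", f"<= ${cost:.2f} per loop, <= {turns} reasoning turns"),
        ("Recovery / escalation", f"Escalates to {handoff['destination']}"),
    ]


def to_markdown(rows: list[tuple[str, str]]) -> str:
    """Render the rows as a two-column pipe table (column labels are illustrative)."""
    lines = ["| Dimension | Coverage |", "|---|---|"]
    lines += [f"| {name} | {summary} |" for name, summary in rows]
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown(governance_rows(CONTRACT)))
```

Generating the summary from the contract, rather than maintaining it by hand, is one way to keep the two from drifting apart. The matrix for this blueprint follows.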
| Dimension | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A2 (Recommend) |
| Tool access | Least privilege: execution tools absent (read-only) |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on failing control, ambiguous evidence → compliance owner |
| Audit trail | Append-only log (control checks, evidence) |
| Cost & loop bounds | ≤ $0.25 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to compliance owner |

Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity; it is not an official integration or certified compatibility.

| Component | Mapping |
|---|---|
| Agent | Primary reasoner; Recommend authority (A2) |
| Tools | read evidence, check control, flag gap, summarize status; execution tools absent (read-only) |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A2); no execution tools; ≤ $0.25/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to compliance owner on failing control, ambiguous evidence |

Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each: the boundaries that make it safe to run, stated plainly.

Marks a failing control as passing, creating audit exposure.

- Detection: Every verdict cites its evidence, and the agent cannot mark a control compliant on its own.
- Mitigation: Verdicts are recommendations; ambiguous evidence is flagged, not resolved.
- Recovery: A compliance owner reviews; the verdict is corrected and logged.

Flags a passing control as failing, creating noise.

- Detection: Each finding carries a confidence score; low confidence is posted as 'review'.
- Mitigation: Findings are surfaced for a human, never auto-actioned.
- Recovery: The owner dismisses it and the rule is tuned.

Stale evidence is treated as current.

- Detection: Evidence timestamps are checked before a verdict stands.
- Mitigation: Out-of-date evidence is flagged.
- Recovery: Fresh evidence is requested before the verdict is finalized.

Evaluation

False-pass rate is the metric to minimize: marking a failing control as passing creates audit exposure.

| Metric | Definition |
|---|---|
| Verdict accuracy | Share of control verdicts matching an auditor's determination. |
| False-pass rate | Of failing controls, the share wrongly marked passing; the costliest error. |
| False-fail rate | Of passing controls, the share wrongly flagged; the noise side. |
| Evidence-citation rate | Share of verdicts that cite supporting evidence. |
| Latency | Time to evaluate a control. |

Recommended approach. Use a labeled set of controls with auditor verdicts and evidence; measure false-pass and false-fail separately, treating false-pass as the critical metric. Verify that every verdict cites evidence and that stale evidence is flagged.

When to use

Use it when

- You monitor a set of controls and want continuous, evidence-based status.
- You want missing evidence and failures flagged honestly, not smoothed over.
- You want each control status tied to citable evidence.
- You want a human to own attestation while the agent does the legwork.

Avoid it when

- You want it to attest or sign off on compliance; it won't.
- You want controls marked passing to clear an audit without evidence.
- You have no evidence sources for it to test against.
- You need legal compliance determinations (it supports, it doesn't advise).

System prompt

You are a Compliance Control Monitoring Agent. You test internal controls against their evidence and report status for human review. You do NOT attest or certify compliance.
You are judged on honest, evidence-based status and on never marking a control compliant without proof, fabricating evidence, or signing off.

== CORE PRINCIPLES ==
1. Evidence or it's not satisfied. A control is "satisfied" only when real evidence exists and actually supports it. No evidence, stale evidence, or evidence that doesn't support the control = not satisfied / exception.
2. Honest status. Report failures, gaps, and missing evidence plainly. Never mark a control green to look good or to clear an audit. A red control reported honestly is the point.
3. Support, don't attest. You gather and assess; a human owns the attestation/sign-off. You never certify compliance or make the legal determination.

== HARD RULES (NON-NEGOTIABLE) ==
- NO FABRICATED PASS: Never mark a control satisfied/compliant without evidence that supports it. Unsupported = exception, with the gap stated.
- CITE EVIDENCE: Every "satisfied" status references the specific evidence. No evidence = no pass.
- NO AUTO-ATTEST: Never attest, certify, or sign off compliance. Output status + exceptions for a human to attest.
- FLAG, DON'T HIDE: Surface failing controls, missing/stale evidence, and exceptions. Don't downgrade severity to avoid findings.
- NOT LEGAL ADVICE: You support GRC work; you don't provide legal/regulatory determinations.

== METHOD ==
- For each control, gather evidence, test whether it supports the control, and rate status (satisfied/exception/insufficient-evidence) with citations and confidence. Flag exceptions and escalate.

== OUTPUT FORMAT (return ONE JSON object) ==
{ "framework": "