# Compliance Evidence & Audit-Trail Agent

> Source: <https://www.agent-kits.com/kit/compliance-evidence-agent>
> Published: 2026-06-29 22:03:04.177241+00:00

## Overview

Audit readiness is the quiet blocker on shipping agents — and on a lot of routine engineering — into regulated environments. The work itself is rarely hard; the evidence that it was done correctly, mapped to the right control, and preserved unaltered is what takes weeks before an audit. This agent automates the evidence layer, not the judgement: it observes an event, maps it to the applicable controls in a catalog you supply, gathers the artifacts that control requires, and seals the result into a verifiable log.

The design is deliberately read-heavy with a single gated write. It holds no tools that can modify source systems, grant access, or alter a prior audit entry — those are absent from its registry, not merely discouraged. The only consequential action, committing an evidence record, sits behind a human sign-off gate. That keeps the agent at Trust Level A3: the worst it can do is prepare an incomplete or mis-mapped record that a person reviews before it is filed.

Crucially, the agent does not ship an opinion about what ISO 42001 or SOC 2 require. It maps against the control catalog your compliance team configures, so the mapping is grounded in your own authoritative source rather than a model's guess. It assembles and preserves evidence; your compliance owner and your auditor still make the compliance call.

## AgentAz™ Specifications

A lightweight, design-time governance spec for security review. It documents what this agent is authorized to do — and why — and pairs with whatever policy engine you already run. It does not enforce anything at runtime.

### Governance readiness

Machine-readable contract (`agentaz.json`), validated against the open AgentAz™ JSON Schema, which is bundled for offline use and published at a permanent URL:

```
{
  "$schema": "./agentaz.schema.json",
  "version": "1.0.0",
  "last_reviewed": "2026-06-30",
  "agent_id": "compliance-evidence-agent",
  "trust_level": "A3",
  "dna_pattern": "Planning",
  "worst_case_action": "Prepares an incomplete or mis-mapped evidence record for human approval. Cannot file autonomously, modify source systems, or alter prior audit entries.",
  "authority_boundary": "Maps events to a configured control catalog and assembles evidence for approval; source-system write, access-grant, and entry-mutation tools are absent.",
  "tags": [
    "compliance",
    "audit-trail",
    "evidence",
    "human-approval",
    "tamper-evident"
  ],
  "tool_boundary": {
    "auto_executable_tools": [
      "get_event",
      "resolve_actor",
      "classify_resource",
      "lookup_controls",
      "collect_artifacts",
      "score_completeness"
    ],
    "approval_required_tools": [
      "request_signoff",
      "commit_evidence"
    ],
    "execution_tools_absent": false,
    "rollback_required": true
  },
  "output_boundary": {
    "format": "structured_json",
    "never_emits": [
      "source_system_write",
      "grant_access",
      "audit_entry_mutation"
    ]
  },
  "cost_boundary": {
    "max_usd_per_trace_loop": 0.3,
    "alert_threshold_usd": 0.2
  },
  "loop_boundary": {
    "max_reasoning_turns": 8
  },
  "human_handoff": {
    "triggers": [
      "incomplete_evidence",
      "regulated_resource",
      "low_mapping_confidence"
    ],
    "destination": "compliance_owner"
  },
  "audit": {
    "append_only": true,
    "tamper_evident": "hmac_chain",
    "logs": [
      "event",
      "mapping",
      "evidence",
      "completeness_score",
      "approvals"
    ]
  }
}
```
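
The spec itself enforces nothing at runtime, so a policy layer has to read it. As a minimal sketch of how such a check might consume the `tool_boundary` block, assuming a hypothetical helper named `classify_tool_call` (not part of the kit) and treating any unlisted tool as absent from the registry:

```python
import json

# Excerpt of the tool_boundary block from agentaz.json above.
SPEC = json.loads("""
{
  "tool_boundary": {
    "auto_executable_tools": ["get_event", "resolve_actor", "classify_resource",
                              "lookup_controls", "collect_artifacts", "score_completeness"],
    "approval_required_tools": ["request_signoff", "commit_evidence"]
  }
}
""")


def classify_tool_call(spec: dict, tool_name: str) -> str:
    """Hypothetical helper: return 'auto', 'needs_approval', or 'absent'."""
    boundary = spec["tool_boundary"]
    if tool_name in boundary["auto_executable_tools"]:
        return "auto"
    if tool_name in boundary["approval_required_tools"]:
        return "needs_approval"
    # Assumption: anything not listed is treated as not registered at all.
    return "absent"


for name in ("get_event", "commit_evidence", "grant_access"):
    print(name, "->", classify_tool_call(SPEC, name))
```

Defaulting to `absent` keeps the check fail-closed: a tool that is missing from the contract is refused rather than silently allowed.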

Building your own agent? Paste its prompt or spec into the [AgentAz Compliance Scanner](/scan) to grade it against this same rubric, or [run this blueprint live](#run-live) with your own API key.

New to this? Read the [AgentAz™ Specifications guide](/agentaz-specifications) — Trust Levels, DNA patterns, and how it complements your runtime.

AgentAz™ is open source under [Apache-2.0](https://www.apache.org/licenses/LICENSE-2.0) — schema (frozen v1.0.0) and source on [GitHub](https://github.com/agent-kits/agentaz).

## Governance matrix

A scannable summary of this blueprint's governance coverage, derived from its AgentAz™ specification. It documents the boundaries that already ship — not new functionality.

| Aspect | Coverage |
|---|---|
| Agent goal | Bounded by the authority spec above |
| Trust Level | A3 — Human-Approved |
| Tool access | Scoped tools; high-risk actions gated behind approval |
| Context handling | Grounded in provided inputs; cites or flags rather than guessing |
| Memory strategy | Task-scoped; no persistent cross-session memory |
| Human approval | Required on incomplete evidence, regulated resource, low mapping confidence → compliance owner |
| Audit trail | Append-only log (event, mapping, evidence, completeness score, approvals) |
| Cost & loop bounds | ≤ $0.3 per loop · ≤ 8 reasoning turns |
| Recovery / escalation | Escalates to compliance owner |

## Agent component mapping

A framework-neutral view of how this blueprint maps to standard agent-architecture components (the vocabulary common to ADK-style frameworks). It describes structure for clarity — not an official integration or certified compatibility.

| Component | Mapping |
|---|---|
| Agent | Primary reasoner with Human-Approved authority (A3) |
| Tools | get event, resolve actor, classify resource, lookup controls, collect artifacts, score completeness; approval-gated: request signoff, commit evidence |
| Memory | Task-scoped working context; no persistent cross-session memory |
| Guardrails | Worst-case classified (A3); high-risk actions gated; ≤ $0.3/loop · ≤ 8 turns |
| Evaluator | Confidence and authority-boundary checks; low-confidence or out-of-bounds results are flagged, not actioned |
| Handoff | Escalates to compliance owner on incomplete evidence, regulated resource, low mapping confidence |

## Failure modes

Specific ways this blueprint can fail, and how it is designed to detect, contain, and recover from each — the boundaries that make it safe to run, stated plainly.

Mis-maps an event to the wrong control — a false 'compliant' stamp.

- **Detection:** Mapping confidence is scored against the catalog; low-confidence or multi-candidate matches are flagged rather than committed.
- **Mitigation:** The agent maps only to controls present in the catalog and escalates ambiguous mappings to a human with the candidates shown.
- **Recovery:** The human confirms or corrects the mapping; the corrected decision and its basis are logged for future calibration.
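
The detection rule above can be sketched as a small decision function. This is an illustration only: the `Candidate` type, the `decide_mapping` name, and the `min_confidence` and `min_margin` values are assumptions, not part of the kit, and the sketch picks a single control even though a real event may map to several.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    control_id: str
    confidence: float  # catalog match confidence in [0, 1]


def decide_mapping(candidates, min_confidence=0.8, min_margin=0.2):
    """Return ('map', control_id) or ('escalate', reason).

    Thresholds are assumed values for illustration; tune them on labeled data.
    """
    if not candidates:
        return ("escalate", "no catalog match")
    ranked = sorted(candidates, key=lambda c: c.confidence, reverse=True)
    best = ranked[0]
    if best.confidence < min_confidence:
        return ("escalate", "low mapping confidence")
    # A close runner-up means the match is ambiguous, so a human should choose.
    if len(ranked) > 1 and best.confidence - ranked[1].confidence < min_margin:
        return ("escalate", "multiple plausible controls")
    return ("map", best.control_id)


print(decide_mapping([Candidate("control-a", 0.41), Candidate("control-b", 0.38)]))
```

Every branch except a clear single winner escalates, which matches the stated preference for "I could not confidently map this" over a confident wrong mapping.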

Files an incomplete evidence record that looks complete.

- **Detection:** Completeness is scored against each control's required-artifact threshold; missing artifacts are itemized explicitly.
- **Mitigation:** Below-threshold completeness, or any regulated resource, blocks the sign-off gate; the record cannot be filed automatically.
- **Recovery:** The gap list is returned to the owner; once the missing artifacts are supplied, the record is re-scored and re-routed.
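
A minimal sketch of completeness scoring with an explicit gap list, assuming completeness is simply the share of required artifacts that are present. The function names, the artifact keys, and the default threshold of 1.0 are illustrative assumptions; the kit's real `score_completeness` tool may compute this differently.

```python
def score_completeness(required: list, collected: dict, threshold: float = 1.0) -> dict:
    """Score one control: share of required artifacts present, plus the gaps."""
    if not required:
        raise ValueError("a control must list at least one required artifact")
    # An artifact counts as missing if it is absent or empty.
    missing = [name for name in required if not collected.get(name)]
    score = (len(required) - len(missing)) / len(required)
    return {"score": score, "missing": missing, "meets_threshold": score >= threshold}


def must_escalate(result: dict, sensitivity: str, low_mapping_confidence: bool) -> bool:
    """Any one of the three triggers is enough to block automatic filing."""
    return (
        not result["meets_threshold"]
        or sensitivity == "regulated"
        or low_mapping_confidence
    )


required = ["approver", "ticket", "policy_version", "diff", "timestamps"]
collected = {"ticket": "CHG-1", "policy_version": "v3", "diff": "+1 -1",
             "timestamps": "2026-01-01T00:00:00Z"}
result = score_completeness(required, collected)
print(result["missing"], must_escalate(result, "internal", False))
```

Returning the `missing` list alongside the score is what lets the owner see exactly which artifact to supply before the record is re-scored.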

An audit entry is altered, reordered, or deleted after filing.

- **Detection:** The append-only log is HMAC-chained; verification recomputes the chain and fails on any mutation, reorder, or truncation.
- **Mitigation:** The chain head is pinned in separate storage so even tail-truncation is detectable; verification runs before any export.
- **Recovery:** A failed verification quarantines the affected range and the record is re-derived from source artifacts under a fresh sealed entry.
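
The chaining and verification described above can be sketched with Python's standard `hmac` module. This is a sketch under assumptions, not the kit's implementation: the entry layout, the all-zero `GENESIS` value for the first entry, and the function names are hypothetical, and a real deployment would load the key from a secret store.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder "previous signature" for the first entry


def _sign(key: bytes, record: dict, prev_sig: str) -> str:
    # Sign the entry's own fields together with the previous entry's signature.
    payload = json.dumps({"record": record, "prev": prev_sig}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list, key: bytes, record: dict) -> str:
    """Append a sealed entry and return the new chain head to pin elsewhere."""
    prev_sig = log[-1]["sig"] if log else GENESIS
    sig = _sign(key, record, prev_sig)
    log.append({"record": record, "prev": prev_sig, "sig": sig})
    return sig


def verify_chain(log: list, key: bytes, pinned_head: str) -> bool:
    """Recompute every signature, then compare the last one to the pinned head."""
    prev_sig = GENESIS
    for entry in log:
        expected = _sign(key, entry["record"], prev_sig)
        if entry["prev"] != prev_sig or not hmac.compare_digest(entry["sig"], expected):
            return False
        prev_sig = entry["sig"]
    # A truncated tail still chains correctly, so only the pinned head exposes it.
    return hmac.compare_digest(prev_sig, pinned_head)


key = b"demo-key-load-from-a-secret-in-practice"
log = []
append_entry(log, key, {"event": "access_grant", "ticket": "ACC-4821"})
head = append_entry(log, key, {"event": "config_change"})
print(verify_chain(log, key, head), verify_chain(log[:-1], key, head))
```

The final comparison against `pinned_head` is the step that catches tail truncation, which is why the head has to live outside the log itself.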

## Evaluation

The agent is evaluated on mapping correctness and evidence completeness together. A confidently wrong 'evidenced' record is the silent risk, so the agent is judged on never filing an incomplete or mis-mapped record, not just on throughput.

| Metric | What it measures |
|---|---|
| Control-mapping accuracy | Agreement of the agent's event-to-control mapping with a compliance reviewer's mapping on a labeled set. |
| Evidence completeness rate | Share of filed records meeting every mapped control's required-artifact threshold. |
| False-compliance rate | Rate of records filed as complete that a reviewer judges incomplete or mis-mapped — the metric to drive toward zero. |
| Escalation rate | Share of events correctly routed to a human for ambiguous mapping, incompleteness, or regulated sensitivity. |
| Tamper-detection | Share of injected log mutations caught by chain verification — expected to be 100%. |

**Recommended approach.** Replay labeled events with known-correct control mappings and required-artifact sets, measuring mapping accuracy and completeness. Separately, adversarially inject tampered, reordered, and truncated log entries and confirm verification catches every one before any record is trusted.
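
As a rough sketch of how two of these metrics might be computed from a labeled replay, assuming "agreement" means the agent's set of controls exactly equals the reviewer's for an event. The function names and input shapes are hypothetical.

```python
def mapping_accuracy(predicted: dict, labeled: dict) -> float:
    """Share of labeled events whose predicted control set matches the reviewer's."""
    if not labeled:
        raise ValueError("labeled set is empty")
    matches = sum(
        1 for event_id, controls in labeled.items()
        if set(predicted.get(event_id, ())) == set(controls)
    )
    return matches / len(labeled)


def false_compliance_rate(filed_ids, reviewer_rejected_ids) -> float:
    """Share of filed records that a reviewer judged incomplete or mis-mapped."""
    filed = set(filed_ids)
    if not filed:
        return 0.0
    return len(filed & set(reviewer_rejected_ids)) / len(filed)


labeled = {"e1": ["c1"], "e2": ["c1", "c2"], "e3": ["c3"], "e4": ["c2"]}
predicted = {"e1": ["c1"], "e2": ["c2", "c1"], "e3": ["c1"], "e4": ["c2"]}
print(mapping_accuracy(predicted, labeled))
print(false_compliance_rate(["e1", "e2", "e4"], ["e4"]))
```

Exact-set agreement is a strict choice; a team could reasonably score partial overlap instead, as long as the false-compliance rate stays the metric driven toward zero.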

## When to use

Use it when

- You need a durable, queryable trail showing that changes (access, config, deploys, data handling) were performed and reviewed correctly.
- You already maintain — or can define — a control catalog and want each operational event mapped to it automatically.
- You want tamper-evident evidence (an append-only, verifiable log) rather than screenshots in a folder.
- You're preparing for an ISO 42001, SOC 2, or internal audit and the evidence-gathering is the bottleneck.

Avoid it when

- You want an agent that makes you compliant or signs off findings autonomously — this agent records and routes; humans and auditors decide.
- You have no control catalog and no one to define one; the agent maps against your controls, it does not invent authoritative ones.
- You need it to remediate or change source systems — it has no write access to anything but its own sealed evidence log, behind approval.

## System prompt

```
You are a Compliance Evidence agent. Your job is to turn a single operational event into a structured, well-mapped, tamper-evident evidence record — and to escalate rather than guess whenever the mapping or the evidence is uncertain. You do not decide whether the organization is compliant; you assemble and preserve the evidence a human compliance owner and an auditor will judge.

For each event you receive (an access grant, configuration change, deployment, or data-handling action):

1. Establish the facts. Read the event and resolve who acted, on what resource, and when. Classify the resource's sensitivity (e.g. public, internal, confidential, regulated). Never infer facts you cannot ground in the event or a tool result.

2. Map to controls. Look up the applicable controls for this event type and resource class in the configured control catalog. Map only to controls the catalog actually contains — never to a control you believe 'should' apply but cannot find. If the mapping is ambiguous or the catalog returns nothing confident, flag it for human review.

3. Assemble evidence. For each mapped control, collect the artifacts that control requires (approver identity, ticket reference, policy version, diff, timestamps). Record what is present and, explicitly, what is missing.

4. Score completeness and residual risk. Compute how completely the required evidence was gathered against each control's threshold. If completeness is below the control's threshold, or the resource is regulated, or your mapping confidence is low, you must escalate — do not file.

5. Gate the filing. Committing an evidence record is a consequential action. You may not commit it yourself: you call request_signoff to route the prepared record to the named human compliance owner, and only after explicit approval may commit_evidence run. If approval is denied or times out, the record is not filed and the reason is logged.

6. Seal and explain. Once approved, the record is appended to the tamper-evident audit log and you produce a plain-language 'what changed and why it is evidenced' summary that a non-engineer reviewer can read.

Hard rules: you have no tool that can modify a source system, grant or revoke access, or alter a prior audit entry — if a task seems to need one, escalate. Prefer 'I could not confidently map this' over a confident wrong mapping; a false 'compliant' record is the most damaging output you can produce. Log every mapping decision, its basis, and its confidence.
```

## Simulate run

Try the agent with a sample task. This is a frontend-only preview that shows how the kit would plan and execute — no API calls, nothing leaves your browser.


## Run it live

The scripted preview above is canned. This runs the **real** agent loop (the kit's actual system prompt and its 8 tools) in your browser against the model, using **your own API key**. Tools are still mocks, and high-risk tools are blocked by the same runtime gate the `run.py` demo enforces. Your key stays in your browser: calls go straight to the provider, never to us.

## Setup guide

Define a starter control catalog

List the event types you want evidenced and, for each, the controls they map to and the artifacts each control requires. Start with ISO 42001 and SOC 2 Common Criteria entries you can cite; keep it small and owned.

Connect read-only artifact sources

Give collect_artifacts read access to the systems that hold approvers, tickets, policy versions, and diffs. Read-only — the agent never needs write access to source systems.

Wire the sign-off gate

Point request_signoff at where your compliance owners review (Slack/email/queue). Ensure denial or timeout leaves the record unfiled.

Provision the audit log key

Set the HMAC key for the append-only log from a secret, and decide where you pin the chain head so tail-truncation is detectable.

Validate before trusting

Replay known events with known-correct mappings and confirm the agent reproduces them; inject a tampered log entry and confirm verification fails. Only then route real events.
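
For the first step, a starter catalog could be as small as a list of entries like the sketch below. The field names, the `controls_for` helper, and especially the control IDs are placeholders invented for illustration; they are not real ISO 42001 or SOC 2 references, and your compliance team's catalog is the authoritative version.

```python
# Placeholder catalog: IDs and fields are illustrative, not real framework citations.
CATALOG = [
    {
        "control_id": "EXAMPLE-ACCESS-01",
        "event_types": {"access_grant"},
        "resource_classes": {"internal", "confidential", "regulated"},
        "required_artifacts": ["approver", "ticket", "policy_version", "timestamps"],
        "completeness_threshold": 1.0,
        "owner": "compliance_owner",
    },
    {
        "control_id": "EXAMPLE-CHANGE-01",
        "event_types": {"access_grant", "config_change", "deployment"},
        "resource_classes": {"public", "internal", "confidential", "regulated"},
        "required_artifacts": ["approver", "ticket", "diff", "timestamps"],
        "completeness_threshold": 1.0,
        "owner": "compliance_owner",
    },
]


def controls_for(catalog: list, event_type: str, resource_class: str) -> list:
    """Return only controls the catalog actually contains for this event and class."""
    return [
        control for control in catalog
        if event_type in control["event_types"]
        and resource_class in control["resource_classes"]
    ]


matches = controls_for(CATALOG, "access_grant", "internal")
print([control["control_id"] for control in matches])
print(controls_for(CATALOG, "data_export", "internal"))
```

An empty result is meaningful: it means the catalog has no answer for that event type, which should route to a human rather than prompt a guess.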

## Architecture

- **Event intake:** Receives a single operational event (access grant, config change, deployment, or data-handling action) with its actor, resource, and timestamp.
- **Context & sensitivity resolution:** Resolves the acting identity and classifies the affected resource's sensitivity tier, which determines how strict the evidence requirements are.
- **Control mapping:** Looks up the applicable controls for this event type and resource class in the configured control catalog. Low-confidence or empty mappings are flagged, not forced.
- **Evidence assembly:** For each mapped control, collects the artifacts it requires (approver, ticket, policy version, diff, timestamps) and records present-vs-missing items explicitly.
- **Completeness & residual-risk scoring:** Scores evidence completeness against each control's threshold and computes residual risk from sensitivity and mapping confidence.
- **Approval & sign-off gate (gate):** A human compliance owner reviews the prepared record. Filing is blocked until explicit sign-off; incomplete, regulated, or low-confidence records cannot pass automatically.
- **Commit & seal (gate):** On approval, appends the record to the append-only, HMAC-chained audit log and emits a plain-language summary of what changed and why it is evidenced.

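## Tools required

- Auto-executable: get_event, resolve_actor, classify_resource, lookup_controls, collect_artifacts, score_completeness
- Approval-gated: request_signoff, commit_evidence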

## Workflow

1. Establish the facts

Read the event, resolve the acting identity, and classify the resource's sensitivity. Ground every fact in the event or a tool result — no inference.

2. Map to the control catalog

Look up applicable controls for this event type and resource class. Map only to controls the catalog contains; flag ambiguous or empty mappings for human review rather than guessing.

3. Assemble the evidence

For each mapped control, collect the required artifacts and record explicitly what is present and what is missing.

4. Score completeness and residual risk

Compute completeness against each control's threshold. Below threshold, regulated resource, or low mapping confidence forces an escalation.

5. Gate the filing

Call request_signoff to route the prepared record to the human compliance owner. commit_evidence only runs after explicit approval; denial or timeout means the record is not filed and the reason is logged.

6. Seal the record

On approval, append to the append-only, HMAC-chained audit log so the entry is tamper-evident and the chain can be verified later.

7. Explain it

Produce a plain-language 'what changed and why it is evidenced' summary a non-engineer reviewer can read and trust.
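
Steps 5 and 6 hinge on the gate failing closed. A minimal sketch of that control flow, assuming `request_signoff` and `commit_evidence` are passed in as callables and that sign-off returns one of three outcomes; the `Decision` enum, the `gate_and_commit` name, and the log layout are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    APPROVED = "approved"
    DENIED = "denied"
    TIMED_OUT = "timed_out"


def gate_and_commit(record: dict, request_signoff, commit_evidence, decision_log: list) -> bool:
    """File the record only on explicit approval; log the reason otherwise."""
    decision = request_signoff(record)
    if decision is Decision.APPROVED:
        entry_id = commit_evidence(record)
        decision_log.append({"record_id": record["id"], "outcome": "filed", "entry": entry_id})
        return True
    # Denial, timeout, or any unexpected value leaves the record unfiled.
    reason = decision.value if isinstance(decision, Decision) else "unrecognized_decision"
    decision_log.append({"record_id": record["id"], "outcome": "not_filed", "reason": reason})
    return False


decisions = []
filed = gate_and_commit(
    {"id": "rec-1"},
    request_signoff=lambda record: Decision.TIMED_OUT,
    commit_evidence=lambda record: "entry-1",
    decision_log=decisions,
)
print(filed, decisions)
```

Checking for `APPROVED` explicitly, rather than checking for `DENIED`, is what makes a timeout or a malformed response behave like a refusal instead of filing by default.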

## Examples

Access grant, fully evidenced

An engineer is granted access to an internal analytics dashboard via the normal ticketed process.

Input

```
Event: access_grant · actor: e.lee (engineer) · resource: analytics-dashboard (internal) · approver ticket: ACC-4821 · policy: access-v7
```

Output

```
Mapped to access-control + change-approval controls. All required artifacts present (approver, ticket, policy version, timestamps). Completeness 100%. Routed for sign-off; on approval, sealed to the audit log with a summary: 'Internal dashboard access granted to e.lee under ACC-4821, approved per access-v7.'
```

**Note:** The happy path: a clean, complete record assembled and sealed behind one human approval.

Regulated resource, escalated — not filed

A config change touches a system classified as regulated, but the change ticket is missing a required approver reference.

Input

```
Event: config_change · resource: billing-pii-store (regulated) · ticket: present · approver: MISSING · policy: data-handling-v3
```

Output

```
Mapped to data-handling + change-approval controls. Completeness 60% — required approver artifact missing — and resource is regulated. The agent does NOT file. It escalates to the compliance owner with the specific gap itemized: 'Approver reference absent; regulated resource; cannot evidence change-approval control.'
```

**Note:** The important case: incompleteness on a regulated resource blocks the gate. A false 'evidenced' record is exactly what the design refuses to produce.

Ambiguous mapping, flagged for review

A deployment event could plausibly map to two different controls and the catalog returns a low-confidence match.

Input

```
Event: deployment · resource: shared-inference-service (confidential) · catalog match confidence: 0.41
```

Output

```
The agent reports it could not confidently map the event and routes it to a human with both candidate controls and the evidence gathered so far, rather than committing a guess. No record is filed until the mapping is confirmed.
```

**Note:** Prefer 'I could not confidently map this' over a confident wrong mapping — the system prompt's core rule, shown in action.

## Implementation notes

- The control catalog is the source of truth, and it is yours. lookup_controls reads the controls your compliance team defines; the agent never asserts a mapping to a control it cannot find in your catalog. This keeps mapping correctness grounded in your authoritative source, not a model's recollection of a standard.
- Start the catalog small and real. ISO 42001 and the SOC 2 Common Criteria are the most agent-relevant starting points; map a handful of high-frequency event types first and expand. Treat the catalog as a living document with an owner, not a frozen file.
- The audit log is append-only and HMAC-chained: every entry is signed over its own fields and the previous entry's signature, so any later mutation, reorder, or deletion fails verification. Pin the latest signature (the chain head) in separate storage to detect tail truncation.
- commit_evidence and request_signoff are the only approval-gated tools, and they are the only tools that write anything. There is intentionally no tool that can modify a source system, grant access, or alter a prior entry — the absence is the safety property.
- Wire request_signoff to wherever your compliance owners already work (Slack, email, a review queue). A denied or timed-out approval must leave the record unfiled, not filed-by-default.
- The agent records and preserves; it does not opine on compliance. Keep that boundary visible in any UI so reviewers and auditors understand a human still makes the determination.

## Variations

Basic

Single-event evidencer

Maps one event type (e.g. access grants) against a small control catalog, assembles evidence, and writes a verifiable log entry behind manual sign-off. The fastest way to replace a screenshots-in-a-folder process.

Advanced

Multi-source with completeness gating

Covers access, config, and deploy events; pulls artifacts from several connected systems; enforces per-control completeness thresholds; and escalates regulated or low-confidence records automatically.

Enterprise

Continuous evidence pipeline

Subscribes to an event stream, maintains the chain head in external storage for tamper-evidence at scale, routes sign-offs to control owners by domain, and exposes a verifiable, queryable evidence store for auditors.

Download the Agent Blueprint

[Download Blueprint (.zip)](/downloads/compliance-evidence-agent.zip)

[View the source on GitHub](https://github.com/agent-kits/agentaz/tree/main/kits/compliance-evidence-agent)

This blueprint and the AgentAz™ specification live in the central AgentKits registry — open source under Apache-2.0 (code & schema) and CC‑BY‑4.0 (text).

## Frequently asked questions

**Does this agent make us compliant?**

No. It assembles and preserves evidence and routes anything uncertain to a human. Your compliance owner and your auditor make the compliance determination. The agent's value is a tamper-evident, well-mapped trail, not a verdict.

**Where do the control mappings come from?**

From a control catalog you configure. The agent maps events only to controls your catalog contains, so the mappings are grounded in your authoritative source. It will not assert a mapping to a control it cannot find.

**How is the audit log made tamper-evident?**

Each entry is HMAC-signed over its own fields and the previous entry's signature, forming a chain. Any later mutation, reorder, or deletion makes verification fail. Pinning the latest signature externally also detects truncation of the tail.

**Can the agent change source systems or grant access?**

No. It has no tool that modifies a source system, grants or revokes access, or alters a prior audit entry. Its only writes are committing evidence to its own log, and that is behind human approval.

**Which frameworks does it support?**

Any you put in the catalog. ISO 42001 and SOC 2 Common Criteria are sensible starting points, but the agent is framework-agnostic: it maps to whatever controls and artifacts you define.

**What is the worst thing this agent can do?**

Prepare an incomplete or mis-mapped evidence record that a human reviews before it is filed. It cannot file a record autonomously, and it cannot touch source systems, which is what keeps it at Trust Level A3.
