Article 19 used a 900-line harness_full_demo.py
to demonstrate eight defense layers. That file is good for explaining concepts, but not for reuse — all layers are coupled together, nothing can be tested in isolation, and nothing can be imported by another project.
A production-grade Agent project needs something you can actually import
:
harness/
├── __init__.py Public API exports
├── registry.py Layer 2: ActionRegistry + PermissionLevel
├── budget.py Layer 3: PermissionBudget (with refund())
├── sandbox.py Layer 4: sanitise_input + sandboxed_eval
├── audit.py Layer 6: ImmutableAuditLog (hash-chained)
├── rollback.py Layer 7: RollbackCoordinator
└── harness.py Unified entry point: AgentHarness
This article starts with package design, covers three key API decisions, and finishes with two integration styles: standalone Python and LangGraph graph embedding.
class PermissionLevel(Enum):
READ = 1
WRITE = 2
ADMIN = 3
IRREVERSIBLE = 4
@dataclass
class RegisteredAction:
name: str
level: PermissionLevel
budget_cost: int
description: "str"
handler: Any # Callable or BaseTool
class ActionRegistry:
def register(self, action: RegisteredAction) -> None: ...
def get(self, name: str) -> RegisteredAction: ... # not found → PermissionError
def is_allowed(self, name: str) -> bool: ...
def names(self) -> list[str]: ...
get()
rather than __getitem__
: raises a consistent PermissionError
, without leaking the internal KeyError
detail.
class PermissionBudget:
def spend(self, action_name: str, cost: int) -> None:
if self.remaining < cost:
raise BudgetExhaustedError(...)
self.remaining -= cost
def refund(self, action_name: str, cost: int) -> None:
self.remaining = min(self.total, self.remaining + cost)
The new refund()
method fixes a design flaw from Article 19: budget was deducted before approval, and never returned on rejection. The production package corrects this — when an IRREVERSIBLE action is intercepted, harness.py
proactively calls refund()
to keep budget accounting accurate.
INJECTION_PATTERN = re.compile(
r"(ignore.*(previous|above|prior)|forget.*instruction|"
r"you are now|act as|jailbreak|bypass|"
r"override.*system|system.*override|" # both word orders covered
r"</s>|\n\n###|###\s*system|<\|im_start\|>|system prompt)",
re.IGNORECASE,
)
Two subtle points:
SYSTEM OVERRIDE
(system first) and override.*system
(override first) are covered\n\n###
matches a real newline, not the literal string \\n\\n###
Both bugs were discovered and fixed during the adversarial tests in Article 21.
class ImmutableAuditLog:
def log(self, action, actor, target, result, metadata=None) -> str:
entry = {..., "prev_hash": self._last_hash}
entry["hash"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash)
with self._path.open("a") as f: # append-only
f.write(json.dumps(entry) + "\n")
return entry["hash"]
def verify_integrity(self) -> bool:
...
The __len__()
helper lets tests use len(audit)
to check entry count directly.
class RollbackCoordinator:
@contextmanager
def transaction(self, state: dict, op_name: str):
snapshot = copy.deepcopy(state)
self._snapshots.append({"op": op_name, "snapshot": snapshot})
try:
yield state
except Exception:
state.clear()
state.update(snapshot)
self._snapshots.pop()
raise
def rollback_last(self, state: dict) -> str | None:
"""Manual trigger: undo the most recent committed transaction."""
if not self._snapshots:
return None
entry = self._snapshots.pop()
state.clear()
state.update(entry["snapshot"])
return entry["op"]
rollback_last()
enables manual rollback: after a transaction commits, the snapshot is retained until explicitly confirmed or cleared by the caller.
class AgentHarness:
def __init__(self, budget: int = 100, log_path: str = ...):
self.registry = ActionRegistry()
self.budget = PermissionBudget(total=budget)
self.audit = ImmutableAuditLog(log_path=log_path)
self.rollback = RollbackCoordinator()
self._state: dict = {}
def execute(self, action_name: str, actor: str = "agent", **kwargs) -> Any:
...
def approve_and_execute(self, action_name: str, actor: str = "human", **kwargs) -> Any:
"""Call this after catching HumanApprovalRequired to complete execution."""
...
Why the two methods are separate:
execute()
is the automated path: all checks pass, execute immediatelyapprove_and_execute()
is the human path: the caller explicitly signals "this has been approved"Merging them (e.g., with an approved=False
parameter) makes intent ambiguous and harder to test.
harness = AgentHarness(budget=50)
harness.registry.register(RegisteredAction(
"read_ticket", PermissionLevel.READ, 1, "Read Jira ticket", handler_fn))
harness.registry.register(RegisteredAction(
"write_draft", PermissionLevel.WRITE, 3, "Write draft fix", handler_fn))
harness.registry.register(RegisteredAction(
"create_pr", PermissionLevel.ADMIN, 8, "Open pull request", handler_fn))
harness.registry.register(RegisteredAction(
"merge_to_main", PermissionLevel.IRREVERSIBLE, 20, "Merge to main", handler_fn))
READ → WRITE → ADMIN normal flow:
r1 = harness.execute("read_ticket", ticket_id="BUG-101")
r2 = harness.execute("write_draft", ticket_id="BUG-101", patch="fix: add null check")
r3 = harness.execute("create_pr", ticket_id="BUG-101", title="fix: BUG-101")
try:
harness.execute("delete_all_data")
except PermissionError as e:
...
try:
harness.execute("merge_to_main", pr_id=1)
except HumanApprovalRequired as e:
print(e.action_name) # "merge_to_main"
print(e.action_args) # {"pr_id": 1}
result = harness.approve_and_execute("merge_to_main", pr_id=1)
Key point: when execute()
intercepts an IRREVERSIBLE action, it calls budget.refund()
first. The net budget cost is zero. Only approve_and_execute()
actually charges the budget.
h = AgentHarness(budget=5)
h.execute("write_draft", ...) # OK, 2 remaining
h.execute("write_draft", ...) # BudgetExhaustedError: need 3, remaining 2
Embedding the harness inside LangGraph's tools_node
:
def tools_node(state: HState) -> dict:
last = state["messages"][-1]
results = []
for tc in last.tool_calls:
name, args = tc["name"], tc["args"]
try:
reg = harness.registry.get(name) # Layer 2
harness.budget.spend(name, reg.budget_cost) # Layer 3
if reg.level == PermissionLevel.IRREVERSIBLE:
decision = interrupt({...}) # Layer 5: LangGraph primitive
if decision != "approved":
harness.budget.refund(name, reg.budget_cost)
harness.audit.log(name, "checkpoint", ..., "HUMAN_REJECTED")
results.append(ToolMessage(content="rejected", ...))
continue
if reg.level in (WRITE, ADMIN):
with harness.rollback.transaction(harness._state, name): # Layer 7
output = TOOL_MAP[name].invoke(args)
else:
output = TOOL_MAP[name].invoke(args)
harness.audit.log(name, "agent", ..., "EXECUTED") # Layer 6
results.append(ToolMessage(content=str(output), ...))
except PermissionError as e:
harness.audit.log(name, "registry", ..., "BLOCKED")
results.append(ToolMessage(content=str(e), ...))
except BudgetExhaustedError as e:
results.append(ToolMessage(content=str(e), ...))
return {"messages": results}
tools_node
is the harness's natural insertion point: it intercepts before tool execution without touching any agent_node
(reasoning layer) logic.
This package's behavior is fully verified by Article 21's test suite:
Functional (Layer 1–7 basic behaviour) ████████████████████████████████ 19/19 PASS
Adversarial (injection / escalation) ████████████████████████████████ 17/17 PASS
Chaos (fault injection / partial) ████████████████████████████████ 9/ 9 PASS
Total 45/ 45 tests passed
Two real bugs found by the tests:
INJECTION_PATTERN
only matched override.*system
, missing [SYSTEM OVERRIDE]
(reversed word order)\\n\\n###
matched the literal string \n
, not a real newline — jailbreak pattern ### System:
slipped throughBoth fixed in sandbox.py with a one-line regex adjustment.
Package Structure
__init__.py
exports only the public API; internal classes stay privateAgentHarness
acts as Facade; callers don't reach into subsystems directlyAPI Design
execute()
is the automated path covering the full Layer 2→7 chainapprove_and_execute()
is the human path; the caller signals "approved"refund()
) when IRREVERSIBLE is intercepted, keeping accounting accuratePermissionError
/ BudgetExhaustedError
/ HumanApprovalRequired
) exported from __init__.py
Sandbox
\n
is a real newline character, not the literal \\n
LangGraph Integration
tools_node
, not in agent_node
interrupt()
, not a Python exceptionFive core conclusions:
execute()
and approve_and_execute()
makes intent explicittools_node
is the harness's natural slotCheck out PrimeSkills — a curated marketplace of AI agents and skills that have been validated in real-world, enterprise-grade workflows. No fluff, just what actually works.
Find more useful knowledge and interesting products on my Homepage