{"slug": "agent-series-20-harness-in-production-from-single-file-to-reusable-package", "title": "Agent Series (20): Harness in Production — From Single File to Reusable Package", "summary": "A developer refactored a 900-line demonstration file into a reusable Python package called 'harness', which includes modules for action registration, permission budgeting, input sanitization, audit logging, and rollback coordination. The package introduces a refund() method to correct a budget accounting flaw and supports integration as standalone Python or as a LangGraph graph embedding.", "body_md": "Article 19 used a 900-line `harness_full_demo.py` to demonstrate eight defense layers. That file is good for explaining concepts, but not for reuse: all layers are coupled together, nothing can be tested in isolation, and nothing can be imported by another project.\n\nA production-grade Agent project needs something you can actually `import`:\n\n```\nharness/\n├── __init__.py      Public API exports\n├── registry.py      Layer 2: ActionRegistry + PermissionLevel\n├── budget.py        Layer 3: PermissionBudget (with refund())\n├── sandbox.py       Layer 4: sanitise_input + sandboxed_eval\n├── audit.py         Layer 6: ImmutableAuditLog (hash-chained)\n├── rollback.py      Layer 7: RollbackCoordinator\n└── harness.py       Unified entry point: AgentHarness\n```\n\nThis article starts with package design, covers three key API decisions, and finishes with two integration styles: standalone Python and LangGraph graph embedding.\n\n``` python\nfrom dataclasses import dataclass\nfrom enum import Enum\nfrom typing import Any\n\nclass PermissionLevel(Enum):\n    READ         = 1\n    WRITE        = 2\n    ADMIN        = 3\n    IRREVERSIBLE = 4\n\n@dataclass\nclass RegisteredAction:\n    name: str\n    level: PermissionLevel\n    budget_cost: int\n    description: str\n    handler: Any   # Callable or BaseTool\n\nclass ActionRegistry:\n    def register(self, action: RegisteredAction) -> None: ...\n    def get(self, name: str) -> RegisteredAction: ...    
# not found → PermissionError\n    def is_allowed(self, name: str) -> bool: ...\n    def names(self) -> list[str]: ...\n```\n\nThe registry exposes `get()` rather than `__getitem__`: a missing action raises a consistent `PermissionError` instead of leaking the internal `KeyError` detail.\n\n``` python\nclass PermissionBudget:\n    def spend(self, action_name: str, cost: int) -> None:\n        if self.remaining < cost:\n            raise BudgetExhaustedError(...)\n        self.remaining -= cost\n\n    def refund(self, action_name: str, cost: int) -> None:\n        # Cap at the original total so a refund can never inflate the budget\n        self.remaining = min(self.total, self.remaining + cost)\n```\n\nThe new `refund()` method fixes a design flaw from Article 19: budget was deducted before approval, and never returned on rejection. The production package corrects this: when an IRREVERSIBLE action is intercepted, `harness.py` proactively calls `refund()` to keep budget accounting accurate.\n\n``` python\nimport re\n\nINJECTION_PATTERN = re.compile(\n    r\"(ignore.*(previous|above|prior)|forget.*instruction|\"\n    r\"you are now|act as|jailbreak|bypass|\"\n    r\"override.*system|system.*override|\"     # both word orders covered\n    r\"</s>|\\n\\n###|###\\s*system|<\\|im_start\\|>|system prompt)\",\n    re.IGNORECASE,\n)\n```\n\nTwo subtle points:\n\n- Both `SYSTEM OVERRIDE` (system first) and `override.*system` (override first) are covered\n- `\\n\\n###` matches a real newline, not the literal string `\\\\n\\\\n###`\n\nBoth bugs were discovered and fixed during the adversarial tests in Article 21.\n\n``` python\nclass ImmutableAuditLog:\n    def log(self, action, actor, target, result, metadata=None) -> str:\n        entry = {..., \"prev_hash\": self._last_hash}\n        entry[\"hash\"] = self._hash(json.dumps(entry, sort_keys=True) + self._last_hash)\n        with self._path.open(\"a\") as f:   # append-only\n            f.write(json.dumps(entry) + \"\\n\")\n        self._last_hash = entry[\"hash\"]   # advance the chain for the next entry\n        return entry[\"hash\"]\n\n    def verify_integrity(self) -> bool:\n        # Replays the hash chain; any 
modified field returns False\n        ...\n```\n\nThe `__len__()` helper lets tests use `len(audit)` to check the entry count directly.\n\n``` python\nimport copy\nfrom contextlib import contextmanager\n\nclass RollbackCoordinator:\n    def __init__(self):\n        self._snapshots: list[dict] = []   # stack of {\"op\", \"snapshot\"} entries\n\n    @contextmanager\n    def transaction(self, state: dict, op_name: str):\n        snapshot = copy.deepcopy(state)\n        self._snapshots.append({\"op\": op_name, \"snapshot\": snapshot})\n        try:\n            yield state\n        except Exception:\n            # Restore the pre-transaction state in place, then re-raise\n            state.clear()\n            state.update(snapshot)\n            self._snapshots.pop()\n            raise\n\n    def rollback_last(self, state: dict) -> str | None:\n        \"\"\"Manual trigger: undo the most recent committed transaction.\"\"\"\n        if not self._snapshots:\n            return None\n        entry = self._snapshots.pop()\n        state.clear()\n        state.update(entry[\"snapshot\"])\n        return entry[\"op\"]\n```\n\n`rollback_last()` enables manual rollback: after a transaction commits, its snapshot is retained until explicitly confirmed or cleared by the caller, so a committed operation can still be undone.\n\n``` python\nclass AgentHarness:\n    def __init__(self, budget: int = 100, log_path: str = ...):\n        self.registry = ActionRegistry()\n        self.budget   = PermissionBudget(total=budget)\n        self.audit    = ImmutableAuditLog(log_path=log_path)\n        self.rollback = RollbackCoordinator()\n        self._state: dict = {}\n\n    def execute(self, action_name: str, actor: str = \"agent\", **kwargs) -> Any:\n        # Layer 4: sanitise string arguments\n        # Layer 2: registry check (missing → PermissionError)\n        # Layer 3: budget deduction (insufficient → BudgetExhaustedError)\n        # Layer 5: IRREVERSIBLE → refund budget + raise HumanApprovalRequired\n        # Layer 7: WRITE/ADMIN wrapped in rollback.transaction\n        # Layer 6: audit record\n        ...\n\n    def approve_and_execute(self, action_name: str, actor: str = \"human\", **kwargs) -> Any:\n        \"\"\"Call this after catching 
HumanApprovalRequired to complete execution.\"\"\"\n        ...\n```\n\n**Why the two methods are separate:**\n\n- `execute()` is the automated path: all checks pass, execute immediately\n- `approve_and_execute()` is the human path: the caller explicitly signals \"this has been approved\"\n\nMerging them (e.g., with an `approved=False` parameter) makes intent ambiguous and harder to test.\n\n``` python\nharness = AgentHarness(budget=50)\n\n# Register actions\nharness.registry.register(RegisteredAction(\n    \"read_ticket\",   PermissionLevel.READ,          1, \"Read Jira ticket\",  handler_fn))\nharness.registry.register(RegisteredAction(\n    \"write_draft\",   PermissionLevel.WRITE,         3, \"Write draft fix\",   handler_fn))\nharness.registry.register(RegisteredAction(\n    \"create_pr\",     PermissionLevel.ADMIN,         8, \"Open pull request\", handler_fn))\nharness.registry.register(RegisteredAction(\n    \"merge_to_main\", PermissionLevel.IRREVERSIBLE, 20, \"Merge to main\",     handler_fn))\n```\n\n**READ → WRITE → ADMIN normal flow:**\n\n``` python\nr1 = harness.execute(\"read_ticket\",  ticket_id=\"BUG-101\")\nr2 = harness.execute(\"write_draft\",  ticket_id=\"BUG-101\", patch=\"fix: add null check\")\nr3 = harness.execute(\"create_pr\",    ticket_id=\"BUG-101\", title=\"fix: BUG-101\")\n# read=1 + write=3 + admin=8 = 12 spent, 38 remaining\n\n# Unregistered action: blocked by the registry (Layer 2)\ntry:\n    harness.execute(\"delete_all_data\")\nexcept PermissionError as e:\n    # \"Action 'delete_all_data' not in registry. Execution blocked.\"\n    ...\n\n# IRREVERSIBLE action: intercepted for human approval (Layer 5)\ntry:\n    harness.execute(\"merge_to_main\", pr_id=1)\nexcept HumanApprovalRequired as e:\n    print(e.action_name)   # \"merge_to_main\"\n    print(e.action_args)   # {\"pr_id\": 1}\n    # After human review:\n    result = harness.approve_and_execute(\"merge_to_main\", pr_id=1)\n```\n\n**Key point**: when `execute()` intercepts an IRREVERSIBLE action, it calls `budget.refund()` before raising `HumanApprovalRequired`. The net budget cost is zero. 
Only `approve_and_execute()` actually charges the budget.\n\n``` python\n# budget=5, write cost=3\nh = AgentHarness(budget=5)\nh.execute(\"write_draft\", ...)   # OK, 2 remaining\nh.execute(\"write_draft\", ...)   # BudgetExhaustedError: need 3, remaining 2\n```\n\nEmbedding the harness inside LangGraph's `tools_node`:\n\n``` python\ndef tools_node(state: HState) -> dict:\n    last = state[\"messages\"][-1]\n    results = []\n    for tc in last.tool_calls:\n        name, args = tc[\"name\"], tc[\"args\"]\n        try:\n            reg = harness.registry.get(name)               # Layer 2\n            harness.budget.spend(name, reg.budget_cost)    # Layer 3\n\n            if reg.level == PermissionLevel.IRREVERSIBLE:\n                decision = interrupt({...})                 # Layer 5: LangGraph primitive\n                if decision != \"approved\":\n                    harness.budget.refund(name, reg.budget_cost)\n                    harness.audit.log(name, \"checkpoint\", ..., \"HUMAN_REJECTED\")\n                    results.append(ToolMessage(content=\"rejected\", ...))\n                    continue\n\n            if reg.level in (PermissionLevel.WRITE, PermissionLevel.ADMIN):\n                with harness.rollback.transaction(harness._state, name):  # Layer 7\n                    output = TOOL_MAP[name].invoke(args)\n            else:\n                output = TOOL_MAP[name].invoke(args)\n\n            harness.audit.log(name, \"agent\", ..., \"EXECUTED\")       # Layer 6\n            results.append(ToolMessage(content=str(output), ...))\n\n        except PermissionError as e:\n            harness.audit.log(name, \"registry\", ..., \"BLOCKED\")\n            results.append(ToolMessage(content=str(e), ...))\n        except BudgetExhaustedError as e:\n            results.append(ToolMessage(content=str(e), ...))\n\n    return {\"messages\": results}\n```\n\n`tools_node` is the harness's natural insertion point: it intercepts before tool execution without touching any `agent_node` (reasoning 
layer) logic.\n\nThis package's behavior is covered by Article 21's test suite:\n\n```\nFunctional  (Layer 1–7 basic behaviour)     ████████████████████████████████  19/19  PASS\nAdversarial (injection / escalation)        ████████████████████████████████  17/17  PASS\nChaos       (fault injection / partial)     ████████████████████████████████   9/ 9  PASS\n\nTotal                                        45/ 45 tests passed\n```\n\n**Two real bugs found by the tests:**\n\n- `INJECTION_PATTERN` only matched `override.*system`, missing `[SYSTEM OVERRIDE]` (reversed word order)\n- `\\\\n\\\\n###` matched the literal string `\\n`, not a real newline, so the jailbreak pattern `### System:` slipped through\n\nBoth were fixed in `sandbox.py` with a one-line regex adjustment.\n\n**Package Structure**\n\n- `__init__.py` exports only the public API; internal classes stay private\n- `AgentHarness` acts as a Facade; callers don't reach into subsystems directly\n\n**API Design**\n\n- `execute()` is the automated path covering the full Layer 2→7 chain\n- `approve_and_execute()` is the human path; the caller signals \"approved\"\n- Budget is refunded (`refund()`) when IRREVERSIBLE is intercepted, keeping accounting accurate\n- Exceptions (`PermissionError` / `BudgetExhaustedError` / `HumanApprovalRequired`) exported from `__init__.py`\n\n**Sandbox**\n\n- In the injection regex, `\\n` is a real newline character, not the literal `\\\\n`\n\n**LangGraph Integration**\n\n- The harness is embedded in `tools_node`, not in `agent_node`\n- Human approval uses `interrupt()`, not a Python exception\n\nCore conclusions:\n\n- Separating `execute()` and `approve_and_execute()` makes intent explicit\n- `tools_node` is the harness's natural slot", "url": "https://wpnews.pro/news/agent-series-20-harness-in-production-from-single-file-to-reusable-package", "canonical_source": "https://dev.to/wonderlab/agent-series-20-harness-in-production-from-single-file-to-reusable-package-2chd", "published_at": "2026-06-14 03:50:01+00:00", "updated_at": "2026-06-14 03:58:43.315279+00:00", "lang": "en", "topics": ["developer-tools", "ai-agents", "ai-safety", "machine-learning", "large-language-models"], "entities": ["AgentHarness", "ActionRegistry", "PermissionBudget", "ImmutableAuditLog", "RollbackCoordinator", "LangGraph", "PermissionLevel", "RegisteredAction"], "alternates": {"html": "https://wpnews.pro/news/agent-series-20-harness-in-production-from-single-file-to-reusable-package", "markdown": "https://wpnews.pro/news/agent-series-20-harness-in-production-from-single-file-to-reusable-package.md", "text": "https://wpnews.pro/news/agent-series-20-harness-in-production-from-single-file-to-reusable-package.txt", "jsonld": "https://wpnews.pro/news/agent-series-20-harness-in-production-from-single-file-to-reusable-package.jsonld"}}