{"slug": "hermes-agent-needs-a-flight-recorder-so-i-built-one", "title": "Hermes Agent Needs a Flight Recorder - So I Built One", "summary": "A developer built TraceGuard, a lightweight Python library and CLI that functions as an execution flight recorder for autonomous agent runtimes. The tool detects three common failure modes in agentic workflows—retry storms, silent failures, and recursive delegation loops—by consuming append-only JSONL execution traces. TraceGuard transforms opaque terminal output into structured, replayable execution events, addressing the observability gap that currently leaves agent failures invisible and costly.", "body_md": "*This is a submission for the Hermes Agent Challenge*\n\nAutonomous agents can now write code, call tools, browse the web, mutate files, and delegate to subagents. But when they fail, they fail invisibly.\n\n\"An agent ran overnight, caught an unhandled exception loop, and burned $50 in tokens while corrupting our staging database.\"\n\nIf you've spent more than a week building production systems with autonomous agents, you've lived some version of this nightmare.\n\nMost agent runtimes don't crash cleanly. They slide into retry storms, silently ignore failed tool calls, or recurse through delegation loops until budgets evaporate.\n\nAirplanes have flight recorders. Distributed systems have OpenTelemetry. 
**Autonomous agents need TraceGuard.**\n\n**TraceGuard** is a lightweight Python library and CLI that acts as an isolated, non-invasive execution flight recorder for autonomous agent runtimes.\n\nIt consumes append-only JSONL execution traces and detects the three silent killers of agentic workflows: retry storms, silent failures, and recursive delegation loops.\n\n```\ntraceguard traces/my_agent_run.jsonl --strict\n# exit 0 = clean · exit 1 = WARN · exit 2 = CRITICAL\n```\n\nInstead of scraping human-readable terminal logs, TraceGuard turns runtime execution into a structured, replayable execution event contract.\n\n**GitHub:** [https://github.com/Ale007XD/traceguard](https://github.com/Ale007XD/traceguard)\n\nModern agent frameworks can browse the web, write files, execute shell commands, and coordinate sub-agents. But when something goes wrong, you're usually left with a giant wall of terminal output and one impossible question:\n\n**What actually happened?**\n\nNot what the LLM said. Not the final output. The actual execution state.\n\nDistributed systems engineers solved these problems decades ago using structured traces, append-only logs, and replayable execution histories. 
Agent runtimes are now complex enough to require the same discipline.\n\nAutonomous agents are stochastic distributed runtimes.\n\n| Distributed System Failure | Agent Equivalent | Observability Primitive |\n|---|---|---|\n| Retry storm | Same tool called repeatedly without progress | Sliding window counter over event stream |\n| Silent failure | Tool fails, agent continues anyway | Error propagation trace |\n| Circular dependency | Agent A delegates to B which delegates back to A | Delegation cycle detection |\n| State divergence | Agent acts on corrupted or stale state | Replayable transition history |\n\n```\nAgent Runtime\n      │\n      ▼\nAppend-Only Event Stream\n      │\n      ▼\n  TraceGuard\n      │\n  ┌───┴───────┬──────────────┐\n  ▼           ▼              ▼\nRetry      Silent       Recursive\nStorms    Failures     Delegation\n```\n\nEvery execution step becomes a formal state transition. The runtime stops being an opaque, ephemeral process and becomes a replayable execution artifact.\n\nHermes Agent currently exposes beautiful terminal output optimized for humans. Production observability requires something fundamentally different: machine-readable execution semantics.\n\nExample event:\n\n```\n{\n  \"event_id\": \"3f8a1c2d-...\",\n  \"session_id\": \"hermes-session-001\",\n  \"timestamp\": \"2026-05-29T10:00:00.050Z\",\n  \"schema_version\": \"1.0\",\n  \"type\": \"tool_call\",\n  \"tool_name\": \"bash\",\n  \"tool_args\": {\n    \"command\": \"git status --porcelain\"\n  }\n}\n```\n\nEach event is:\n\nThe missing primitive is not another dashboard. 
It is a structured execution event stream.\n\nDetects identical tool invocations repeating without successful progress.\n\n**Example:** `bash → fail` → `bash → fail` → `bash → fail` (retry storm)\n\nDetects agents continuing execution after failed or empty tool outputs.\n\n**Example:** `read_file → empty` → `continue execution` (silent corruption)\n\nDetects sub-agent delegation cycles and self-recursion.\n\n**Example:** `planner → coder → coder → planner` (recursive loop)\n\nEach detector operates independently over the same append-only event stream. Multiple detectors can fire simultaneously on the same execution trace.\n\nTraceGuard is intentionally designed as an external execution observer.\n\n```\nLLM proposes\n      │\n      ▼\nRuntime executes\n      │\n      ▼\nTraceGuard observes\n      │\n      ▼\nGovernance layer enforces invariants\n```\n\nThis is the critical distinction. Prompt engineering cannot reliably solve retry storms, hidden execution corruption, or delegation cycles. Prompt-layer control is insufficient. **Execution-layer governance is required.**\n\n- `schema.py`\n- `recorder.py`\n- `detectors.py`\n- `guard.py`\n\nThe core invariant is simple: **Record every transition. Analyze the record.**\n\nOnce execution becomes replayable, agent runtimes stop behaving like black boxes.\n\nHermes Agent currently produces terminal output optimized for human inspection. TraceGuard proposes a complementary execution event contract: a machine-readable stream of typed, versioned, append-only events emitted alongside the human-readable output.\n\nThis aligns with the discussion in [issue #169](https://github.com/NousResearch/hermes-agent/issues/169) on structured execution semantics.\n\nThe integration path is additive: TraceGuard requires no changes to Hermes internals. 
Emit events to a JSONL file; TraceGuard reads them externally.\n\n```bash\n$ traceguard traces/retry_storm.jsonl\n[WARN] RetryStormDetector: tool 'bash' called 4 times without success (threshold=3)\n[WARN] SilentFailureDetector: step 2 failed, execution continued without error handling\n[WARN] SilentFailureDetector: step 4 failed, execution continued without error handling\n[WARN] SilentFailureDetector: step 6 failed, execution continued without error handling\n[WARN] SilentFailureDetector: step 7 failed, execution continued without error handling\n\n$ traceguard traces/recursive_delegation.jsonl\n[CRITICAL] RecursiveDelegationDetector: delegation cycle detected — planner → coder → planner\n\n$ traceguard traces/clean.jsonl\n✓ No anomalies detected.\n\n$ traceguard traces/retry_storm.jsonl --strict; echo \"exit: $?\"\nexit: 1\n```\n\n```python\nfrom traceguard import TraceGuard\n\nguard = TraceGuard()\nreport = guard.analyze(\"traces/my_agent_run.jsonl\")\n\nfor anomaly in report.anomalies:\n    print(f\"[{anomaly.severity}] {anomaly.detector}: {anomaly.message}\")\n\nif report.is_clean:\n    print(\"✓ No anomalies detected.\")\n```\n\n`frozen=True` event models\n\nNo external runtime dependencies. No framework lock-in.\n\nTraceGuard was developed and iterated with Hermes Agent as the primary development environment: reading files, applying patches, running tests, and diagnosing failures through FSM-structured execution loops.\n\nThe irony is deliberate: a tool for governing agent execution traces was built by an agent whose execution was governed by the same FSM principles.\n\nHermes drove: reading source files → generating S&R patches → applying changes → running pytest → diagnosing failures → iterating.\n\nMost failures in autonomous systems are not model failures. They are execution failures:\n\nThe model is usually doing exactly what it was asked to do. The runtime simply lacks governance.\n\n\"LLMs propose. 
Runtimes govern.\"\n\nTraceGuard is to autonomous agents what OpenTelemetry became for distributed systems.\n\nBuilt for the **Hermes Agent Challenge 2026**.\n\n**Repository:** [https://github.com/Ale007XD/traceguard](https://github.com/Ale007XD/traceguard)\n\nBuilt on [llm-nano-vm](https://github.com/Ale007XD/nano_vm) — deterministic FSM execution infrastructure.", "url": "https://wpnews.pro/news/hermes-agent-needs-a-flight-recorder-so-i-built-one", "canonical_source": "https://dev.to/ale007xd/hermes-agent-needs-a-flight-recorder-so-i-built-one-3gea", "published_at": "2026-05-29 11:01:26+00:00", "updated_at": "2026-05-29 11:11:44.405974+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "ai-safety", "ai-infrastructure", "mlops"], "entities": ["TraceGuard", "Hermes Agent", "OpenTelemetry", "GitHub"], "alternates": {"html": "https://wpnews.pro/news/hermes-agent-needs-a-flight-recorder-so-i-built-one", "markdown": "https://wpnews.pro/news/hermes-agent-needs-a-flight-recorder-so-i-built-one.md", "text": "https://wpnews.pro/news/hermes-agent-needs-a-flight-recorder-so-i-built-one.txt", "jsonld": "https://wpnews.pro/news/hermes-agent-needs-a-flight-recorder-so-i-built-one.jsonld"}}