{"slug": "promptetheus-trace-detect-and-auto-repair-ai-agent-failures", "title": "Promptetheus – Trace, detect, and auto-repair AI agent failures", "summary": "Promptetheus, a new debugging infrastructure for AI agents, launched with a Python SDK, local replay tooling, and hosted trace delivery. The tool enables developers to trace, detect, and auto-repair agent failures through decorators, typed events, and durable delivery that never crashes the host agent. Promptetheus aims to simplify debugging for coding agents by providing local CLI tools and hosted MCP evidence access.", "body_md": "Promptetheus is debugging infrastructure for AI agents: a Python SDK, local replay tooling, hosted trace delivery, and MCP evidence access for coding agents that need to fix failing agent runs.\n\n- One trace per user-visible agent task.\n- Decorators for top-level agent runs, tool calls, and nested spans.\n- Typed events for user messages, agent messages, tool calls, browser actions, DOM snapshots, screenshots, LLM calls, retrieval, metrics, errors, scores, and final goal checks.\n- Durable delivery that never crashes the host agent. 
If HTTP delivery is not configured or fails, events spool locally and can be replayed later.\n- Local CLI tools for doctor checks, spool inspection, session replay, diffing, and failure fingerprints.\n- Hosted MCP config snippets for read-only incident evidence scoped to a workspace and Supabase project.\n\nFor a normal project, install from PyPI:\n\n```\npip install promptetheus\npromptetheus version\n```\n\nCreate or configure a hosted project key:\n\n```\nexport PROMPTETHEUS_CONSOLE_TOKEN=...\npromptetheus init \\\n  --workspace-name \"Acme\" \\\n  --project-name \"Browser Agent\" \\\n  --write-env .env\nsource .env\npromptetheus doctor\n```\n\nFor local self-hosted development:\n\n```\npromptetheus init \\\n  --api-url http://127.0.0.1:4318 \\\n  --console-token pt_console_token \\\n  --write-env .env\nsource .env\n```\n\nFor contributor work from this repository:\n\n```\npip install -e packages/promptetheus\npromptetheus version\n```\n\nWith `transport=\"auto\"`, the SDK sends to the configured API when\n`PROMPTETHEUS_API_KEY` is present. 
Without a key, it writes to the local spool\nso the instrumented agent keeps running.\n\nUse decorators when you want instrumentation to sit directly on agent and tool functions:\n\n``` python\nimport promptetheus as pt\n\n@pt.tool\ndef search_calendar(day: str) -> list[str]:\n    return [\"Tuesday 2pm\", \"Tuesday 3pm\"]\n\n@pt.traced(\"choose-slot\")\ndef choose_slot(slots: list[str]) -> str:\n    return \"Wednesday 2pm\"\n\n@pt.observe(\n    agent=\"calendar-agent\",\n    user_goal=\"Book Tuesday at 2pm\",\n    transport=\"auto\",  # use \"spool\" to force local JSONL while trying this\n)\ndef run_agent(goal: str) -> str:\n    pt.current().user_message(goal)\n    slots = search_calendar(\"Tuesday\")\n    selected = choose_slot(slots)\n    pt.current().agent_message(f\"Booked {selected}\")\n    pt.current().goal_check(\n        False,\n        mismatches=[\"selected Wednesday, not Tuesday\"],\n    )\n    return selected\n\nrun_agent(\"Book Tuesday at 2pm\")\n```\n\nWhat each decorator does:\n\n- `@pt.observe(...)` starts one trace/session around the top-level run.\n- `@pt.tool` records `tool_call` and `tool_result` events inside the current session.\n- `@pt.traced(\"name\")` adds a nested span to the replay tree without starting a separate session.\n- `pt.current()` returns the active session so the agent can record user messages, agent messages, goal checks, errors, metrics, and other events.\n\n`goal_check(False)` is visible in replay, fingerprints, and tail sampling. If a\nfailed goal should also make the process fail, record the goal check and then\nraise an exception so the terminal `session_end` status is `failed`:\n\n```\nif not selected.startswith(\"Tuesday\"):\n    pt.current().goal_check(False, mismatches=[\"selected Wednesday\"])\n    raise RuntimeError(\"agent selected the wrong day\")\n```\n\nWhen no API key is configured, `transport=\"auto\"` writes local JSONL. 
While\nlearning, you can also pass `transport=\"spool\"` to force local output. After a\nlocal or spooled run, list sessions:\n\n```\npromptetheus sessions\n```\n\nExample output:\n\n```\n  01KVMZ4T7V2SN61ZWG1XTDBK47: 11 event(s)\n```\n\nReplay the timeline:\n\n```\npromptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47\n```\n\nExample output:\n\n```\n[0] state_change name='session_started'\n[1] tool_call tool_name='run_agent'\n[2] user_message content='Book Tuesday at 2pm'\n[3] tool_call tool_name='search_calendar'\n[4] tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'\n[5] state_change name='span_start'\n[6] state_change name='span_end'\n[7] agent_message content='Booked Wednesday 2pm'\n[8] goal_check passed=False\n[9] tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'\n[10] session_end status='completed'\n```\n\nReplay the run tree:\n\n```\npromptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47 --tree\n```\n\nExample output:\n\n```\n[0] state_change name='session_started'\n[1] tool_call tool_name='run_agent'\n[2] user_message content='Book Tuesday at 2pm'\n[3] tool_call tool_name='search_calendar'\n[4] tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'\n[7] agent_message content='Booked Wednesday 2pm'\n[8] goal_check passed=False\n[9] tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'\n[10] session_end status='completed'\nchoose-slot span=span_163a8380174647e98bfe1f3fff9e15b9 duration_ms=0.0\n```\n\nGenerate a failure fingerprint:\n\n```\npromptetheus fingerprint 01KVMZ4T7V2SN61ZWG1XTDBK47\n```\n\nExample output:\n\n```\n8ae0f41220d0  goal mismatch: selected wednesday, not tuesday\n  - goal:selected wednesday, not tuesday\n```\n\nInspect the local delivery spool:\n\n```\npromptetheus spool list\n```\n\nExample output:\n\n```\nSpool: .promptetheus/spool\n  pending : 11 event(s) across 1 session file(s), 4082 bytes\n  dead    : 0 event(s) across 0 file(s), 0 bytes\n    01KVMZ4T7V2SN61ZWG1XTDBK47: 11 pending\n```\n\nThe raw spool is JSONL. 
Each line is an event envelope:\n\n```\n{\n  \"type\": \"tool_call\",\n  \"session_id\": \"01KVMZ4T7V2SN61ZWG1XTDBK47\",\n  \"seq\": 1,\n  \"idempotency_key\": \"01KVMZ4T7V2SN61ZWG1XTDBK47:29c5eff0:1\",\n  \"payload\": {\n    \"tool_name\": \"run_agent\",\n    \"call_id\": \"a78566297e0a4a309d5ce44cefe0d836\",\n    \"arguments\": {\n      \"args\": \"('Book Tuesday at 2pm',)\",\n      \"kwargs\": \"{}\"\n    }\n  }\n}\n```\n\nUse `pt.trace.start(...)` when you control the run boundary and want explicit\nevent calls instead of decorators:\n\n``` python\nimport promptetheus as pt\n\nwith pt.trace.start(\n    agent=\"demo-agent\",\n    user_goal=\"Book a meeting for Tuesday\",\n    transport=\"auto\",\n) as session:\n    session.user_message(\"Please book the small room for Tuesday at 2pm\")\n    session.tool_call(\"calendar.search\", {\"day\": \"Tuesday\"}, call_id=\"calendar-1\")\n    session.tool_result(\"calendar-1\", result={\"available\": [\"2pm\", \"3pm\"]})\n    session.agent_message(\"Booking confirmed for Wednesday at 2pm\")\n    session.goal_check(False, mismatches=[\"booked Wednesday, not Tuesday\"])\n# session_end is emitted automatically; transport flush runs on exit\n```\n\nThe package exposes these primary entry points:\n\n``` python\nimport promptetheus as pt\n\npt.trace.start(...)\npt.start(...)\npt.observe(...)\npt.tool\npt.traced(...)\npt.current()\npt.Session\npt.AsyncSession\npt.AgentRuntime\n```\n\nCommon session helpers:\n\n```\nsession.user_message(\"Book Tuesday at 2pm Pacific\")\nsession.agent_message(\"I found availability\")\nsession.tool_call(\"browser.click\", {\"selector\": \"#checkout\"}, call_id=\"click-1\")\nsession.tool_result(\"click-1\", result={\"ok\": True})\nsession.retrieval(\"refund policy\", documents=[{\"id\": \"doc-1\", \"score\": 0.91}])\nsession.browser_action(\"click\", \"#checkout\", url=page.url)\nsession.dom_snapshot(page.url, visible_text, selected_values={\"day\": \"Tuesday\"})\nsession.screenshot(page.screenshot())\nsession.replay_artifact(\"trace.webm\", artifact_type=\"screen_recording\", event_time_map={})\nsession.llm_call(\"gpt-5\", input_tokens=100, output_tokens=40, latency_ms=900)\nsession.score(\"goal_match\", 0.2, comment=\"Selected the wrong day\")\nsession.metric(\"steps\", 12, unit=\"count\")\nsession.error(RuntimeError(\"calendar API timeout\"), handled=True)\nsession.goal_check(False, mismatches=[\"selected Wednesday\"])\nsession.end(\"failed\")\nsession.flush(timeout=2)\n```\n\nEvery helper writes a schema-valid event envelope with `type`, `session_id`,\n`timestamp`, `seq`, `idempotency_key`, and `payload`. Use `metadata` for safe,\nlow-cardinality context. Do not put raw secrets, cookies, tokens, or\ncredentials into event payloads.\n\nUse `AsyncSession` when the top-level agent run is async:\n\n``` python\nfrom promptetheus import AsyncSession\n\nasync with AsyncSession(agent=\"voice-agent\", user_goal=\"Summarize the call\") as session:\n    session.user_message(\"Summarize this call\")\n    async with session.aspan(\"transcribe\"):\n        session.metric(\"audio_seconds\", 42, unit=\"seconds\")\n    session.goal_check(True)\n```\n\nBrowser agents should record the user goal, critical browser actions, the final DOM state, and an explicit goal check:\n\n```\nsession.browser_action(\"click\", \"#confirm\", url=page.url)\nsession.dom_snapshot(\n    page.url,\n    visible_text=await page.locator(\"body\").inner_text(),\n    selected_values={\"day\": \"Wednesday\", \"time\": \"2pm\"},\n    warnings=[\"Timezone changed from Pacific to Eastern\"],\n)\nsession.goal_check(\n    False,\n    mismatches=[\"booked Wednesday\", \"timezone warning visible\"],\n)\n```\n\nThis is the path that lets Promptetheus replay a failure and produce fix-agent evidence instead of just storing generic logs.\n\nAdapters are optional and imported lazily. 
Install only the extra you need:\n\n```\npip install \"promptetheus[openai]\"\npip install \"promptetheus[anthropic]\"\npip install \"promptetheus[langchain]\"\npip install \"promptetheus[playwright]\"\n```\n\nAvailable adapter exports:\n\n```\nfrom promptetheus.adapters import (\n    AnthropicAdapter,\n    AutoGenAdapter,\n    CrewAIAdapter,\n    DSPyAdapter,\n    HaystackAdapter,\n    LangGraphAdapter,\n    LiteLLMAdapter,\n    LlamaIndexAdapter,\n    OpenAIAdapter,\n    OpenTelemetryBridge,\n    PlaywrightAdapter,\n    PromptetheusCallbackHandler,\n    PydanticAIAdapter,\n)\n```\n\nUse adapters when a framework already emits structured callbacks. Keep custom instrumentation close to the real run boundary when the framework does not.\n\n`AgentRuntime` is a best-effort client for live, service-backed coordination.\nIt is separate from durable trace storage and never raises into host code when\nthe service is unavailable:\n\n``` python\nfrom promptetheus import AgentRuntime\n\nruntime = AgentRuntime(session.session_id)\nruntime.remember(\"hypothesis\", {\"summary\": \"auth header may be missing\"})\nhint = runtime.before_tool_call(\"pytest\", command=\"pytest tests/server\")\n\nresult = run_tests()\nruntime.after_tool_call(\n    \"pytest\",\n    command=\"pytest tests/server\",\n    status=\"failed\" if result.failed else \"succeeded\",\n    error=result.error,\n)\nruntime.heartbeat(phase=\"investigating\", current_file=\"tests/server/test_mcp.py\")\nnext_hint = runtime.next_hint()\n```\n\nIn a fresh install, local gateway and MCP commands need their extras:\n\n```\npip install \"promptetheus[server,mcp]\"\npromptetheus dev                     # boot local FastAPI ingestion on :4318\npromptetheus doctor                  # config, reachability, spool summary\npromptetheus spool list              # pending local delivery files\npromptetheus spool replay            # retry pending delivery through the API\npromptetheus sessions                # list locally spooled 
sessions\npromptetheus replay <session-id>     # print a flat timeline\npromptetheus replay <session-id> --tree\npromptetheus diff <baseline> <candidate>\npromptetheus fingerprint <session-id>\npromptetheus import exported-session.json\n```\n\n`spool purge` deletes local spool files. Use it only when you are sure the data\nis no longer needed.\n\nGenerate hosted MCP client config without mutating global client files:\n\n```\npromptetheus mcp install \\\n  --client codex \\\n  --workspace acme \\\n  --project-ref abcdefghijklmnopqrst\n```\n\nSupported clients are `codex`, `claude`, and `cursor`. The generated config\nuses a stdio bridge to hosted Promptetheus MCP and defaults to read-only,\nproject-scoped Supabase evidence. SDK clients and MCP client config should not\nreceive Supabase service-role keys.\n\nFor local stdio development:\n\n```\npromptetheus mcp\n```\n\nThe SDK lives under `packages/promptetheus/promptetheus`. Tests live at the\nrepository root under `tests`.\n\nUseful commands:\n\n```\nuv run --project packages/promptetheus --extra dev pytest tests/sdk -q\nuv run --project packages/promptetheus --extra dev pytest tests/cli -q\nuv run --project packages/promptetheus --extra dev --extra server --extra mcp pytest tests/server/test_mcp.py -q\nuv run --project packages/promptetheus --extra dev mypy\n```\n\nDocs to read next:\n\n- Promptetheus project keys identify Promptetheus projects. They are not Supabase service-role keys.\n- The hosted service owns Supabase credentials and scopes evidence reads by workspace/project.\n- Use `redact=\"default\"` or a custom redactor for sensitive payloads.\n
- Store prompt/message references instead of raw large or sensitive LLM payloads when possible.\n- The SDK should observe agents, not rewrite their architecture or hide failed goals.\n\n**Status:** Stable `2.0.1` SDK for hosted/self-hosted Promptetheus trace\ndelivery.", "url": "https://wpnews.pro/news/promptetheus-trace-detect-and-auto-repair-ai-agent-failures", "canonical_source": "https://github.com/obro79/promptetheus", "published_at": "2026-06-27 04:37:00+00:00", "updated_at": "2026-06-27 05:05:25.637429+00:00", "lang": "en", "topics": ["ai-agents", "developer-tools", "ai-tools"], "entities": ["Promptetheus", "Python", "MCP", "Supabase"], "alternates": {"html": "https://wpnews.pro/news/promptetheus-trace-detect-and-auto-repair-ai-agent-failures", "markdown": "https://wpnews.pro/news/promptetheus-trace-detect-and-auto-repair-ai-agent-failures.md", "text": "https://wpnews.pro/news/promptetheus-trace-detect-and-auto-repair-ai-agent-failures.txt", "jsonld": "https://wpnews.pro/news/promptetheus-trace-detect-and-auto-repair-ai-agent-failures.jsonld"}}