Promptetheus – Trace, detect, and auto-repair AI agent failures

Promptetheus, a new debugging infrastructure for AI agents, launched with a Python SDK, local replay tooling, and hosted trace delivery. The tool enables developers to trace, detect, and auto-repair agent failures through decorators, typed events, and durable delivery that never crashes the host agent. Promptetheus aims to simplify debugging for coding agents by providing local CLI tools and hosted MCP evidence access.

Promptetheus is debugging infrastructure for AI agents: a Python SDK, local replay tooling, hosted trace delivery, and MCP evidence access for coding agents that need to fix failing agent runs.

- One trace per user-visible agent task.
- Decorators for top-level agent runs, tool calls, and nested spans.
- Typed events for user messages, agent messages, tool calls, browser actions, DOM snapshots, screenshots, LLM calls, retrieval, metrics, errors, scores, and final goal checks.
- Durable delivery that never crashes the host agent. If HTTP delivery is not configured or fails, events spool locally and can be replayed later.
- Local CLI tools for doctor checks, spool inspection, session replay, diffing, and failure fingerprints.
- Hosted MCP config snippets for read-only incident evidence scoped to a workspace and Supabase project.

For a normal project, install from PyPI:

```bash
pip install promptetheus
promptetheus version
```

Create or configure a hosted project key:

```bash
export PROMPTETHEUS_CONSOLE_TOKEN=...
promptetheus init \
  --workspace-name "Acme" \
  --project-name "Browser Agent" \
  --write-env .env
source .env
promptetheus doctor
```

For local self-hosted development:

```bash
promptetheus init \
  --api-url http://127.0.0.1:4318 \
  --console-token pt_console_token \
  --write-env .env
source .env
```

For contributor work from this repository:

```bash
pip install -e packages/promptetheus
promptetheus version
```

With `transport="auto"`, the SDK sends to the configured API when `PROMPTETHEUS_API_KEY` is present. Without a key, it writes to the local spool so the instrumented agent keeps running.

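
The `transport="auto"` fallback described above can be pictured with a small standalone sketch. It does not import the SDK: the function name `pick_transport` and the `"http"` label are illustrative assumptions (only `"auto"`, `"spool"`, and `PROMPTETHEUS_API_KEY` come from this README), and the SDK's actual selection logic may differ.

```python
import os
from typing import Mapping, Optional


def pick_transport(requested: str = "auto", env: Optional[Mapping[str, str]] = None) -> str:
    """Return a transport label for the requested mode (illustrative only)."""
    env = os.environ if env is None else env
    if requested != "auto":
        # An explicit choice such as "spool" is returned unchanged.
        return requested
    # A non-empty key selects API delivery; otherwise fall back to the local spool.
    return "http" if env.get("PROMPTETHEUS_API_KEY") else "spool"


print(pick_transport("auto", {}))
print(pick_transport("auto", {"PROMPTETHEUS_API_KEY": "example"}))
```

Passing the environment as a mapping keeps the decision easy to test without touching real environment variables.
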
Use decorators when you want instrumentation to sit directly on agent and tool functions:

```python
import promptetheus as pt


@pt.tool
def search_calendar(day: str) -> list[str]:
    return ["Tuesday 2pm", "Tuesday 3pm"]


@pt.traced("choose-slot")
def choose_slot(slots: list[str]) -> str:
    return "Wednesday 2pm"


@pt.observe(
    agent="calendar-agent",
    user_goal="Book Tuesday at 2pm",
    transport="auto",  # use "spool" to force local JSONL while trying this
)
def run_agent(goal: str) -> str:
    pt.current().user_message(goal)
    slots = search_calendar("Tuesday")
    selected = choose_slot(slots)
    pt.current().agent_message(f"Booked {selected}")
    pt.current().goal_check(
        False,
        mismatches=["selected Wednesday, not Tuesday"],
    )
    return selected


run_agent("Book Tuesday at 2pm")
```

What each decorator does:

- `@pt.observe(...)` starts one trace/session around the top-level run.
- `@pt.tool` records tool call and tool result events inside the current session.
- `@pt.traced("name")` adds a nested span to the replay tree without starting a separate session.
- `pt.current()` returns the active session so the agent can record user messages, agent messages, goal checks, errors, metrics, and other events.

`goal_check(False)` is visible in replay, fingerprints, and tail sampling. If a failed goal should also make the process fail, record the goal check and then raise an exception so the terminal `session_end` status is `failed`:

```python
if not selected.startswith("Tuesday"):
    pt.current().goal_check(False, mismatches=["selected Wednesday"])
    raise RuntimeError("agent selected the wrong day")
```

When no API key is configured, `transport="auto"` writes local JSONL. While learning, you can also pass `transport="spool"` to force local output.

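
To make the tool-decorator behavior concrete, here is a toy, SDK-free sketch of how a decorator could record a `tool_call` event before a function runs and a `tool_result` event after it returns. The names `EVENTS` and `record_tool` are hypothetical, and this is not how `@pt.tool` is implemented; it only mirrors the call/result pairing described above.

```python
import functools
import uuid
from typing import Any, Callable

EVENTS: list[dict[str, Any]] = []  # hypothetical in-memory stand-in for a session


def record_tool(func: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap func so each call appends a tool_call and a tool_result event."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        call_id = uuid.uuid4().hex  # links the result back to its call
        EVENTS.append({
            "type": "tool_call",
            "payload": {
                "tool_name": func.__name__,
                "call_id": call_id,
                "arguments": {"args": repr(args), "kwargs": repr(kwargs)},
            },
        })
        result = func(*args, **kwargs)
        EVENTS.append({
            "type": "tool_result",
            "payload": {"call_id": call_id, "result": result},
        })
        return result

    return wrapper


@record_tool
def search_calendar(day: str) -> list[str]:
    return [f"{day} 2pm", f"{day} 3pm"]


slots = search_calendar("Tuesday")
print([event["type"] for event in EVENTS])
```

In this sketch an exception inside the wrapped function propagates without a `tool_result` event; a fuller version would likely record an error event first.
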
After a local or spooled run, list sessions:

```bash
promptetheus sessions
```

Example output:

```text
01KVMZ4T7V2SN61ZWG1XTDBK47: 11 event(s)
```

Replay the timeline:

```bash
promptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47
```

Example output:

```text
0 state_change name='session_started'
1 tool_call tool_name='run_agent'
2 user_message content='Book Tuesday at 2pm'
3 tool_call tool_name='search_calendar'
4 tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'
5 state_change name='span_start'
6 state_change name='span_end'
7 agent_message content='Booked Wednesday 2pm'
8 goal_check passed=False
9 tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'
10 session_end status='completed'
```

Replay the run tree:

```bash
promptetheus replay 01KVMZ4T7V2SN61ZWG1XTDBK47 --tree
```

Example output:

```text
0 state_change name='session_started'
1 tool_call tool_name='run_agent'
2 user_message content='Book Tuesday at 2pm'
3 tool_call tool_name='search_calendar'
4 tool_result call_id='190a6438979141f5ac11b2e1b2ee29a0'
7 agent_message content='Booked Wednesday 2pm'
8 goal_check passed=False
9 tool_result call_id='a78566297e0a4a309d5ce44cefe0d836'
10 session_end status='completed'
choose-slot span=span_163a8380174647e98bfe1f3fff9e15b9 duration_ms=0.0
```

Generate a failure fingerprint:

```bash
promptetheus fingerprint 01KVMZ4T7V2SN61ZWG1XTDBK47
```

Example output:

```text
8ae0f41220d0 goal mismatch: selected wednesday, not tuesday
- goal:selected wednesday, not tuesday
```

Inspect the local delivery spool:

```bash
promptetheus spool list
```

Example output:

```text
Spool: .promptetheus/spool
pending : 11 event(s) across 1 session file(s), 4082 bytes
dead    : 0 event(s) across 0 file(s), 0 bytes
01KVMZ4T7V2SN61ZWG1XTDBK47: 11 pending
```

The raw spool is JSONL.
Each line is an event envelope:

```json
{
  "type": "tool_call",
  "session_id": "01KVMZ4T7V2SN61ZWG1XTDBK47",
  "seq": 1,
  "idempotency_key": "01KVMZ4T7V2SN61ZWG1XTDBK47:29c5eff0:1",
  "payload": {
    "tool_name": "run_agent",
    "call_id": "a78566297e0a4a309d5ce44cefe0d836",
    "arguments": {
      "args": "('Book Tuesday at 2pm',)",
      "kwargs": "{}"
    }
  }
}
```

Use `pt.trace.start(...)` when you control the run boundary and want explicit event calls instead of decorators:

```python
import promptetheus as pt

with pt.trace.start(
    agent="demo-agent",
    user_goal="Book a meeting for Tuesday",
    transport="auto",
) as session:
    session.user_message("Please book the small room for Tuesday at 2pm")
    session.tool_call("calendar.search", {"day": "Tuesday"}, call_id="calendar-1")
    session.tool_result("calendar-1", result={"available": ["2pm", "3pm"]})
    session.agent_message("Booking confirmed for Wednesday at 2pm")
    session.goal_check(False, mismatches=["booked Wednesday, not Tuesday"])

# session_end is emitted automatically; transport flush runs on exit
```

The package exposes these primary entry points:

```python
import promptetheus as pt

pt.trace.start(...)
pt.start(...)
pt.observe(...)
pt.tool
pt.traced(...)
pt.current()
pt.Session
pt.AsyncSession
pt.AgentRuntime
```

Common session helpers:

```python
session.user_message("Book Tuesday at 2pm Pacific")
session.agent_message("I found availability")
session.tool_call("browser.click", {"selector": "#checkout"}, call_id="click-1")
session.tool_result("click-1", result={"ok": True})
session.retrieval("refund policy", documents=[{"id": "doc-1", "score": 0.91}])
session.browser_action("click", "#checkout", url=page.url)
session.dom_snapshot(page.url, visible_text, selected_values={"day": "Tuesday"})
session.screenshot(page.screenshot())
session.replay_artifact("trace.webm", artifact_type="screen_recording", event_time_map={})
session.llm_call("gpt-5", input_tokens=100, output_tokens=40, latency_ms=900)
session.score("goal_match", 0.2, comment="Selected the wrong day")
session.metric("steps", 12, unit="count")
session.error(RuntimeError("calendar API timeout"), handled=True)
session.goal_check(False, mismatches=["selected Wednesday"])
session.end("failed")
session.flush(timeout=2)
```

Every helper writes a schema-valid event envelope with `type`, `session_id`, `timestamp`, `seq`, `idempotency_key`, and `payload`. Use `metadata` for safe, low-cardinality context. Do not put raw secrets, cookies, tokens, or credentials into event payloads.

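
Because every helper writes an envelope with the same named fields, a spool line can be sanity-checked with the standard library alone. The sketch below is an illustration under stated assumptions: `REQUIRED_FIELDS` is copied from the field list above, `missing_fields` is a hypothetical helper rather than part of the SDK, and the sample line is hand-written, including its timestamp format.

```python
import json

REQUIRED_FIELDS = ("type", "session_id", "timestamp", "seq", "idempotency_key", "payload")


def missing_fields(line: str) -> list[str]:
    """Return the required envelope fields absent from one JSONL line."""
    envelope = json.loads(line)
    return [name for name in REQUIRED_FIELDS if name not in envelope]


# Hand-written sample; the timestamp and idempotency key values are hypothetical.
sample = json.dumps({
    "type": "goal_check",
    "session_id": "01KVMZ4T7V2SN61ZWG1XTDBK47",
    "timestamp": "2025-01-01T00:00:00Z",
    "seq": 8,
    "idempotency_key": "example-key",
    "payload": {"passed": False},
})

print(missing_fields(sample))
print(missing_fields('{"type": "metric", "seq": 3}'))
```

A check like this only confirms that fields are present; it says nothing about whether the payload matches the event type's schema.
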
Use `AsyncSession` when the top-level agent run is async:

```python
from promptetheus import AsyncSession

async with AsyncSession(agent="voice-agent", user_goal="Summarize the call") as session:
    session.user_message("Summarize this call")
    async with session.aspan("transcribe"):
        session.metric("audio_seconds", 42, unit="seconds")
    session.goal_check(True)
```

Browser agents should record the user goal, critical browser actions, the final DOM state, and an explicit goal check:

```python
session.browser_action("click", "#confirm", url=page.url)
session.dom_snapshot(
    page.url,
    visible_text=await page.locator("body").inner_text(),
    selected_values={"day": "Wednesday", "time": "2pm"},
    warnings=["Timezone changed from Pacific to Eastern"],
)
session.goal_check(
    False,
    mismatches=["booked Wednesday", "timezone warning visible"],
)
```

This is the path that lets Promptetheus replay a failure and produce fix-agent evidence instead of just storing generic logs.

Adapters are optional and imported lazily. Install only the extra you need:

```bash
pip install "promptetheus[openai]"
pip install "promptetheus[anthropic]"
pip install "promptetheus[langchain]"
pip install "promptetheus[playwright]"
```

Available adapter exports:

```python
from promptetheus.adapters import (
    AnthropicAdapter,
    AutoGenAdapter,
    CrewAIAdapter,
    DSPyAdapter,
    HaystackAdapter,
    LangGraphAdapter,
    LiteLLMAdapter,
    LlamaIndexAdapter,
    OpenAIAdapter,
    OpenTelemetryBridge,
    PlaywrightAdapter,
    PromptetheusCallbackHandler,
    PydanticAIAdapter,
)
```

Use adapters when a framework already emits structured callbacks. Keep custom instrumentation close to the real run boundary when the framework does not.

`AgentRuntime` is a best-effort client for live, service-backed coordination.
It is separate from durable trace storage and never raises into host code when the service is unavailable:

```python
from promptetheus import AgentRuntime

runtime = AgentRuntime(session.session_id)
runtime.remember("hypothesis", {"summary": "auth header may be missing"})
hint = runtime.before_tool_call("pytest", command="pytest tests/server")
result = run_tests()
runtime.after_tool_call(
    "pytest",
    command="pytest tests/server",
    status="failed" if result.failed else "succeeded",
    error=result.error,
)
runtime.heartbeat(phase="investigating", current_file="tests/server/test_mcp.py")
next_hint = runtime.next_hint()
```

In a fresh install, local gateway and MCP commands need their extras:

```bash
pip install "promptetheus[server,mcp]"
```

```bash
promptetheus dev                          # boot local FastAPI ingestion on :4318
promptetheus doctor                       # config, reachability, spool summary
promptetheus spool list                   # pending local delivery files
promptetheus spool replay                 # retry pending delivery through the API
promptetheus sessions                     # list locally spooled sessions
promptetheus replay <session-id>          # print a flat timeline
promptetheus replay <session-id> --tree
promptetheus diff <baseline> <candidate>
promptetheus fingerprint <session-id>
promptetheus import exported-session.json
```

`spool purge` deletes local spool files. Use it only when you are sure the data is no longer needed.

Generate hosted MCP client config without mutating global client files:

```bash
promptetheus mcp install \
  --client codex \
  --workspace acme \
  --project-ref abcdefghijklmnopqrst
```

Supported clients are `codex`, `claude`, and `cursor`. The generated config uses a stdio bridge to hosted Promptetheus MCP and defaults to read-only, project-scoped Supabase evidence. SDK clients and MCP client config should not receive Supabase service-role keys.

For local stdio development:

```bash
promptetheus mcp
```

The SDK lives under `packages/promptetheus/promptetheus`. Tests live at the repository root under `tests`.

Useful commands:

```bash
uv run --project packages/promptetheus --extra dev pytest tests/sdk -q
uv run --project packages/promptetheus --extra dev pytest tests/cli -q
uv run --project packages/promptetheus --extra dev --extra server --extra mcp pytest tests/server/test_mcp.py -q
uv run --project packages/promptetheus --extra dev mypy
```

Notes on keys and sensitive data:

- Promptetheus project keys identify Promptetheus projects. They are not Supabase service-role keys.
- The hosted service owns Supabase credentials and scopes evidence reads by workspace/project.
- Use `redact="default"` or a custom redactor for sensitive payloads.
- Store prompt/message references instead of raw large or sensitive LLM payloads when possible.
- The SDK should observe agents, not rewrite their architecture or hide failed goals.

Status: Stable 2.0.1 SDK for hosted/self-hosted Promptetheus trace delivery.
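
One way to follow the guidance on sensitive payloads is to scrub data before it ever reaches an event call. The sketch below is standalone and hypothetical: `scrub`, `SENSITIVE_KEYS`, and the `"[REDACTED]"` placeholder are illustrative names, and it shows neither the behavior of `redact="default"` nor the signature the SDK expects from a custom redactor.

```python
from typing import Any

# Illustrative key names; a real deployment would choose its own list.
SENSITIVE_KEYS = {"authorization", "cookie", "password", "token", "api_key"}


def scrub(value: Any) -> Any:
    """Return a copy of value with sensitive-looking dict entries masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if str(key).lower() in SENSITIVE_KEYS else scrub(item)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [scrub(item) for item in value]
    return value


arguments = {
    "url": "https://example.com",
    "headers": {"Authorization": "Bearer abc", "Accept": "text/html"},
}
print(scrub(arguments))
```

The helper builds new containers instead of mutating its input, so the agent can keep using the original arguments while only the scrubbed copy is recorded.
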