{"slug": "i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship", "title": "I built an open source SDK to catch AI agent regressions before they ship.", "summary": "An open-source SDK called replayd has been released to catch AI agent regressions before deployment. The tool, built by developer Taimoor Khan, captures failed agent runs and replays them against new versions to detect recurring failures. Instead of exact output matching, replayd uses deterministic assertions for structural failures and an LLM judge for semantic ones, checking what the agent did rather than what it said.", "body_md": "I built an open source SDK to catch AI agent regressions before they ship\n\nYou fix a bug in your agent. A week later you change the prompt or swap the model. The same bug comes back. Nobody notices until a user does.\n\nRegular software has regression tests for this. AI agents mostly do not. So I built replayd.\n\nWhen your agent fails, you capture that run and save it as a test. Before you ship a new version, you replay the saved failures against it. If the same failure comes back, you catch it before your users do.\n\n```\npip install replayd\n```\n\nThe interesting part was grading. You cannot use exact output matching because LLMs are non-deterministic. So replayd does not check the text. It checks whether the specific failure came back. Structural failures get deterministic assertions. Semantic ones get an LLM as judge. You assert on what the agent did, not what it said.\n\nIt is v0.1.1: early, with rough edges, but the core loop works. Zero runtime dependencies in the core. Framework-agnostic.\n\nGitHub: github.com/TaimoorKhan10/replayd\n\nIf you are running agents in production, I would love your feedback on the grading approach.
What are you catching manually right now that you wish was automated?", "url": "https://wpnews.pro/news/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship", "canonical_source": "https://dev.to/taimoorkhan10/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship-40ba", "published_at": "2026-05-30 21:47:34+00:00", "updated_at": "2026-05-30 22:41:06.242039+00:00", "lang": "en", "topics": ["ai-agents", "ai-tools", "mlops", "ai-safety", "ai-products"], "entities": ["replayd", "TaimoorKhan10", "GitHub", "LLM"], "alternates": {"html": "https://wpnews.pro/news/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship", "markdown": "https://wpnews.pro/news/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship.md", "text": "https://wpnews.pro/news/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship.txt", "jsonld": "https://wpnews.pro/news/i-built-an-open-source-sdk-to-catch-ai-agent-regressions-before-they-ship.jsonld"}}