Source: dev.to · topic: ai-agents

I built an open source SDK to catch AI agent regressions before they ship.

An open-source SDK called replayd has been released to catch AI agent regressions before deployment. The tool, built by developer Taimoor Khan, captures failed agent runs and replays them against new versions to detect recurring failures. Instead of exact output matching, replayd uses deterministic assertions for structural failures and an LLM judge for semantic ones, checking what the agent did rather than what it said.

1 min read · published May 30, 2026

You fix a bug in your agent. A week later you change the prompt or swap the model. The same bug comes back. Nobody notices until a user does.

Regular software has regression tests for this. AI agents mostly do not. So I built replayd.

When your agent fails, you capture that run and save it as a test. Before you ship a new version, you replay the saved failures against it. If the same failure comes back, you catch it before your users do.
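To make the loop concrete, here is a minimal sketch of capture and replay in plain Python. It is not replayd's API: `CapturedFailure`, `capture`, `replay_all`, and the two toy agents are hypothetical names invented for illustration, and the agent is assumed to be a function that returns the list of tool names it called.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Callable, List


# Hypothetical record of one failed run (illustration only, not replayd's schema).
@dataclass
class CapturedFailure:
    name: str
    user_input: str
    forbidden_tool: str  # the tool call that made this run a failure


# Assumption: an agent takes user input and returns the tool names it called.
Agent = Callable[[str], List[str]]


def capture(failure: CapturedFailure, directory: Path) -> Path:
    """Save a failed run to disk so it can serve as a regression test."""
    path = directory / f"{failure.name}.json"
    path.write_text(json.dumps(asdict(failure)))
    return path


def replay_all(agent: Agent, directory: Path) -> List[str]:
    """Replay every saved failure; return the names of those that came back."""
    regressions = []
    for path in sorted(directory.glob("*.json")):
        failure = CapturedFailure(**json.loads(path.read_text()))
        tools_called = agent(failure.user_input)
        if failure.forbidden_tool in tools_called:
            regressions.append(failure.name)
    return regressions


def agent_v1(user_input: str) -> List[str]:
    # Toy agent with the bug: refunds without asking first.
    return ["lookup_order", "issue_refund"]


def agent_v2(user_input: str) -> List[str]:
    # Toy agent with the fix: asks for confirmation instead.
    return ["lookup_order", "ask_for_confirmation"]


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        directory = Path(tmp)
        capture(
            CapturedFailure("refund_without_confirmation",
                            "Refund order 123", "issue_refund"),
            directory,
        )
        return replay_all(agent_v1, directory), replay_all(agent_v2, directory)
```

Calling `demo()` replays the saved failure against both toy agents. A non-empty list from `replay_all` is the signal to block the release.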

pip install replayd

The interesting part was grading. You cannot use exact output matching because LLM output is nondeterministic, so replayd does not check the text. It checks whether the specific failure came back. Structural failures get deterministic assertions. Semantic ones get an LLM as a judge. You assert on what the agent did, not what it said.
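The two grading modes can be sketched as follows. This is an illustration under assumptions, not replayd's implementation: `Run`, `called_tool`, `semantic_failure_recurred`, and `stub_judge` are hypothetical names, and the judge is passed in as a plain function so the example runs offline. In practice the judge would wrap a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


# Hypothetical shape of one agent run (illustration only).
@dataclass
class Run:
    tool_calls: List[Dict]  # e.g. {"name": "issue_refund"}
    final_answer: str


def called_tool(run: Run, tool_name: str) -> bool:
    """Structural check: deterministic, looks only at what the agent did."""
    return any(call["name"] == tool_name for call in run.tool_calls)


# Assumption: a judge maps a prompt to a reply that starts with YES or NO.
Judge = Callable[[str], str]


def semantic_failure_recurred(run: Run, failure_description: str,
                              judge: Judge) -> bool:
    """Semantic check: ask a judge whether the described failure came back."""
    prompt = (
        "A previous version of this agent failed as follows:\n"
        f"{failure_description}\n\n"
        f"New answer:\n{run.final_answer}\n\n"
        "Does the new answer show the same failure? Reply YES or NO."
    )
    return judge(prompt).strip().upper().startswith("YES")


def stub_judge(prompt: str) -> str:
    # Stand-in for a real LLM call; a keyword match keeps the sketch offline.
    return "YES" if "guaranteed" in prompt.lower() else "NO"


FAILURE = "The agent promises a delivery date it cannot know."

bad_run = Run([{"name": "issue_refund"}],
              "Your order will arrive on Friday, guaranteed.")
good_run = Run([{"name": "lookup_order"}],
               "I can't confirm a delivery date yet.")
```

Neither check compares the answer text to a stored string. The structural check inspects the recorded tool calls, and the semantic check asks only whether the one described failure reappeared, so rewording in the new version does not cause a false alarm.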

It is v0.1.1: early, with rough edges, but the core loop works. The core has zero runtime dependencies and is framework agnostic.

GitHub: github.com/TaimoorKhan10/replayd

If you are running agents in production, I would love your feedback on the grading approach. What are you catching manually right now that you wish was automated?
