Before And After

[ARTICLE · art-20892] src=dev.to ↗ pub=2026-06-03T21:49Z topic=ai-agents verified=true sentiment=· neutral

A developer building an AI agent for hospital bed management found that traditional unit testing fails for AI agents because their internal processes are non-deterministic. The engineer adopted a "news fact-checking" approach, testing only the final system state rather than the agent's decision-making process, verifying that database records are correct after the agent executes an operation.

3 min read · 18 views · published Jun 3, 2026

My AI Agent can finally write data — assign beds, update predictions, create alerts, place orders.

Code done, I got stuck on a question:

How do you test something like this?

Regular functions are easy to test. add(1, 1) always returns 2. Same every time. I can write: assert add(1, 1) == 2

But Agents are different. Ask it "find me an empty bed" several times and the results are all correct, but the process differs each time.

You can't write:

assert agent("find an empty bed") == "some specific sentence"

Because it says something different every time.

Thought about it for a while. Found the answer somewhere unexpected: news fact-checking.

How do journalists verify a report's accuracy?

They don't verify "how the reporter gathered information" — how many calls they made, how many sites they visited, how many people they talked to. Process is too complex. Every reporter does it differently.

What they verify is results. It doesn't matter how the reporter got the information. As long as the final reported facts are accurate, it passes.

AI Agents can be tested the same way.

The core framework is just three steps:

Step	What to Do	Analogy
Before	Check initial system state	What things looked like before the event
Action	Let Agent execute the operation	Reporter goes to investigate
After	Check final system state	Verify if the report is accurate
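The three steps can be sketched as one small helper. This is only an illustration, not the author's code: the name before_action_after, the toy beds dictionary, and move_patient (a stand-in for the Agent) are all assumptions made up for the example.

```python
from typing import Callable, TypeVar

S = TypeVar("S")

def before_action_after(
    snapshot: Callable[[], S],
    action: Callable[[], object],
    check: Callable[[S, S], bool],
) -> bool:
    """Capture state, run the action, capture state again, then judge the pair."""
    before = snapshot()   # Before: what things looked like
    action()              # Action: how it happens is deliberately ignored
    after = snapshot()    # After: what things look like now
    return check(before, after)

# Toy world state: which patient occupies which bed (None means empty).
beds = {"A1": "Zhang San", "B1": None}

def move_patient() -> None:
    # Hypothetical stand-in for the Agent; a real one would take its own route.
    beds["B1"], beds["A1"] = beds["A1"], None

ok = before_action_after(
    snapshot=lambda: dict(beds),  # copy, so later mutation cannot alter "before"
    action=move_patient,
    check=lambda b, a: b["A1"] == "Zhang San"
    and a["B1"] == "Zhang San"
    and a["A1"] is None,
)
print(ok)  # True
```

The helper never looks inside the action, which is the whole point: only the two snapshots are compared.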

Example with "transfer bed" functionality:

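# query() and agent() are the project's own helpers.
# Assumption: query() returns a list of row tuples.

# Before: record the current bed and the set of empty postpartum beds.
before_bed = query("SELECT bed_id FROM admission WHERE patient_name='Zhang San'")[0][0]
available_beds = {row[0] for row in query("SELECT bed_id FROM bed WHERE room_type='postpartum' AND status='empty'")}

# Action: let the Agent do the transfer however it likes.
agent("Transfer Zhang San to postpartum ward")

# After: the patient must now be in one of the beds that was empty before.
after_bed = query("SELECT bed_id FROM admission WHERE patient_name='Zhang San'")[0][0]
assert after_bed != before_bed
assert after_bed in available_beds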

The key here:

I don't care how the Agent did it. I only care whether the system state is correct after it's done.

Agent wants to query bed table first or room table first? Up to it. As long as Zhang San ends up in the right bed, it's correct.
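To make the idea concrete, here is a self-contained sketch that runs against an in-memory SQLite database. Everything in it is assumed for illustration: the two-table schema, the bed IDs, and fake_agent, which stands in for the real Agent and picks a bed at random so that the process varies between runs while the checked result does not.

```python
import random
import sqlite3

def make_db() -> sqlite3.Connection:
    """Build a tiny in-memory hospital: one patient, two empty postpartum beds."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE bed (bed_id TEXT PRIMARY KEY, room_type TEXT, status TEXT);
        CREATE TABLE admission (patient_name TEXT PRIMARY KEY, bed_id TEXT);
        INSERT INTO bed VALUES ('G1', 'general', 'occupied'),
                               ('P1', 'postpartum', 'empty'),
                               ('P2', 'postpartum', 'empty');
        INSERT INTO admission VALUES ('Zhang San', 'G1');
    """)
    return db

def current_bed(db: sqlite3.Connection, patient: str) -> str:
    row = db.execute(
        "SELECT bed_id FROM admission WHERE patient_name=?", (patient,)
    ).fetchone()
    return row[0]

def empty_postpartum_beds(db: sqlite3.Connection) -> set:
    rows = db.execute(
        "SELECT bed_id FROM bed WHERE room_type='postpartum' AND status='empty'"
    )
    return {r[0] for r in rows}

def fake_agent(db: sqlite3.Connection, patient: str) -> None:
    """Hypothetical stand-in for the Agent: any empty postpartum bed will do."""
    target = random.choice(sorted(empty_postpartum_beds(db)))  # varies per run
    old = current_bed(db, patient)
    db.execute("UPDATE bed SET status='empty' WHERE bed_id=?", (old,))
    db.execute("UPDATE bed SET status='occupied' WHERE bed_id=?", (target,))
    db.execute("UPDATE admission SET bed_id=? WHERE patient_name=?", (target, patient))

def test_transfer() -> str:
    db = make_db()
    before_bed = current_bed(db, "Zhang San")      # Before
    available = empty_postpartum_beds(db)
    fake_agent(db, "Zhang San")                    # Action
    after_bed = current_bed(db, "Zhang San")       # After
    assert after_bed != before_bed
    assert after_bed in available
    assert after_bed not in empty_postpartum_beds(db)  # bed is now occupied
    return after_bed

test_transfer()
```

Running test_transfer() repeatedly may land the patient in P1 or P2, and the assertions hold either way.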

This method has several advantages:

Agent's "process" might differ each time, but the "result" is deterministic.

No need to mock Agent's internal behavior. No intercepting API calls. No verifying generated SQL.

Just query database state directly: what it looked like before, what it looks like after.

If the Agent upgrades later (new model, changed prompt), as long as results are still correct, tests pass.

Internal implementation can change freely. Interface contract stays stable.

One detail worth mentioning: test fixtures.

Write operations change database state. Without isolation between tests, later tests get polluted by earlier ones.

Solution: restore database to initial state before each test:

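import pytest

@pytest.fixture(autouse=True)  # autouse: applied to every test automatically
def reset_database():
    # restore_from_backup() is the project's own helper that reloads the initial data
    restore_from_backup()  # runs before each test
    yield  # the test body runs here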

Like fact-checking: each time you verify a new piece of news, you start fresh, unaffected by previous checks.
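One possible way to get that isolation without a backup file is to rebuild a small in-memory database for every test. The sketch below assumes SQLite and a made-up one-table schema; fresh_db and empty_beds are hypothetical names, not part of the original project.

```python
import sqlite3

# Seed data shared by every test; the schema is invented for this example.
SEED = """
    CREATE TABLE bed (bed_id TEXT PRIMARY KEY, status TEXT);
    INSERT INTO bed VALUES ('P1', 'empty'), ('P2', 'empty');
"""

def fresh_db() -> sqlite3.Connection:
    """Return a brand-new in-memory database loaded with the same seed data."""
    db = sqlite3.connect(":memory:")
    db.executescript(SEED)
    return db

def empty_beds(db: sqlite3.Connection) -> list:
    rows = db.execute("SELECT bed_id FROM bed WHERE status='empty' ORDER BY bed_id")
    return [r[0] for r in rows]

def test_occupy_p1() -> None:
    db = fresh_db()
    db.execute("UPDATE bed SET status='occupied' WHERE bed_id='P1'")  # a write
    assert empty_beds(db) == ["P2"]

def test_all_beds_start_empty() -> None:
    db = fresh_db()  # unaffected by the write in the other test
    assert empty_beds(db) == ["P1", "P2"]

# Order does not matter, because each test starts from the same state.
test_occupy_p1()
test_all_beds_start_empty()
```

Under pytest, fresh_db() could be called from a fixture that yields the connection, which would give the same per-test reset as the fixture above.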

Testing solves "verifying correctness during development."

But there's another problem: nobody runs tests in production.

Agent might execute hundreds of operations daily after going live. How do you review when something goes wrong?

It is like a news organization keeping interview recordings, raw files, and edit history as an audit trail. AI Agents need the same.

I researched this — the field is called Governance or Audit. The Strands framework has interfaces for writing logs to S3, DynamoDB, etc.
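As a rough idea of what such a trail could hold, here is a minimal sketch using only the Python standard library. It does not use the Strands interfaces, and the record fields (timestamp, operation, before, after) are assumptions chosen for this example, as is the in-memory buffer that stands in for real storage.

```python
import io
import json
from datetime import datetime, timezone

def audit_record(operation: str, before: dict, after: dict) -> dict:
    """Describe one Agent write: what was asked and the state on either side."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "operation": operation,
        "before": before,
        "after": after,
    }

def append_audit(log, record: dict) -> None:
    """Write one JSON object per line so the log can be searched or replayed."""
    log.write(json.dumps(record, ensure_ascii=False) + "\n")

log = io.StringIO()  # stand-in for a file, an object store, or a database table
append_audit(log, audit_record(
    "Transfer Zhang San to postpartum ward",
    before={"patient": "Zhang San", "bed_id": "G1"},
    after={"patient": "Zhang San", "bed_id": "P1"},
))

# Reviewing an incident later means reading the entries back.
entries = [json.loads(line) for line in log.getvalue().splitlines()]
print(entries[0]["operation"])
```

Storing the before and after state next to each operation reuses the same Before → Action → After shape, so a production incident can be reviewed the way a test failure would be.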

Didn't implement this time — current priority is getting Agent core working. But I noted this requirement. Must do before launch.

Knowing what needs to be done matters more than doing it right now.

How to test systems with uncertainty:

Don't test process, test results. Before → Action → After.

This framework applies beyond AI Agents. Any system that "changes world state" can be tested this way.
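A file system is one such case. The sketch below applies the same three steps to a hypothetical archive_report function; the function and the file names are invented for illustration.

```python
import tempfile
from pathlib import Path

def archive_report(src: Path, archive_dir: Path) -> None:
    """Hypothetical action under test: move a file into an archive folder."""
    archive_dir.mkdir(exist_ok=True)
    src.rename(archive_dir / src.name)

def list_files(root: Path) -> list:
    """Snapshot of the world state: every file path relative to root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    report = root / "report.txt"
    report.write_text("bed occupancy: 12/20")

    before = list_files(root)                   # Before
    archive_report(report, root / "archive")    # Action
    after = list_files(root)                    # After

assert before == ["report.txt"]
assert after == ["archive/report.txt"]
```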

Simple frameworks are often the most useful.

Read original on dev.to → dev.to/lingxin_wang_3b88f34b4014/before-and-afte…
