cd /news/ai-agents/before-and-after Β· home β€Ί topics β€Ί ai-agents β€Ί article
[ARTICLE Β· art-20892] src=dev.to pub= topic=ai-agents verified=true sentiment=Β· neutral

Before And After

A developer building an AI agent for hospital bed management found that traditional unit testing fails for AI agents because their internal processes are non-deterministic. The engineer adopted a "news fact-checking" approach, testing only the final system state rather than the agent's decision-making process, verifying that database records are correct after the agent executes an operation.

read3 min publishedJun 3, 2026

My AI Agent can finally write data β€” assign beds, update predictions, create alerts, place orders.

Code done, I got stuck on a question:

How do you test something like this?

Regular functions are easy to test. add(1, 1)

always returns 2

. Same every time. I can write assert add(1, 1) == 2

.

But Agents are different. Ask it "find me an empty bed":

Results are all correct, but the process differs.

You can't write:

assert agent("find an empty bed") == "some specific sentence"

Because it says something different every time.

Thought about it for a while. Found the answer somewhere unexpected: news fact-checking.

How do journalists verify a report's accuracy?

They don't verify "how the reporter gathered information" β€” how many calls they made, how many sites they visited, how many people they talked to. Process is too complex. Every reporter does it differently.

What they verify is results:

Doesn't matter how the reporter got the information. As long as the final reported facts are accurate, it passes.

AI Agents can be tested the same way.

The core framework is just three steps:

Step What to Do Analogy
Before Check initial system state What things looked like before the event
Action Let Agent execute the operation Reporter goes to investigate
After Check final system state Verify if the report is accurate

Example with "transfer bed" functionality:

before_bed = query("SELECT bed_id FROM admission WHERE patient_name='Zhang San'")
available_beds = query("SELECT * FROM bed WHERE room_type='postpartum' AND status='empty'")

agent("Transfer Zhang San to postpartum ward")

after_bed = query("SELECT bed_id FROM admission WHERE patient_name='Zhang San'")
assert after_bed in available_beds

The key here:

I don't care how the Agent did it. I only care whether the system state is correct after it's done.

Agent wants to query bed table first or room table first? Up to it. As long as Zhang San ends up in the right bed, it's correct.

This method has several advantages:

Agent's "process" might differ each time, but the "result" is deterministic.

No need to mock Agent's internal behavior. No intercepting API calls. No verifying generated SQL.

Just query database state directly: what it looked like before, what it looks like after.

If the Agent upgrades later (new model, changed prompt), as long as results are still correct, tests pass.

Internal implementation can change freely. Interface contract stays stable.

One detail worth mentioning: test fixtures.

Write operations change database state. Without isolation between tests, later tests get polluted by earlier ones.

Solution: restore database to initial state before each test:

@pytest.fixture(autouse=True)
def reset_database():
    restore_from_backup()
    yield

Like fact-checking: each time you verify a new piece of news, you start fresh, unaffected by previous checks.

Testing solves "verifying correctness during development."

But there's another problem: nobody runs tests in production.

Agent might execute hundreds of operations daily after going live. How do you review when something goes wrong?

Like news organizations needing to keep interview recordings, raw files, edit history β€” audit trails.

AI Agents need the same:

I researched this β€” the field is called Governance or Audit. The Strands framework has interfaces for writing logs to S3, DynamoDB, etc.

Didn't implement this time β€” current priority is getting Agent core working. But I noted this requirement. Must do before launch.

Knowing what needs to be done matters more than doing it right now.

How to test systems with uncertainty:

Don't test process, test results. Before β†’ Action β†’ After.

This framework applies beyond AI Agents. Any system that "changes world state" can be tested this way:

Simple frameworks are often the most useful.

── more in #ai-agents 4 stories Β· sorted by recency
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain β€” perfect for shipping the agent you just read about.

$git push zahid main
β†’ Live at https://your-agent.zahid.host βœ“
Get free account β†’ Pricing
from €0/mo Β· no card required
LIVE [news/before-and-after] indexed:0 read:3min 2026-06-03 Β· β€”