{"slug": "before-and-after", "title": "Before And After", "summary": "A developer building an AI agent for hospital bed management found that traditional unit testing fails for AI agents because their internal processes are non-deterministic. The engineer adopted a \"news fact-checking\" approach, testing only the final system state rather than the agent's decision-making process, verifying that database records are correct after the agent executes an operation.", "body_md": "My AI Agent can finally write data — assign beds, update predictions, create alerts, place orders.\n\nCode done, I got stuck on a question:\n\n**How do you test something like this?**\n\nRegular functions are easy to test. `add(1, 1)`\n\nalways returns `2`\n\n. Same every time. I can write `assert add(1, 1) == 2`\n\n.\n\nBut Agents are different. Ask it \"find me an empty bed\":\n\nResults are all correct, but the process differs.\n\nYou can't write:\n\n```\nassert agent(\"find an empty bed\") == \"some specific sentence\"\n```\n\nBecause it says something different every time.\n\nThought about it for a while. Found the answer somewhere unexpected: **news fact-checking.**\n\nHow do journalists verify a report's accuracy?\n\nThey don't verify \"how the reporter gathered information\" — how many calls they made, how many sites they visited, how many people they talked to. Process is too complex. Every reporter does it differently.\n\nWhat they verify is **results**:\n\n**Doesn't matter how the reporter got the information. As long as the final reported facts are accurate, it passes.**\n\nAI Agents can be tested the same way.\n\nThe core framework is just three steps:\n\n| Step | What to Do | Analogy |\n|---|---|---|\n| Before | Check initial system state | What things looked like before the event |\n| Action | Let Agent execute the operation | Reporter goes to investigate |\n| After | Check final system state | Verify if the report is accurate |\n\nExample with \"transfer bed\" functionality:\n\n```\n# Before: Where is Zhang San now? Are there empty beds in postpartum?\nbefore_bed = query(\"SELECT bed_id FROM admission WHERE patient_name='Zhang San'\")\navailable_beds = query(\"SELECT * FROM bed WHERE room_type='postpartum' AND status='empty'\")\n\n# Action: Let Agent do the transfer\nagent(\"Transfer Zhang San to postpartum ward\")\n\n# After: Did Zhang San actually end up in postpartum?\nafter_bed = query(\"SELECT bed_id FROM admission WHERE patient_name='Zhang San'\")\nassert after_bed in available_beds\n```\n\nThe key here:\n\n**I don't care how the Agent did it. I only care whether the system state is correct after it's done.**\n\nAgent wants to query bed table first or room table first? Up to it. As long as Zhang San ends up in the right bed, it's correct.\n\nThis method has several advantages:\n\nAgent's \"process\" might differ each time, but the \"result\" is deterministic.\n\nNo need to mock Agent's internal behavior. No intercepting API calls. No verifying generated SQL.\n\nJust query database state directly: what it looked like before, what it looks like after.\n\nIf the Agent upgrades later (new model, changed prompt), as long as results are still correct, tests pass.\n\nInternal implementation can change freely. Interface contract stays stable.\n\nOne detail worth mentioning: **test fixtures.**\n\nWrite operations change database state. Without isolation between tests, later tests get polluted by earlier ones.\n\nSolution: restore database to initial state before each test:\n\n``` python\n@pytest.fixture(autouse=True)\ndef reset_database():\n    # Before each test\n    restore_from_backup()\n    yield\n    # After each test (optional cleanup)\n```\n\nLike fact-checking: each time you verify a new piece of news, you start fresh, unaffected by previous checks.\n\nTesting solves \"verifying correctness during development.\"\n\nBut there's another problem: **nobody runs tests in production.**\n\nAgent might execute hundreds of operations daily after going live. How do you review when something goes wrong?\n\nLike news organizations needing to keep interview recordings, raw files, edit history — **audit trails.**\n\nAI Agents need the same:\n\nI researched this — the field is called Governance or Audit. The Strands framework has interfaces for writing logs to S3, DynamoDB, etc.\n\nDidn't implement this time — current priority is getting Agent core working. But I noted this requirement. Must do before launch.\n\n**Knowing what needs to be done matters more than doing it right now.**\n\nHow to test systems with uncertainty:\n\nDon't test process, test results. Before → Action → After.\n\nThis framework applies beyond AI Agents. Any system that \"changes world state\" can be tested this way:\n\nSimple frameworks are often the most useful.", "url": "https://wpnews.pro/news/before-and-after", "canonical_source": "https://dev.to/lingxin_wang_3b88f34b4014/before-and-after-341f", "published_at": "2026-06-03 21:49:36+00:00", "updated_at": "2026-06-03 22:43:54.803531+00:00", "lang": "en", "topics": ["ai-agents", "artificial-intelligence", "ai-products", "ai-tools", "ai-research"], "entities": ["Zhang San"], "alternates": {"html": "https://wpnews.pro/news/before-and-after", "markdown": "https://wpnews.pro/news/before-and-after.md", "text": "https://wpnews.pro/news/before-and-after.txt", "jsonld": "https://wpnews.pro/news/before-and-after.jsonld"}}