{"slug": "can-an-ai-agent-pass-the-test-we-give-4-year-olds", "title": "Can an AI Agent Pass the Test We Give 4-Year-Olds?", "summary": "Shridhar Shah, a senior software engineer at Outshift by Cisco, built two AI agents to test theory of mind using the Sally-Anne false-belief test. One agent, which only tracks reality, fails the test like a toddler, while the other, which models each person's beliefs separately, passes. The project demonstrates how tracking beliefs distinct from reality is key for collaborative AI agents.", "body_md": "*Theory of Mind and the Sally-Anne false-belief test, in ~60 lines of Python.*\n\n**TL;DR:** There's a famous test that kids pass around age 4. It checks whether you understand that *other people can believe things that aren't true.* I built two AI agents: one that only knows \"what's actually happening\" (fails, like a toddler) and one that keeps track of what *each person* believes (passes). It's ~110 lines, and it's the foundation for agents that can actually work *together*.\n\nHere's the setup: Sally puts a marble in her basket and leaves the room. While she's out, Anne moves the marble to the box. Sally comes back. Where will she look for her marble?\n\nIf you said *basket*, nice — you just used something called \"theory of mind.\" Sally never saw the marble move, so in her head it's still in the basket. What's *actually* true (it's in the box) and what *Sally believes* (it's in the basket) are two different things, and you kept them separate without even thinking about it.\n\nA 3-year-old says \"box\" — they can't yet separate what *they* know from what *Sally* knows. A 4-year-old says \"basket.\" It's one of the most famous tests in child psychology, and in 2026 it's become a real test for AI agents too.\n\n| | ❌ Agent with no \"theory of mind\" | ✅ Agent that models other minds |\n|---|---|---|\n| What it tracks | only what's actually true | what each person believes, separately |\n| Where will Sally look? 
| \"box\" | \"basket\" |\n| Result | FAIL (only knows reality) | PASS |\n\nThe only difference between the two agents is one rule: **a person's belief only updates when that person is actually in the room to see it happen.**\n\n```python\n# Assumed starting state: everyone saw the marble go into the basket.\nbeliefs = {\"Sally\": \"basket\", \"Anne\": \"basket\"}\n\ndef someone_moves_the_marble(new_place, who_is_watching):\n    for person in who_is_watching:        # only people in the room\n        beliefs[person] = new_place        # update THEIR mental picture\n```\n\nSo when Anne moves the marble while Sally is out, only Anne's mental picture updates. Sally's is frozen at \"basket.\" Ask the simple agent and it just reports reality (\"box\"). Ask the smarter agent and it answers from *Sally's* point of view (\"basket\").\n\nThat's the whole thing. But keeping a separate picture of \"what does each *other* person know\" is the difference between an agent that's a good teammate and one that isn't.\n\nAlmost everything useful about multiple agents (or an agent working with a human) needs this.\n\nMost AI today reasons about *the world*. The 2026 shift is reasoning about *the people in the world* — including when they're wrong. That's what turns a smart tool into a real collaborator.\n\nBeing smart about the world makes a good tool. Being smart about *other people* makes a good teammate.\n\n```\ngit clone https://github.com/Shridhar-2205/living-software\ncd living-software/03-theory-of-mind\npython demo.py\n```\n\nHonest note: real versions have to *figure out* what someone believes by watching their behavior, which is much harder. 
Here I just tell the agent who was in the room, so the core idea — track beliefs separately from reality — is as clear as possible.\n\n*Written by **Shridhar Shah**, Senior Software Engineer at Outshift by Cisco — AI agents, search, and how they \"think.\" Part 3 of \"Toward Living Software.\" GitHub · LinkedIn*\n\nBackground: the Sally-Anne false-belief test (Baron-Cohen, Leslie & Frith, 1985); Kosinski, \"Evaluating Large Language Models in Theory of Mind Tasks\" (PNAS 2024 / [arXiv:2302.02083]); and a 2026 follow-up showing how brittle this still is — \"Understanding Artificial Theory of Mind\" ([arXiv:2602.22072]).", "url": "https://wpnews.pro/news/can-an-ai-agent-pass-the-test-we-give-4-year-olds", "canonical_source": "https://dev.to/shridhar_shah2297/can-an-ai-agent-pass-the-test-we-give-4-year-olds-5825", "published_at": "2026-06-27 21:43:33+00:00", "updated_at": "2026-06-27 22:03:54.953792+00:00", "lang": "en", "topics": ["artificial-intelligence", "ai-agents", "ai-research", "machine-learning", "large-language-models"], "entities": ["Shridhar Shah", "Outshift by Cisco", "Cisco", "Sally-Anne false-belief test", "Baron-Cohen", "Leslie", "Frith", "Kosinski"], "alternates": {"html": "https://wpnews.pro/news/can-an-ai-agent-pass-the-test-we-give-4-year-olds", "markdown": "https://wpnews.pro/news/can-an-ai-agent-pass-the-test-we-give-4-year-olds.md", "text": "https://wpnews.pro/news/can-an-ai-agent-pass-the-test-we-give-4-year-olds.txt", "jsonld": "https://wpnews.pro/news/can-an-ai-agent-pass-the-test-we-give-4-year-olds.jsonld"}}