Theory of Mind and the Sally-Anne false-belief test, in ~60 lines of Python.
TL;DR: There's a famous test that kids pass around age 4. It checks whether you understand that other people can believe things that aren't true. I built two AI agents: one that only knows "what's actually happening" (fails, like a toddler) and one that keeps track of what each person believes (passes). It's ~110 lines, and it's the foundation for agents that can actually work together.
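Here's the test. Sally puts a marble in a basket and leaves the room. While she's out, Anne moves the marble to a box. Sally comes back: where will she look for it?
If you said basket, nice: you just used something called "theory of mind." Sally never saw the marble move, so in her head it's still in the basket. What's actually true (it's in the box) and what Sally believes (it's in the basket) are two different things, and you kept them separate without even thinking about it.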
A 3-year-old says "box": they can't yet separate what they know from what Sally knows. A 4-year-old says "basket." It's one of the most famous tests in child psychology, and in 2026 it's become a real test for AI agents too.
| | Agent with no "theory of mind" | Agent that models other minds |
|---|---|---|
| What it tracks | only what's actually true | what each person believes, separately |
| Where will Sally look? | "box" | "basket" |
| Result | FAIL (only knows reality) | PASS |
The only difference between the two agents is one rule: a person's belief only updates when that person is actually in the room to see it happen.
    beliefs = {}  # person -> where they think the marble is

    def someone_moves_the_marble(new_place, who_is_watching):
        for person in who_is_watching:    # only people in the room
            beliefs[person] = new_place   # update THEIR mental picture
So when Anne moves the marble while Sally is out, only Anne's mental picture updates. Sally's is frozen at "basket." Ask the simple agent and it just reports reality ("box"). Ask the smarter agent and it answers from Sally's point of view ("basket").
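To make the contrast concrete, here is a minimal sketch of both agents side by side. It is not the repo's code: the `reality` dict and the two agent functions (`simple_agent`, `mind_modeling_agent`) are names I made up for illustration, and the scene is hard-coded.

```python
# Toy Sally-Anne world. Names are illustrative assumptions, not the repo's API.
reality = {"marble": None}   # where the marble actually is
beliefs = {}                 # person -> where they think the marble is


def someone_moves_the_marble(new_place, who_is_watching):
    """Move the marble; only people who see the move update their belief."""
    reality["marble"] = new_place
    for person in who_is_watching:
        beliefs[person] = new_place


def simple_agent(person):
    """No theory of mind: answers from reality, no matter who is asked about."""
    return reality["marble"]


def mind_modeling_agent(person):
    """Answers from that person's own (possibly out-of-date) belief."""
    return beliefs[person]


# Sally puts the marble in the basket; Sally and Anne are both in the room.
someone_moves_the_marble("basket", who_is_watching=["Sally", "Anne"])
# Sally leaves. Anne moves the marble; only Anne sees it happen.
someone_moves_the_marble("box", who_is_watching=["Anne"])

print(simple_agent("Sally"))         # box
print(mind_modeling_agent("Sally"))  # basket
```

Both agents see exactly the same events. The only thing that changes the answer is whether there is a per-person `beliefs` entry to read from.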
That's the whole thing. But keeping a separate picture of "what does each other person know" is the difference between an agent that's a good teammate and one that isn't.
Almost everything useful about multiple agents (or an agent working with a human) needs this.
Most AI today reasons about the world. The 2026 shift is reasoning about the people in the world, including when they're wrong. That's what turns a smart tool into a real collaborator.
Being smart about the world makes a good tool. Being smart about other people makes a good teammate.
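One way the "teammate" part could look in practice: once beliefs are stored separately from reality, an agent can spot who is working from a wrong picture and tell them. This is a rough sketch under my own assumptions; `who_has_a_false_belief` is a hypothetical helper, not something from the repo.

```python
def who_has_a_false_belief(reality, beliefs):
    """Return the people whose belief differs from reality, sorted by name."""
    return sorted(person for person, place in beliefs.items() if place != reality)


reality = "box"                                # where the marble really is
beliefs = {"Sally": "basket", "Anne": "box"}   # what each person thinks

stale = who_has_a_false_belief(reality, beliefs)
for person in stale:
    print(f"Tell {person}: the marble is in the {reality}, not the {beliefs[person]}.")
```

An agent that only tracks reality has nothing to compare against, so it can never notice that Sally needs the update.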
    git clone https://github.com/Shridhar-2205/living-software
    cd living-software/03-theory-of-mind
    python demo.py
Honest note: real versions have to figure out what someone believes by watching their behavior, which is much harder. Here I just tell the agent who was in the room, so the core idea (track beliefs separately from reality) is as clear as possible.
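For a taste of the harder version, here is a toy sketch of guessing a belief from behavior instead of being told who was in the room. It is a plain Bayes update and is not part of the demo: the function name `update_belief_estimate` and the 0.9 "people usually look where they think the thing is" number are assumptions I picked for illustration.

```python
def update_belief_estimate(prior, searched_place, p_search_where_believed=0.9):
    """Bayes update over "where does this person think the marble is?".

    prior: dict of place -> probability the person believes the marble is there
           (needs at least two places).
    searched_place: where we watched the person go and look.
    p_search_where_believed: assumed chance a person looks where they believe it is.
    """
    other_places = len(prior) - 1
    unnormalized = {}
    for believed_place, p in prior.items():
        if believed_place == searched_place:
            likelihood = p_search_where_believed
        else:
            # They looked somewhere they did NOT believe the marble was.
            likelihood = (1 - p_search_where_believed) / other_places
        unnormalized[believed_place] = p * likelihood
    total = sum(unnormalized.values())
    return {place: p / total for place, p in unnormalized.items()}


# We have no idea what Sally believes, then we watch her walk to the basket.
prior = {"basket": 0.5, "box": 0.5}
posterior = update_belief_estimate(prior, searched_place="basket")
print({place: round(p, 2) for place, p in posterior.items()})
# {'basket': 0.9, 'box': 0.1}
```

Real systems have far messier evidence than one clean "she walked to the basket" observation, which is where the difficulty comes from.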
Written by Shridhar Shah, Senior Software Engineer at Outshift by Cisco (AI agents, search, and how they "think"). Part 3 of "Toward Living Software." GitHub · LinkedIn
Background: the Sally-Anne false-belief test (Baron-Cohen, Leslie & Frith, 1985); Kosinski, "Evaluating Large Language Models in Theory of Mind Tasks" (PNAS, 2024; [arXiv:2302.02083]); and a 2026 follow-up showing how brittle this still is, "Understanding Artificial Theory of Mind" ([arXiv:2602.22072]).