PIMMUR: Can LLMs simulate human collective behavior?

Researchers auditing 39 studies on LLM-based human behavior simulations found that 89.7% violated at least one of six methodological principles (PIMMUR), undermining the validity of the results. The audit revealed that frontier LLMs correctly identified the underlying social experiment in 50.8% of cases and that 61% of prompts exerted excessive control, pre-determining outcomes. When the principles were enforced, reported collective phenomena often vanished or reversed, suggesting many "emergent" behaviors are methodological artifacts rather than genuine social dynamics.
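As a rough illustration of how an "at least one violation" rate like the one above is tallied, here is a minimal Python sketch. The audit table, study names, and the `violation_rate` helper are hypothetical and are not taken from the paper; only the six principle names and the 39-study total come from the text. The last line checks the arithmetic behind the reported figure: with 39 studies, 89.7% corresponds to 35 studies.

```python
from typing import Dict, List

# The six principle names as listed in the abstract.
PRINCIPLES = ["profile", "interaction", "memory", "control", "unawareness", "realism"]


def violation_rate(audit: Dict[str, List[str]]) -> float:
    """Return the fraction of studies that violate at least one principle."""
    if not audit:
        raise ValueError("audit is empty")
    for study, violated in audit.items():
        unknown = set(violated) - set(PRINCIPLES)
        if unknown:
            raise ValueError(f"{study}: unknown principle(s) {sorted(unknown)}")
    # A study counts once, no matter how many principles it violates.
    flagged = sum(1 for violated in audit.values() if violated)
    return flagged / len(audit)


# Hypothetical toy audit (NOT real data): study id -> violated principles.
toy_audit = {
    "study_a": ["control", "unawareness"],
    "study_b": [],
    "study_c": ["memory"],
    "study_d": ["profile", "realism"],
}

print(f"{violation_rate(toy_audit):.1%}")  # 75.0%

# Arithmetic check on the reported figure: 35 of 39 studies rounds to 89.7%.
print(f"{35 / 39:.1%}")  # 89.7%
```

Counting each study once, rather than summing individual violations, is what makes this an "at least one principle" rate.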

Title: The PIMMUR Principles: Ensuring Validity in Collective Behavior of LLM Societies

Computer Science > Computation and Language. Submitted on 22 Sep 2025 (v1, https://arxiv.org/abs/2509.18052v1), last revised 6 Apr 2026 (this version, v3). HTML (experimental): https://arxiv.org/html/2509.18052v3

Abstract: Large language models (LLMs) are increasingly deployed to simulate human collective behaviors, yet the methodological rigor of these "AI societies" remains under-explored. Through a systematic audit of 39 recent studies, we identify six pervasive flaws, spanning agent profiles, interaction, memory, control, unawareness, and realism (PIMMUR). Our analysis reveals that 89.7% of studies violate at least one principle, undermining simulation validity. We demonstrate that frontier LLMs correctly identify the underlying social experiment in 50.8% of cases, while 61.0% of prompts exert excessive control that pre-determines outcomes. By reproducing five representative experiments (e.g., telephone game), we show that reported collective phenomena often vanish or reverse when PIMMUR principles are enforced, suggesting that many "emergent" behaviors are methodological artifacts rather than genuine social dynamics. Our findings suggest that current AI simulations may capture model-specific biases rather than universal human social behaviors, raising critical concerns about the use of LLMs as scientific proxies for human society.

Submission history
From: Jen-Tse Huang
[v1] Mon, 22 Sep 2025 17:27:29 UTC (1,771 KB) https://arxiv.org/abs/2509.18052v1
[v2] Mon, 26 Jan 2026 14:24:09 UTC (1,963 KB) https://arxiv.org/abs/2509.18052v2
[v3] Mon, 6 Apr 2026 17:12:41 UTC (1,966 KB)