
When Plausible Is Not Realistic: Evaluating Human Mobility in LLM-Based Urban Simulation

Researchers from multiple institutions have introduced a validation framework for testing whether LLM-based urban simulators reproduce realistic human mobility patterns. Evaluated against real-world data from the Greater Paris region and Shanghai, AgentSociety and CitySim show a substantial gap between narrative plausibility and empirical mobility realism, particularly in reproducing spatial and temporal constraints such as trip-length distributions and dwell times. The authors argue for rigorous empirical validation of such simulators and contribute open infrastructure for more realistic, reproducible urban simulation.

1 min read · published Jun 15, 2026

arXiv:2606.13835v1

Abstract: LLM-based generative agents are increasingly used in urban simulators, yet it remains unclear whether they reproduce empirically realistic human mobility patterns or merely generate plausible mobility narratives. We introduce a validation framework for evaluating the mobility of generative agents of LLM-based urban simulators against real-world mobility data. For this, we use mobility laws, temporal rhythms, network motifs, semantic activity transitions, and behavioral mobility profiles. Using datasets from the Greater Paris region and Shanghai, we evaluate AgentSociety and CitySim across multiple dimensions of mobility realism. Our analysis reveals a substantial gap between narrative plausibility and empirical mobility realism. Although the simulators capture some high-level semantic activity distributions, they struggle to reproduce core spatial and temporal constraints, including realistic trip-length distributions, origin-destination flows, dwell times, and transition dynamics. We further observe that realistic mobility diversity is unstable across default prompting configurations and may require explicit profile-aware initialization. To support reproducible evaluation, we also contribute scalable and open LLM-driven infrastructure for regional-scale map generation, observability-enhanced simulation, mobility-metric computation, and traffic simulation. Our findings highlight the need for rigorous empirical validation of LLM-based urban simulators and provide practical tools for building more realistic and reproducible urban simulation systems.
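To make the idea of checking trip-length realism concrete, here is a minimal sketch of one way a simulated trip-length distribution could be compared with an observed one. It is not the paper's method: the abstract does not say which distance measure or binning the authors use, so the Jensen-Shannon divergence, the bin edges, and all function names (`haversine_km`, `trip_lengths_km`, `histogram`, `js_divergence`) are assumptions chosen for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two latitude/longitude points, in km."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def trip_lengths_km(stops):
    """Lengths of consecutive hops in an ordered list of (lat, lon) stops."""
    return [haversine_km(*a, *b) for a, b in zip(stops, stops[1:])]


def histogram(lengths, edges):
    """Normalised histogram over [edges[i], edges[i+1]) bins.

    Values outside the outermost edges are ignored.
    """
    counts = [0] * (len(edges) - 1)
    for x in lengths:
        for i in range(len(edges) - 1):
            if edges[i] <= x < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no trip lengths fall inside the bin edges")
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 wherever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2


# Hypothetical bin edges in km (assumption: roughly logarithmic spacing).
EDGES_KM = [0.1, 1.0, 10.0, 100.0]
```

A caller would build one histogram from observed trips and one from simulator output, both over `EDGES_KM`, and pass them to `js_divergence`; values near 0 suggest similar trip-length distributions and values near 1 suggest nearly disjoint ones. The same pattern could be applied to dwell times by swapping the distance computation for a duration.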
