An arXiv paper (arXiv:2605.26437) by Po Han Teo, submitted 26 May 2026, proposes a bounded-rationality framing to explain systematic differences between humans and large language models (LLMs) in strategic settings, according to the paper's abstract. The author argues that humans' strategic choices can be modelled as a classical baseline plus an additive correction arising from bounded computation, and that LLMs often bypass that bound by retrieving and recombining corpus material, per the abstract. The paper reports that neither fine-tuning on human response data nor persona conditioning fully closes the gap. It proposes four operational tests (conditional dependence, distributional asymmetry, path-dependence under repetition, and paraphrase-robustness) to discriminate human-shaped from LLM-shaped responses, per the arXiv abstract. Editorial analysis: For researchers using LLM agents as human proxies in behavioural or political-science experiments, the paper highlights design validity risks and gives concrete tests to quantify divergence.
What happened
An arXiv paper titled Divergent Minds, Convergent Baselines (arXiv:2605.26437) by Po Han Teo was submitted on 26 May 2026, per the arXiv entry. The paper frames human strategic behaviour in the bounded-rationality tradition and presents a mathematical account that treats the behavioural correction term as the signature of bounded computation, according to the paper's abstract. The abstract states that in canonical games present in standard training corpora, LLMs retrieve and recombine corpus material in ways that bypass the computational bounds that produce human corrections, and that neither fine-tuning on human response data nor persona conditioning has closed the empirical gap, per the arXiv abstract.
Technical details (per the paper)
The proposed framework treats the gap between an unboundedly rational solution and what a computationally bounded agent produces as a formal "bounded-computation" term, as described in the abstract. The paper extends the framing to reasoning-distilled models using cognitive-hierarchy theory, arguing that accessible level-strategic reasoning for models is bounded by compute budget and context length rather than by human cognitive constraints, per the abstract. The author proposes four operational tests (conditional dependence, distributional asymmetry, path-dependence under repetition, and paraphrase-robustness) to empirically distinguish human-shaped from LLM-shaped strategic responses, per the arXiv abstract.
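To make the cognitive-hierarchy idea concrete, here is a minimal Python sketch of a textbook Poisson cognitive-hierarchy model applied to a p-beauty contest (guess p times the average guess on a 0 to 100 scale). It is not taken from the paper: the game, the parameter `tau`, the level-0 mean of 50, and the function names are all illustrative assumptions. The hypothetical `max_level` cap stands in for whatever limits the depth of reasoning an agent can reach, whether a human cognitive constraint or a model's compute budget and context length.

```python
import math


def poisson_weights(tau, max_level):
    """Unnormalised Poisson(tau) weights for reasoning levels 0..max_level."""
    return [math.exp(-tau) * tau**k / math.factorial(k) for k in range(max_level + 1)]


def ch_choices(tau, max_level, p=2 / 3, level0_mean=50.0):
    """Choice made by each level 0..max_level in a p-beauty contest.

    Assumption: level 0 guesses at random with mean `level0_mean`; level k >= 1
    best-responds to the Poisson mix of levels 0..k-1, renormalised.
    """
    weights = poisson_weights(tau, max_level)
    choices = [level0_mean]
    for k in range(1, max_level + 1):
        lower = weights[:k]
        believed_mean = sum(w * c for w, c in zip(lower, choices)) / sum(lower)
        choices.append(p * believed_mean)
    return choices


def population_mean(tau, max_level, p=2 / 3, level0_mean=50.0):
    """Average guess of a population whose reasoning depth is capped at max_level."""
    weights = poisson_weights(tau, max_level)
    choices = ch_choices(tau, max_level, p, level0_mean)
    return sum(w * c for w, c in zip(weights, choices)) / sum(weights)


if __name__ == "__main__":
    # Illustrative values only: a higher cap moves the average toward the
    # unbounded (Nash) solution of 0.
    for cap in (1, 2, 4, 8):
        print(cap, round(population_mean(1.5, cap), 2))
```

In this toy setting the distance between the population average and the equilibrium guess of 0 plays the role of the correction term, and it shrinks as the cap on reasoning depth is raised.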
Editorial analysis - technical context
Researchers who deploy LLM agents as stand-ins for human subjects in strategic experiments face a structural validity issue that this paper articulates: the computational origins of human boundedness differ from the retrieval-driven behaviour often seen in LLMs. Industry-pattern observations suggest that testing for distributional asymmetries and paraphrase sensitivity is a practical way to detect whether an agent's behaviour stems from corpus retrieval or from bounded deliberation.
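As a rough illustration of what such checks could look like in practice, the sketch below computes two generic statistics over an agent's numeric responses: skewness as a simple asymmetry measure, and the share of response variance attributable to which paraphrase of the prompt was shown. Neither is claimed to be the paper's test definition, since the abstract as summarised here gives no formulas; the function names and the input layout (a dict mapping a paraphrase label to a list of responses) are assumptions.

```python
import statistics


def skewness(values):
    """Population skewness (mean cubed z-score); 0 for a symmetric sample."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return 0.0
    return sum(((v - mean) / sd) ** 3 for v in values) / len(values)


def paraphrase_sensitivity(responses_by_paraphrase):
    """Fraction of total variance explained by paraphrase identity (0 to 1).

    0 means the wording made no difference to the average response;
    1 means the wording accounts for all of the observed variation.
    """
    groups = list(responses_by_paraphrase.values())
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(all_values)
    total_ss = sum((v - grand_mean) ** 2 for v in all_values)
    if total_ss == 0:
        return 0.0
    between_ss = sum(
        len(group) * (statistics.fmean(group) - grand_mean) ** 2 for group in groups
    )
    return between_ss / total_ss


if __name__ == "__main__":
    # Made-up numbers, purely to show the call shape.
    responses = {
        "original wording": [33, 35, 30, 34],
        "reworded prompt": [22, 25, 21, 24],
    }
    print(round(paraphrase_sensitivity(responses), 3))
    print(round(skewness([v for vs in responses.values() for v in vs]), 3))
```

A useful comparison is to run the same statistics on human responses to the same prompts, so that the agent's values are read against a human baseline rather than in isolation.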
Context and significance
Editorial analysis: The paper is relevant to cross-disciplinary audiences (behavioural economists, political scientists, and ML researchers) who are increasingly using LLMs as experimental subjects. By proposing concrete tests, the paper supplies operational tools for evaluating when LLM-to-human substitution is plausible and when it is not. The abstract also includes a moderator prediction that links effect magnitude to peer-signal individuation and reports a quantitative bound referenced as "Cohen's" between named-opponent and aggregate-opponent settings, per the arXiv abstract.
What to watch
For practitioners: look for follow-up empirical studies that implement the four proposed tests, reported effect sizes comparing named-opponent versus aggregate-opponent conditions, and experiments that measure sensitivity to paraphrasing and repetition. Reports that quantify how fine-tuning or reasoning-distillation alters the proposed test statistics will be especially informative for evaluating LLMs as experimental proxies.
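For readers who want to compare two experimental conditions themselves, the following sketch computes one widely used standardized effect size, Cohen's d with a pooled standard deviation. It is a generic illustration and does not reproduce any statistic or bound from the paper; the condition names and the sample values are made up.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference between two samples, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.fmean(group_a) - statistics.fmean(group_b)) / pooled_sd


if __name__ == "__main__":
    # Hypothetical responses under two framings of the opponent.
    named_opponent = [2, 4, 6]
    aggregate_opponent = [1, 3, 5]
    print(cohens_d(named_opponent, aggregate_opponent))
```

Each group needs at least two observations and the pooled variance must be non-zero, otherwise the calculation is undefined.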
Scoring Rationale
The paper addresses a notable methodological risk for researchers using LLM agents as substitutes for human subjects in strategic experiments and supplies concrete tests. It is significant for experimental design and validation but is an academic contribution rather than a field-defining technical breakthrough.