BayesBench

mentions 1 type Organization

// recent coverage 1 mention

04:00

2026-07-01

arxiv.org

large-language-models

BayesBench: Evaluating LLM Belief Trajectories Under Multi-Turn Evidence Accumulation

Researchers introduced BayesBench, a suite of simulation environments to evaluate how large language models update beliefs under multi-turn evidence accumulation. Testing seven LLMs from 3B to 70B par…

// co-occurs with top 1 entity

arXiv 1