04:00
2026-06-05
arxiv.org
large-language-models
The Evaluation Blind Spot: A Stereological Theory of Benchmark Coverage for Large Language Models
A new stereological theory reveals that current LLM benchmarks suffer from a structural blind spot exceeding the runner-up score gap by two orders of magnitude and dominating statistical noise by 52-1…