01:53
2026-06-17
arxiv.org
large-language-models
The Benchmark Illusion: Pruned LLMs Can Pass Multiple Choice but Fail to Answer
Researchers found that pruned large language models can pass multiple-choice benchmarks but fail to answer the same questions in open-ended generation, creating a 'benchmark illusion.' The study, using mult…