Item Response Theory

// recent coverage: 1 mention

2026-07-01 09:40 · arxiv.org · machine-learning

Why averaging LLM benchmark scores is fundamentally broken

A new study finds that averaging benchmark scores produces misleading rankings when evaluation data is sparse and item difficulty varies widely, with Spearman rank correlation dropping from 1.000 to 0…
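To make the claim in the summary concrete, here is a minimal sketch of how sparse coverage combined with varying item difficulty can distort a ranking based on averages. It is not the study's method or data. It assumes the one-parameter (Rasch) model, the simplest Item Response Theory model, and uses hypothetical abilities, difficulties, and item assignments chosen to be deliberately extreme.

```python
import math

def rasch_prob(theta, b):
    """Rasch (1PL) probability that a model of ability theta solves an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ranks(values):
    """Rank of each value, 1 = smallest. Assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result

def spearman(x, y):
    """Spearman rank correlation using the no-ties formula 1 - 6*sum(d^2) / (n*(n^2 - 1))."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Hypothetical values: three models (strong, medium, weak) and five items (easy to hard).
abilities = [1.0, 0.0, -1.0]
difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]

def mean_score(theta, item_ids):
    """Average expected score of a model over the items it was actually evaluated on."""
    return sum(rasch_prob(theta, difficulties[i]) for i in item_ids) / len(item_ids)

# Dense evaluation: every model is scored on every item.
full = [mean_score(t, range(len(difficulties))) for t in abilities]

# Sparse evaluation (assumed extreme case): the strong model only saw the hard items,
# the weak model only saw the easy items.
sparse_items = [[3, 4], [2], [0, 1]]
sparse = [mean_score(t, ids) for t, ids in zip(abilities, sparse_items)]

rho_full = spearman(abilities, full)
rho_sparse = spearman(abilities, sparse)
print(rho_full, rho_sparse)
```

With full coverage the averaged scores rank the models exactly as their abilities do (correlation 1.0). In the sparse case the weak model averages higher than the strong one because it only faced easy items, and the ranking fully inverts (correlation -1.0). Real benchmark data would fall somewhere between these two extremes.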
