04:00
2026-06-05
arxiv.org
large-language-models
ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models
A new study introduces Errorquake-10k, a benchmark that scores LLM responses on a continuous 0-4 severity scale, revealing that open-weight models with matched accuracy differ substantially in their e…