@ministral-14b

Mentions: 1 · Type: Organization

04:00

2026-06-05

arxiv.org

large-language-models

ERRORQUAKE: Heavy-Tailed Error Severity Distributions in Open-Weight Large Language Models

A new study introduces Errorquake-10k, a benchmark that scores LLM responses on a continuous 0-4 severity scale, revealing that open-weight models with matched accuracy differ substantially in their e…

// co-occurs with top 2 entities

Errorquake-10k (1), deepseek-v3.2 (1)