16:42
2026-06-26
arxiv.org
large-language-models
Combining LLMs Rarely Beats the Best Single Model: I Tested 67 Frontier Models
A new study testing 67 frontier language models from 21 providers found that combining multiple models rarely outperforms the single best model, with gains capped by a 'co-failure ceiling' where all m…