How far do open weights trail the frontier?


[ARTICLE · art-32469] src=lesswrong.com ↗ pub=2026-06-18T11:01Z topic=large-language-models verified=true sentiment=· neutral

A new analysis using Epoch's ECI metric shows that open-weight AI models continue to trail closed models on the frontier, with the gap persisting over time. The analysis, based on item response theory, provides a more nuanced view than the AA Index, revealing that open-weight models have not yet matched the performance of leading closed models like those from OpenAI and Anthropic.


I saw this Twitter post today and really liked the idea. But I think the AA Index (from Artificial Analysis) is a rather crude measure, and I much prefer Epoch's ECI, which uses item response theory (IRT). The resulting graph diverges meaningfully from the one in the Twitter post (which seems to collapse oddly at the end, maybe because it makes no logistic assumptions):

[See the linkpost to interact with the graphs, e.g. to see which model is which.]
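For readers unfamiliar with IRT, here is a textbook two-parameter logistic form. It is a generic sketch, not necessarily the parameterization ECI uses; the symbols $\theta_m$ (latent ability of model $m$), $b_i$ (difficulty of benchmark item $i$), and $a_i$ (discrimination of item $i$) are assumptions introduced for illustration.

```latex
P(\text{model } m \text{ solves item } i)
  = \sigma\bigl(a_i(\theta_m - b_i)\bigr)
  = \frac{1}{1 + \exp\bigl(-a_i(\theta_m - b_i)\bigr)}
```

Under a model of this kind, raw success rates saturate near 0 and 1 while the latent ability $\theta_m$ keeps moving, which is one plausible reason an averaged index and an IRT-based one can diverge at the extremes.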

For context, the two raw frontiers (the running best ECI over time for open-weight vs closed models):

Sadly, GLM-5.2 has not been scored yet, but I'll update the website when it is.
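A running-best frontier of the kind described above can be sketched in a few lines. This is a minimal illustration, not the code behind the linked graphs: the records are made-up numbers, and `running_best` and `lag_days` are hypothetical helper names. Measuring the gap as "days since the closed frontier first reached the same score" is also an assumption about how one might define the lag.

```python
from datetime import date

# Hypothetical records: (release_date, score, is_open_weight).
# The scores are invented for illustration and are not real ECI values.
models = [
    (date(2023, 3, 1), 100.0, False),
    (date(2023, 7, 1), 90.0, True),
    (date(2024, 1, 1), 120.0, False),
    (date(2024, 6, 1), 105.0, True),
    (date(2025, 1, 1), 125.0, True),
]


def running_best(records):
    """Return [(date, best_score_so_far)], one entry per new record high."""
    frontier = []
    best = float("-inf")
    for released, score in sorted(records):
        if score > best:
            best = score
            frontier.append((released, best))
    return frontier


def lag_days(open_frontier, closed_frontier):
    """For each open frontier point, days since the closed frontier first
    reached at least that score (None if it never has)."""
    lags = []
    for released, score in open_frontier:
        first = next((d for d, s in closed_frontier if s >= score), None)
        lags.append((released, None if first is None else (released - first).days))
    return lags


closed = running_best([(d, s) for d, s, is_open in models if not is_open])
opened = running_best([(d, s) for d, s, is_open in models if is_open])
lags = lag_days(opened, closed)
```

The same two helpers work for any other split of the models, such as one lab versus another, by changing the filter used to build the two record lists.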

You can also generalize to other criteria (though this is probably the most interesting one). One such example would be the OpenAI vs Anthropic rivalry:


Read original on lesswrong.com → www.lesswrong.com/posts/DbLnB7eo9wBDDnSEq/how-fa…
