How far do open weights trail the frontier?

A new analysis using Epoch's ECI metric shows that open-weight AI models continue to trail closed models on the frontier, with the gap persisting over time. The analysis, based on item response theory, provides a more nuanced view than the AA Index, revealing that open-weight models have not yet matched the performance of leading closed models like those from OpenAI and Anthropic.

I saw this Twitter post (https://x.com/yaroslavvb/status/2067367657272422584) today and really liked the idea. But I think the AA Index (https://artificialanalysis.ai/articles/artificial-analysis-intelligence-index-v4-1) is a rather crude measure, and I much prefer ECI from Epoch (https://epoch.ai/eci), which uses IRT (https://en.wikipedia.org/wiki/Item_response_theory).

The resulting graph diverges meaningfully from the one in the Twitter post, which seems to collapse weirdly at the end, maybe because no logistic assumptions are taken into account. (See the linkpost to actually interact with the graphs, e.g. to see which model is which.)

For context, here are the two raw frontiers, i.e. the running best ECI over time for open-weight vs closed models:

Sadly, GLM-5.2 has not been scored yet, but I'll update the website when it is.

You can also generalize this to other criteria, though open vs closed is probably the most interesting one. One such example would be the OpenAI vs Anthropic rivalry:
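For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of how a "running best" frontier could be computed for any grouping (open vs closed, one lab vs another, and so on). This is not the code behind the graphs: the function name `running_frontier`, the record fields, and the model names and scores are all hypothetical placeholders, not real ECI values.

```python
from datetime import date


def running_frontier(records, key):
    """Return {group: [(release_date, best_score_so_far), ...]}.

    A model only adds a step to its group's frontier if it beats
    every earlier model in that same group.
    """
    frontiers = {}
    for rec in sorted(records, key=lambda r: r["released"]):
        steps = frontiers.setdefault(key(rec), [])
        best = steps[-1][1] if steps else float("-inf")
        if rec["score"] > best:
            steps.append((rec["released"], rec["score"]))
    return frontiers


# Hypothetical placeholder data: names and scores are made up for illustration.
records = [
    {"name": "closed-1", "released": date(2024, 1, 1), "open_weights": False, "score": 100.0},
    {"name": "open-1", "released": date(2024, 3, 1), "open_weights": True, "score": 90.0},
    {"name": "closed-2", "released": date(2024, 6, 1), "open_weights": False, "score": 95.0},
    {"name": "open-2", "released": date(2024, 9, 1), "open_weights": True, "score": 105.0},
    {"name": "closed-3", "released": date(2024, 12, 1), "open_weights": False, "score": 120.0},
]

# Group by open vs closed; any other criterion works by swapping the key function.
frontiers = running_frontier(
    records, key=lambda r: "open" if r["open_weights"] else "closed"
)

for group, steps in frontiers.items():
    print(group, steps)
```

Swapping the `key` function for something like `lambda r: r["lab"]` (assuming the records carry such a field) would give a per-lab comparison of the same shape.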