Emotional Intelligence Benchmarks for LLMs
GitHub | Paper | Twitter | About
💙EQ-Bench3
[🌀Spiral-Bench v1.2](spiral-bench.html)
[✍️Longform Writing](creative_writing_longform.html)
[🎨Creative Writing v3](creative_writing.html)
[☢️Slop Score](slop-score.html)
[⚖️Judgemark v4](judgemark-v4.html)
[🎤BuzzBench](buzzbench.html)
[🌍DiploBench](diplobench.html)
A benchmark measuring emotional intelligence in challenging roleplays. [Learn more](./about.html#eq-bench-3)
Note: Ability scores shown in the heatmap do not contribute to the Elo score. They are "higher is higher", not "higher is better": a higher value means the model shows more of that trait, not that it performs better.
| Model | Abilities | |
|---|---|---|
For more details about the benchmark, see the [About](./about.html#long) section.
The Elo score shown in the leaderboard is calculated from pair-wise model comparisons, where the LLM judge rates each response against eight core dimensions of emotional intelligence:
Note: the coloured “Abilities” heatmap columns (Humanlike, Safety, Assertive, etc.) are not used in the Elo calculation. They are purely informational, giving a quick view of each model’s stylistic traits and skill profile.
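To make the pairwise Elo idea above concrete, here is a minimal Python sketch of a generic, sequential Elo update driven by judged comparisons. It is an illustration only: this page does not specify the rating solver the leaderboard uses, so the K-factor, the starting rating of 1200, the function names, and the `(model_a, model_b, score_a)` tuple format are all assumptions made for this example.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical record format: (model_a, model_b, score_a), where score_a is
# 1.0 if the judge preferred model_a, 0.0 if it preferred model_b, 0.5 for a tie.
Comparison = Tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(
    ratings: Dict[str, float],
    comparisons: Iterable[Comparison],
    k: float = 16.0,         # assumed K-factor, not taken from the benchmark
    initial: float = 1200.0  # assumed starting rating for unseen models
) -> Dict[str, float]:
    """Apply one sequential Elo pass over judged pairwise comparisons."""
    ratings = dict(ratings)  # copy so the caller's dict is not mutated
    for model_a, model_b, score_a in comparisons:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        # Shift both ratings by how far the result deviated from expectation.
        delta = k * (score_a - expected_score(ra, rb))
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return ratings


if __name__ == "__main__":
    judged = [
        ("model_a", "model_b", 1.0),
        ("model_b", "model_c", 0.5),
        ("model_a", "model_c", 1.0),
    ]
    for name, rating in sorted(update_elo({}, judged).items()):
        print(f"{name}: {rating:.1f}")
```

Only the judged win, loss, or tie outcomes feed into the rating in this sketch, which mirrors the note above that the Abilities columns are not part of the Elo calculation. A sequential update like this depends on comparison order; a production leaderboard may well fit all comparisons jointly instead.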