EQ-Bench: Emotional Intelligence Benchmarks for LLMs

Researchers have introduced EQ-Bench, a new benchmark designed to measure emotional intelligence in large language models (LLMs) through challenging roleplay scenarios. The benchmark calculates an Elo score from pairwise model comparisons, with an LLM judge evaluating responses across eight core dimensions of emotional intelligence. It aims to provide a standardized method for assessing how well AI systems understand and respond to human emotions.

Github: https://github.com/EQ-bench | Paper: https://arxiv.org/abs/2312.06281 | Twitter: https://twitter.com/sam_paech | About: about.html

EQ-Bench3

A benchmark measuring emotional intelligence in challenging roleplays. For more details about the benchmark, see the About section (./about.html).

The Elo score shown in the leaderboard is calculated from pairwise model comparisons, in which the LLM judge rates each response against eight core dimensions of emotional intelligence.

Note: the coloured "Abilities" heatmap columns (Humanlike, Safety, Assertive, etc.) are not used in the Elo calculation. They are purely informational, giving a quick view of each model's stylistic traits and skill profile, and they are "higher is higher", not "higher is better".
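
To make the idea of an Elo score built from pairwise comparisons concrete, here is a minimal sketch of the classic Elo update. It is an illustration only, not EQ-Bench's actual scoring code: the page does not specify the rating procedure, so the starting rating of 1500, the K-factor of 32, the function names, and the model names are all assumptions made for this example. The `outcome` value stands in for a judge's verdict on one pair (1.0 if the first model's response is preferred, 0.0 if the second, 0.5 for a tie).

```python
from typing import Dict, Iterable, Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def rate_models(
    comparisons: Iterable[Tuple[str, str, float]],
    initial: float = 1500.0,  # assumed starting rating
    k: float = 32.0,          # assumed K-factor (step size)
) -> Dict[str, float]:
    """Apply sequential Elo updates over (model_a, model_b, outcome) triples.

    outcome is the judged result for model_a: 1.0 win, 0.5 tie, 0.0 loss.
    """
    ratings: Dict[str, float] = {}
    for model_a, model_b, outcome in comparisons:
        ra = ratings.setdefault(model_a, initial)
        rb = ratings.setdefault(model_b, initial)
        # Shift both ratings by how far the result deviated from expectation.
        delta = k * (outcome - expected_score(ra, rb))
        ratings[model_a] = ra + delta
        ratings[model_b] = rb - delta
    return ratings


if __name__ == "__main__":
    # Hypothetical judged comparisons between two placeholder models.
    print(rate_models([("model-x", "model-y", 1.0)]))
```

With equal starting ratings the expected score is 0.5, so a single win moves the winner up by 16 points and the loser down by 16. Sequential updates like this depend on the order of comparisons; a production leaderboard might instead fit all ratings jointly, which is a design choice this sketch does not attempt to reproduce.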