IFBench

mentions: 1 · type: Organization

// recent coverage (1 mention)

2026-06-28 03:19 · arxiv.org · large-language-models

Improved LLM as a Judge Techniques

Researchers propose BINEVAL, a framework that decomposes LLM evaluation into atomic binary questions for interpretable, multi-dimensional scoring. The method matches or outperforms strong baselines on…

// co-occurs with top 6 entities

BINEVAL (1), UniEval (1), G-Eval (1), SummEval (1), Topical-Chat (1), QAGS (1)