03:19
2026-06-28
arxiv.org
large-language-models
Improved LLM-as-a-Judge Techniques
Researchers propose BINEVAL, a framework that decomposes LLM evaluation into atomic binary questions for interpretable, multi-dimensional scoring. The method matches or outperforms strong baselines on…
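
The summary does not say how BINEVAL turns binary answers into scores, so the following is only a minimal sketch of the general idea and not the paper's method. It assumes each yes/no verdict is tagged with an evaluation dimension and that a dimension's score is simply the fraction of "yes" answers. The function name `score_by_dimension`, the dimension labels, and the sample verdicts are all hypothetical.

```python
from collections import defaultdict


def score_by_dimension(answers):
    """Aggregate binary verdicts into one score per dimension.

    answers: iterable of (dimension, passed) pairs, one per binary question.
    Assumption (not from the source): score = share of questions answered yes.
    """
    totals = defaultdict(lambda: [0, 0])  # dimension -> [yes_count, question_count]
    for dimension, passed in answers:
        totals[dimension][0] += int(passed)
        totals[dimension][1] += 1
    return {dim: yes / count for dim, (yes, count) in totals.items()}


# Hypothetical verdicts a judge model might return for one response.
answers = [
    ("faithfulness", True),
    ("faithfulness", False),
    ("fluency", True),
    ("fluency", True),
]
scores = score_by_dimension(answers)
```

Keeping the raw per-question verdicts alongside the aggregate is what would make such a score inspectable: a low value can be traced back to the specific questions that failed.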