LongJudgeBench

mentions 1 type Organization

// recent coverage 1 mentions

04:48

2026-06-03

arxiv.org

large-language-models

Benchmarking LLM-as-a-Judge for Long-Form Output Evaluation

Researchers have introduced LongJudgeBench, a new benchmark designed to evaluate the reliability of large language models (LLMs) when used as judges for long-form outputs. The benchmark reveals a subs…

// co-occurs with top 1 entities

LLM-as-a-judge 1

// topics top 5 topics

large language models 1

artificial intelligence 1

natural language processing 1

ai research 1

generative ai 1