04:00
2026-05-27
arxiv.org
large-language-models
Conv-to-Bench: Evaluating Language Models Via User-Assistant Dialogues In Code Tasks
Researchers have developed Conv-to-Bench, a framework that automatically converts real-world user-assistant dialogues into structured evaluation benchmarks for large language models. In programming ta…