LMSYS Chatbot Arena

mentions 1 type Person

// recent coverage 1 mention

08:05

2026-06-29

dev.to

large-language-models

LLM-as-a-Judge: I Built One From Scratch, Then Checked It Against Humans

A developer built an LLM-as-a-judge from scratch using Qwen2.5-1.5B-Instruct and tested it against the human votes in the LMSYS Chatbot Arena dataset. The judge scored answers independently and agreed w…

// co-occurs with top 3 entities

Qwen2.5-1.5B-Instruct 1
Kaggle 1
agie-ai/lmsys-chatbot_arena_conversations 1