08:05
2026-06-29
dev.to
large-language-models
LLM-as-a-Judge: I Built One From Scratch, Then Checked It Against Humans
A developer built an LLM-as-a-judge from scratch using Qwen2.5-1.5B-Instruct and tested it against the LMSYS Chatbot Arena dataset with human votes. The judge scored answers independently and agreed w…