04:00
2026-05-29
arxiv.org
large-language-models
When Models Disagree: Rethinking LLM Evaluation for Public Comment Analysis
Federal agencies using large language models to categorize public comments face a hidden risk: different models can produce fundamentally different categorizations of the same input, yet standard accu…