04:00
2026-06-06
arxiv.org
large-language-models
Stability vs. Manipulability: Evaluating Robustness Under Post-Decision Interaction in LLM Judges
A new study reveals that LLM judges, widely used to evaluate AI outputs, can be manipulated through targeted conversation after they have made an initial decision, overturning stable judgments and shifting be…