{"slug": "faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges", "title": "Faithful or Fabricated? A Causal Framework for Rationalization Bias in LLM Judges", "summary": "A paper posted to arXiv introduces a causal framework for detecting rationalization bias in LLM judges, asking whether their rankings and explanations stay stable when non-evidential cues are perturbed while the underlying texts are held fixed. The authors define five cue interventions (Blind, Truth, Flip, Placebo, Reveal-After) and tie-aware metrics for outcome and rationale anchoring, including label-aligned rhetoric and explanation drift, and design anchoring attacks based on verbosity and confidence cues. On a new dataset of 1,000 summaries from traditional extractive models and LLMs, they report substantial cue-anchored rationalization under label and placebo perturbations. Their proposed mitigation, PROOF-BEFORE-PREFERENCE (evidence lock, score, rank), markedly improves cue invariance over baselines.", "body_md": "arXiv:2605.23970v1 Announce Type: new\nAbstract: Large language models (LLMs) are increasingly used as automatic judges for summarization and dialogue evaluation. Prior work has documented biases such as position, verbosity, and style preferences, but largely focuses on outcomes, leaving judge explanations underexplored. We instead ask whether LLM judges are cue-invariant, i.e., whether their rankings and explanations remain stable when non-evidential cues are perturbed while holding the underlying texts fixed. We introduce a suite of cue interventions (Blind, Truth, Flip, Placebo, Reveal-After) and tie-aware metrics that quantify outcome anchoring and rationale anchoring, including label-aligned rhetoric and explanation drift, alongside consistency and stereotype-intrusion checks. We design anchoring attacks using verbosity and confidence cues, and compare two mitigations: structured chain-of-thought prompting and PROOF-BEFORE-PREFERENCE (evidence lock, score, rank). 
Using a new dataset of 1,000 summaries from traditional extractive models and LLMs, we find substantial cue-anchored rationalization under label and placebo perturbations, while PROOF-BEFORE-PREFERENCE markedly improves cue invariance over baselines.", "url": "https://wpnews.pro/news/faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges", "canonical_source": "https://arxiv.org/abs/2605.23970", "published_at": "2026-05-26 04:00:00+00:00", "updated_at": "2026-05-26 04:14:54.789158+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-safety", "ai-ethics", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges", "markdown": "https://wpnews.pro/news/faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges.md", "text": "https://wpnews.pro/news/faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges.txt", "jsonld": "https://wpnews.pro/news/faithful-or-fabricated-a-causal-framework-for-rationalization-bias-in-llm-judges.jsonld"}}