Measuring Judgment Quality in Natural-Language Explanations: Evidence from Forecasting Tournaments

wpnews.pro


Source: arxiv.org · Published: 2026-07-01 04:00 UTC · Topic: large-language-models


Researchers introduced Explanation Quality Markers (EQMs), a set of sixty reasoning patterns scored by large language models, to measure judgment quality in natural-language explanations. In a pre-registered analysis of over 55,000 forecast-rationale pairs from a multiyear forecasting tournament, EQMs predicted accuracy at both forecast and forecaster levels, outperforming pre-LLM methods and identifying underperformers more reliably than top performers. The method provides a scalable, interpretable tool for extracting judgment-relevant information from written explanations.


arXiv:2606.30987v1

Abstract: Decision-makers routinely rely on expert judgments accompanied by written explanations, yet explanation quality is difficult to measure at scale. Forecasting tournaments offer a natural testing ground: probabilistic judgments are paired with natural-language rationales and scored against realized outcomes. We introduce Explanation Quality Markers (EQMs), a set of sixty theory-guided reasoning patterns scored by large language models (LLMs). In a pre-registered analysis of over 55,000 forecast-rationale pairs from a multiyear forecasting tournament, EQMs predict accuracy at both the forecast and forecaster levels, consistently outperforming pre-LLM text-analysis methods. More than 90% of statistically significant pattern-level EQM-accuracy correlations match our directional hypotheses. The signal is asymmetric: EQMs identify likely underperformers more reliably than they distinguish the very best forecasters. Benchmarked against traditional indicators of forecasting skill, EQMs are the strongest predictor at the forecast level and competitive at the forecaster level, though weaker than prior accuracy. Human ratings of rationale quality are less consistently correlated with accuracy and place disproportionate weight on rationale length. Results transfer to an independent forecasting study. EQMs provide a scalable, interpretable method for extracting judgment-relevant information from written explanations.

source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.30987


