CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning




Researchers introduced CoRA, a GRPO-based reinforcement learning framework that aligns an LLM's confidence in its committed answer with the quality of its chain-of-thought rationale. Across MedQA, MathQA, and OpenBookQA, CoRA reduced confidence-rationale alignment error by up to 26.51% compared with untuned checkpoints, SFT, and correctness-only GRPO while maintaining competitive accuracy, indicating that reliable reasoning requires both confident answers and rationales that substantively support them.

1 min read · 24 views · published Jun 16, 2026

arXiv:2606.14961v1

Abstract: Chain-of-thought (CoT) reasoning can improve LLM performance, but high answer confidence may be misleading when the accompanying CoT rationale is plausible yet incomplete or poorly supported. We study confidence-rationale alignment: whether a model's confidence in its committed answer is justified by its generated rationale. We introduce a GRPO-based reinforcement learning framework that jointly rewards answer correctness, committed-answer probability, and rubric-based rationale support, where the rubric assesses grounding, coherence, task match, and connection to the selected answer without revealing the gold answer to the judge. Across MedQA, MathQA, and OpenBookQA using three open-weight LLMs, our method reduces the confidence-rationale alignment error by up to 26.51% compared with untuned checkpoints, SFT, and correctness-only GRPO, while maintaining competitive accuracy and often improving calibration. These results show that reliable CoT reasoning requires not only confident answers, but rationales that substantively support them.
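The abstract names three reward signals but gives no formula, so the following is only a rough sketch of how such a joint reward could feed a GRPO-style update. Everything specific here is an assumption rather than the authors' method: the `Rollout` fields, the weights, the choice to flip the sign of the confidence term on wrong answers, and the `alignment_gap` helper, which is an illustrative stand-in and not the paper's alignment-error metric. The only part taken from GRPO itself is the group-relative normalization of rewards.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Rollout:
    """One sampled CoT response for a question (hypothetical container)."""
    correct: bool          # committed answer matches the gold answer
    confidence: float      # model probability of the committed answer, in [0, 1]
    rubric_support: float  # judge's rubric score for the rationale, in [0, 1]


def joint_reward(r, w_correct=1.0, w_conf=0.5, w_support=0.5):
    """Weighted sum of the three signals; weights are illustrative assumptions."""
    # Assumption: confidence is rewarded on correct answers and penalized on wrong ones.
    signed_conf = r.confidence if r.correct else -r.confidence
    return (w_correct * float(r.correct)
            + w_conf * signed_conf
            + w_support * r.rubric_support)


def group_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(x - mu) / (sigma + eps) for x in rewards]


def alignment_gap(rollouts):
    """Illustrative proxy only: mean |confidence - rubric support| over rollouts."""
    return mean(abs(r.confidence - r.rubric_support) for r in rollouts)


if __name__ == "__main__":
    # Four hypothetical samples drawn for the same question.
    group = [
        Rollout(correct=True, confidence=0.9, rubric_support=0.8),
        Rollout(correct=False, confidence=0.9, rubric_support=0.2),
        Rollout(correct=True, confidence=0.6, rubric_support=0.6),
        Rollout(correct=False, confidence=0.3, rubric_support=0.4),
    ]
    rewards = [joint_reward(r) for r in group]
    print("rewards:", rewards)
    print("advantages:", group_advantages(rewards))
    print("alignment gap:", alignment_gap(group))
```

In this toy group the confidently wrong, weakly supported sample receives the lowest advantage, which is the kind of behavior a confidence-rationale objective is meant to discourage. A correctness-only GRPO baseline would correspond to setting `w_conf` and `w_support` to zero.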

source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.14961



