04:00
2026-06-16
arxiv.org
large-language-models
CoRA: Confidence-Rationale Alignment for Reliable Chain-of-Thought Reasoning
Researchers introduced CoRA, a GRPO-based reinforcement learning framework that aligns LLM confidence with rationale quality in chain-of-thought reasoning. Across MedQA, MathQA, and OpenBookQA, CoRA r…