04:00
2026-06-18
arxiv.org
large-language-models
Self-CTRL: Self-Consistency Training with Reinforcement Learning
Researchers introduced Self-Consistency Training with Reinforcement Learning (Self-CTRL), a method that aligns language models' self-explanations with their actual behavior. In tests, the approach imp…