04:00
2026-06-05
arxiv.org
machine-learning
Policy-Conditioned Counterfactual Credit for Verifiable Reinforcement Learning of Long-Horizon Language Agents
Researchers have developed CVT-RL, a reinforcement learning algorithm that uses policy-conditioned counterfactual credit assignment to reduce unsupported evidence chains and shortcut actions in long-h…