{"slug": "rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents", "title": "RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents", "summary": "Researchers have developed RICE-PO, a policy optimization framework that converts retrieval interactions into localized learning signals for training reasoning-based retrieval agents. The framework addresses the credit-assignment challenge in interactive retrieval by selecting high-uncertainty executable actions as anchors and evaluating counterfactual branches using retrieval metrics. In tests on BRIGHT and BEIR benchmarks, RICE-PO outperformed prompt-based agents and group-based reinforcement learning baselines under the same retriever settings.", "body_md": "arXiv:2605.26352v1 Announce Type: new\nAbstract: Retrieval is increasingly moving from one-shot matching toward interactive reasoning, where language agents iteratively inspect evidence, reformulate queries, and search again. Training such agents raises a credit-assignment challenge: executable actions such as queries or summaries can be directly evaluated by the retriever, while latent reasoning steps are not directly observable and only affect future executable actions. This asymmetry makes outcome-level reward assignment unreliable, as the same final reward may credit reasoning steps that did not actually shape retrieval success. We propose RICE-PO, a critic-free policy optimization framework that converts retrieval interactions into localized learning signals. RICE-PO selects high-uncertainty executable actions as anchors, evaluates local counterfactual branches using retrieval metrics, and propagates credit to latent reasoning steps only when reasoning-to-action influence is strong and future residual effects are stable. On BRIGHT and BEIR, RICE-PO consistently outperforms prompt-based agents and group-based RL baselines under the same retriever setting. These results show that the structure of agent-environment interaction itself can provide useful supervision for training reasoning-based retrieval agents.", "url": "https://wpnews.pro/news/rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents", "canonical_source": "https://arxiv.org/abs/2605.26352", "published_at": "2026-05-27 04:00:00+00:00", "updated_at": "2026-05-27 04:34:11.381844+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "large-language-models", "ai-agents", "natural-language-processing"], "entities": ["RICE-PO", "BRIGHT", "BEIR"], "alternates": {"html": "https://wpnews.pro/news/rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents", "markdown": "https://wpnews.pro/news/rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents.md", "text": "https://wpnews.pro/news/rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents.txt", "jsonld": "https://wpnews.pro/news/rice-po-turning-retrieval-interactions-into-credit-signals-for-reasoning-agents.jsonld"}}