RICE-PO: Turning Retrieval Interactions into Credit Signals for Reasoning Agents

wpnews.pro


Source: arxiv.org · Published May 27, 2026, 04:00 UTC · Topic: artificial-intelligence


Researchers have developed RICE-PO, a critic-free policy optimization framework that converts retrieval interactions into localized learning signals for training reasoning-based retrieval agents. The framework targets the credit-assignment problem in interactive retrieval: it selects high-uncertainty executable actions (such as queries or summaries) as anchors, evaluates counterfactual branches at those anchors using retrieval metrics, and propagates credit to latent reasoning steps only when a step strongly influences the resulting action and its future residual effects are stable. On the BRIGHT and BEIR benchmarks, RICE-PO outperformed prompt-based agents and group-based reinforcement learning baselines under the same retriever setting.
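To make the anchor-and-branch idea concrete, here is a minimal Python sketch written under our own assumptions; it is not the authors' implementation. It assumes that an action's uncertainty is its mean per-token entropy, that the retrieval metric is binary-relevance nDCG@k (the metric commonly reported on BRIGHT and BEIR), and that each counterfactual branch is scored against the mean of its sibling branches, which is one possible critic-free baseline. The function names (`select_anchors`, `ndcg_at_k`, `branch_advantages`) and the toy lookup-table retriever are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Set

# Hypothetical helpers illustrating anchor selection and branch scoring;
# none of these names come from the RICE-PO paper.

def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def action_uncertainty(token_dists: List[Sequence[float]]) -> float:
    """Assumed uncertainty score: mean per-token entropy over one executable action."""
    return sum(token_entropy(d) for d in token_dists) / len(token_dists)

def select_anchors(uncertainties: List[float], k: int) -> List[int]:
    """Return the indices of the k most uncertain actions, in trajectory order."""
    ranked = sorted(range(len(uncertainties)), key=lambda i: uncertainties[i], reverse=True)
    return sorted(ranked[:k])

def ndcg_at_k(ranked_ids: List[str], relevant: Set[str], k: int = 10) -> float:
    """Binary-relevance nDCG@k (assumed metric)."""
    dcg = sum(1.0 / math.log2(i + 2) for i, doc in enumerate(ranked_ids[:k]) if doc in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def branch_advantages(
    branches: List[str],
    retrieve: Callable[[str], List[str]],
    relevant: Set[str],
    k: int = 10,
) -> List[float]:
    """Score each counterfactual query at an anchor and subtract the sibling mean."""
    scores = [ndcg_at_k(retrieve(q), relevant, k) for q in branches]
    baseline = sum(scores) / len(scores)
    return [s - baseline for s in scores]

if __name__ == "__main__":
    # Toy "retriever": a fixed lookup table from query text to ranked document IDs.
    index = {
        "original query": ["d7", "d3", "d1"],
        "reformulated query": ["d1", "d2", "d9"],
    }
    anchors = select_anchors([0.2, 1.1, 0.4], k=1)
    advs = branch_advantages(list(index), lambda q: index[q], relevant={"d1"})
    print(anchors, [round(a, 3) for a in advs])  # [1] [-0.25, 0.25]
```

Subtracting the sibling mean keeps the signal local to a single anchor without a learned value function. How branches are sampled, and which metric the paper actually uses, are not specified in the abstract.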


arXiv:2605.26352v1

Abstract: Retrieval is increasingly moving from one-shot matching toward interactive reasoning, where language agents iteratively inspect evidence, reformulate queries, and search again. Training such agents raises a credit-assignment challenge: executable actions such as queries or summaries can be directly evaluated by the retriever, while latent reasoning steps are not directly observable and only affect future executable actions. This asymmetry makes outcome-level reward assignment unreliable, as the same final reward may credit reasoning steps that did not actually shape retrieval success. We propose RICE-PO, a critic-free policy optimization framework that converts retrieval interactions into localized learning signals. RICE-PO selects high-uncertainty executable actions as anchors, evaluates local counterfactual branches using retrieval metrics, and propagates credit to latent reasoning steps only when reasoning-to-action influence is strong and future residual effects are stable. On BRIGHT and BEIR, RICE-PO consistently outperforms prompt-based agents and group-based RL baselines under the same retriever setting. These results show that the structure of agent-environment interaction itself can provide useful supervision for training reasoning-based retrieval agents.
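The gating rule for latent reasoning steps can be sketched in the same hedged way. The abstract only says that credit is propagated when reasoning-to-action influence is strong and future residual effects are stable. This sketch assumes each reasoning step already carries a scalar influence score and a list of observed downstream residuals, and it treats "stable" as a population standard deviation at or below a threshold. The `ReasoningStep` class, the `propagate_credit` function, and both threshold values are hypothetical.

```python
from dataclasses import dataclass, field
from statistics import pstdev
from typing import List

@dataclass
class ReasoningStep:
    # Hypothetical: how strongly this step shifted the next executable action (0..1).
    influence: float
    # Hypothetical: downstream score residuals observed after the anchor action.
    future_residuals: List[float] = field(default_factory=list)

def propagate_credit(
    anchor_advantage: float,
    steps: List[ReasoningStep],
    min_influence: float = 0.5,      # assumed gate on reasoning-to-action influence
    max_residual_std: float = 0.1,   # assumed gate on residual stability
) -> List[float]:
    """Copy the anchor's advantage to each reasoning step only if both gates pass."""
    credits = []
    for step in steps:
        # Empty residual lists count as unstable (pstdev would raise on them).
        stable = bool(step.future_residuals) and pstdev(step.future_residuals) <= max_residual_std
        if step.influence >= min_influence and stable:
            credits.append(anchor_advantage)
        else:
            credits.append(0.0)  # Withhold credit instead of sharing the outcome reward.
    return credits

if __name__ == "__main__":
    steps = [
        ReasoningStep(influence=0.9, future_residuals=[0.00, 0.05]),  # strong, stable
        ReasoningStep(influence=0.2, future_residuals=[0.00]),        # weak influence
        ReasoningStep(influence=0.8, future_residuals=[0.00, 1.00]),  # unstable future
    ]
    print(propagate_credit(0.25, steps))  # [0.25, 0.0, 0.0]
```

Zeroing out ungated steps, rather than spreading the final reward across every step, mirrors the abstract's concern that outcome-level reward may credit reasoning steps that did not actually shape retrieval success.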

source & further reading

arxiv.org — original article


Read original on arxiv.org → arxiv.org/abs/2605.26352


