Rethinking Groups in Critic-Free RLVR

Researchers propose negative token filtering, a method that enables stable single-rollout training for critic-free reinforcement learning in large language models. It matches group-based techniques on reasoning tasks and outperforms them on agentic tasks.

Published Jun 17, 2026

arXiv:2606.17250v1. Abstract: Reinforcement learning (RL) has become a central paradigm for post-training large language models. Existing critic-free RL methods typically generate a group of rollouts for the same question to estimate value baselines for advantage computation. However, this design suffers from data inefficiency, group synchronization barriers, and inflexibility with structured rollouts. In this work, we revisit the role of the "group" and show that its underlying function is not merely to estimate baselines but to prevent false penalties on negative samples. Building on this insight, we propose negative token filtering, a simple and effective strategy that enables stable single-rollout training. We apply it to two batch-level advantage methods, achieving comparable performance on reasoning tasks and stronger performance on agentic tasks relative to group-based RL techniques.
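To make the contrast in the abstract concrete, here is a minimal sketch of the two baseline styles it mentions: a group baseline (several rollouts per question) and a batch-level baseline (one rollout per question). The function names, the mean-reward baseline, and the toy rewards are illustrative assumptions, not the paper's implementation. The abstract does not spell out the rule used by negative token filtering, so that step is deliberately not shown.

```python
from statistics import mean


def group_relative_advantages(rewards_by_question):
    """Group baseline (assumed form): advantage = reward minus the mean
    reward of all rollouts sampled for the same question."""
    advantages = {}
    for question, rewards in rewards_by_question.items():
        baseline = mean(rewards)
        advantages[question] = [r - baseline for r in rewards]
    return advantages


def batch_level_advantages(rewards):
    """Batch baseline (assumed form): one rollout per question, with the
    baseline taken as the mean reward over the whole batch."""
    baseline = mean(rewards)
    return [r - baseline for r in rewards]


# Toy verifiable rewards: 1.0 = correct answer, 0.0 = incorrect answer.
grouped = group_relative_advantages({
    "q1": [1.0, 0.0, 0.0, 1.0],
    "q2": [0.0, 0.0, 0.0, 1.0],
})
single = batch_level_advantages([1.0, 0.0, 0.0, 1.0])
```

In the grouped case every question needs all of its rollouts to finish before any advantage can be computed, which is one way to picture the synchronization barrier the abstract describes. The batch-level version needs only one rollout per question, but a negative sample is then compared against rewards from unrelated questions.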

source & further reading

arxiv.org — original article
