Rethinking Groups in Critic-Free RLVR

Researchers propose negative token filtering, a method enabling stable single-rollout training for critic-free reinforcement learning in large language models, outperforming group-based techniques on agentic tasks.

Published Jun 17, 2026 · 1 min read

arXiv:2606.17250v1

Abstract: Reinforcement learning (RL) has become a central paradigm for post-training large language models. Existing critic-free RL methods typically generate a group of rollouts for the same question to estimate value baselines for advantage computation. However, this design suffers from data inefficiency, group synchronization barriers, and inflexibility with structured rollouts. In this work, we revisit the role of the "group" and show that its underlying function is not merely to estimate baselines but to prevent false penalties on negative samples. Building on this insight, we propose negative token filtering, a simple and effective strategy that enables stable single-rollout training. We apply it to two batch-level advantage methods, achieving comparable performance on reasoning tasks and stronger performance on agentic tasks relative to group-based RL techniques.
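To make the contrast between group-based and batch-level baselines concrete, here is a minimal sketch. It assumes a plain mean-reward baseline with no variance normalization; the function names and reward values are hypothetical and not taken from the paper. It does not implement negative token filtering, which the abstract does not specify.

```python
from statistics import mean


def group_advantages(rewards_by_question):
    """Group-based baseline (assumed form): each rollout's advantage is its
    reward minus the mean reward of all rollouts for the same question."""
    advantages = {}
    for question, rewards in rewards_by_question.items():
        baseline = mean(rewards)  # per-question baseline from the group
        advantages[question] = [r - baseline for r in rewards]
    return advantages


def batch_advantages(rewards):
    """Batch-level baseline (assumed form): one rollout per question, with
    the baseline taken as the mean reward across the whole batch."""
    baseline = mean(rewards)  # single baseline shared by every question
    return [r - baseline for r in rewards]


# Hypothetical binary verifier rewards: four rollouts each for two questions.
grouped = group_advantages({
    "q1": [1.0, 0.0, 0.0, 1.0],
    "q2": [0.0, 0.0, 0.0, 1.0],
})

# Hypothetical single-rollout batch: one reward per distinct question.
single = batch_advantages([1.0, 0.0, 0.0, 1.0])
```

The group version needs every rollout for a question to finish before any advantage can be computed, which is one way to read the synchronization barrier the abstract mentions. The batch version needs only one rollout per question.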
