Popoviciu

Mentions: 1 | Type: Organization

// recent coverage (1 mention)

2026-06-18 04:00 | arxiv.org | machine-learning

RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents

Researchers propose RODS (Reward-driven Online Data Synthesis) to address the depletion of informative samples in multi-turn tool-use reinforcement learning. RODS uses progress reward variance as a bo…

// co-occurs with top 2 entities

RODS (1), GRPO (1)