04:00
2026-06-18
arxiv.org
machine-learning
RODS: Reward-Driven Online Data Synthesis for Multi-Turn Tool-Use Agents
Researchers propose RODS (Reward-Driven Online Data Synthesis) to address the depletion of informative samples in multi-turn tool-use reinforcement learning. RODS uses progress reward variance as a bo…
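The summary is cut off before it explains exactly how RODS uses progress reward variance. The sketch below only illustrates the general intuition, and it rests on an assumption: in group-based RL, a task whose rollouts all receive the same progress reward has zero variance, so it gives no relative learning signal and counts as "depleted." The helper names, the `eps` threshold and the example task IDs are hypothetical. This is not the RODS algorithm.

```python
from statistics import pvariance


def progress_reward_variance(rewards):
    """Population variance of per-rollout progress rewards for one task."""
    return pvariance(rewards)


def split_by_informativeness(task_rewards, eps=1e-6):
    """Hypothetical helper: separate tasks whose rollouts disagree (variance > eps)
    from tasks whose rollouts all scored alike (no group-relative signal)."""
    informative, depleted = [], []
    for task_id, rewards in task_rewards.items():
        if progress_reward_variance(rewards) > eps:
            informative.append(task_id)
        else:
            depleted.append(task_id)
    return informative, depleted


# Hypothetical progress rewards from four rollouts per task.
task_rewards = {
    "book_flight": [0.2, 0.6, 1.0, 0.4],    # rollouts differ: useful signal
    "check_weather": [1.0, 1.0, 1.0, 1.0],  # always solved: zero variance
    "refund_order": [0.0, 0.0, 0.0, 0.0],   # never solved: zero variance
}

informative, depleted = split_by_informativeness(task_rewards)
print(informative)  # ['book_flight']
print(depleted)     # ['check_weather', 'refund_order']
```

In a reward-driven synthesis loop, the depleted set would presumably be where fresh task data is most needed. How RODS actually acts on the variance signal is not stated in this summary.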