IVR-R1: Refining Trajectories through Iterative Visual-Grounded Reasoning in Reinforcement Learning
Researchers have introduced IVR-R1, a reinforcement learning framework that iteratively realigns visual reasoning trajectories to correct errors in multimodal large language models. The method uses a reward-driven screen…