{"slug": "nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex", "title": "NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code", "summary": "NVIDIA researchers released Polar, a rollout framework that trains language agents through reinforcement learning without altering their agent harnesses. Using GRPO on a Qwen3.5-4B base model, Polar improved SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.", "body_md": "NVIDIA researchers have introduced Polar, a rollout framework that trains language agents using reinforcement learning without modifying their agent harnesses. Polar places a model API proxy between the harness and the inference server, capturing token-level interactions and reconstructing trainer-ready trajectories. Using GRPO on a Qwen3.5-4B base model, Polar improves SWE-Bench Verified pass@1 by 22.6 points under the Codex harness, 4.8 points under Claude Code, and 6.2 points under Pi. The framework is registered as a NeMo Gym environment and released under the ProRL Agent Server repository.\n\nThe post [NVIDIA Releases Polar, a Token-Faithful Rollout Framework for GRPO Training Across Codex, Claude Code, and Qwen Code](https://www.marktechpost.com/2026/05/27/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-across-codex-claude-code-and-qwen-code/) appeared first on [MarkTechPost](https://www.marktechpost.com).", "url": "https://wpnews.pro/news/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex", "canonical_source": "https://www.marktechpost.com/2026/05/27/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-across-codex-claude-code-and-qwen-code/", "published_at": "2026-05-27 17:09:38+00:00", "updated_at": "2026-05-29 08:13:27.008141+00:00", "lang": "en", "topics": ["large-language-models", "ai-research", "ai-agents", "ai-infrastructure", "machine-learning"], "entities": ["NVIDIA", "Polar", "GRPO", "Codex", "Claude Code", "Qwen", "NeMo Gym", "ProRL Agent Server"], "alternates": {"html": "https://wpnews.pro/news/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex", "markdown": "https://wpnews.pro/news/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex.md", "text": "https://wpnews.pro/news/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex.txt", "jsonld": "https://wpnews.pro/news/nvidia-releases-polar-a-token-faithful-rollout-framework-for-grpo-training-codex.jsonld"}}