VeRL

Mentions: 2 · Type: Organization · Feed: RSS

// recent coverage 2 mentions

04:00

2026-05-26

arxiv.org

machine-learning

Accelerating Long-Tail Generation in Synchronous RLHF Training via Adaptive Tensor Parallelism

Researchers have developed PAT, an adaptive tensor parallelism method that dynamically reconfigures GPU resource allocation during the generation stage of synchronous RLHF training to address the bott…

00:00

2026-04-20

andlukyane.com

large-language-models

FIPO: Teaching LLMs Which Thoughts Actually Matter

FIPO (Future-Impact-based Policy Optimization) is a reinforcement learning method that improves LLM reasoning by assigning token-level credit based on each token's future impact on the policy, rather …

// co-occurs with top 8 entities

FIPO (1), Qwen2.5-32B-Base (1), DAPO (1), AIME 2024 (1), PAT (1), SGLang (1), LLaMA3.1-8B (1), Qwen3-14B (1)