@SDPG

mentions 1 type Organization

04:00

2026-06-04

arxiv.org

machine-learning

Self-Distilled Policy Gradient

Researchers introduced SDPG, a self-distilled policy-gradient framework that combines group-relative verifier advantages with normalized standard deviation and full-vocabulary on-policy self-distillat…

// co-occurs with top 1 entities

RLVR 1