OPRD: On-Policy Representation Distillation

wpnews.pro


Source: arxiv.org · Published: 2026-06-05 02:44 UTC · Topic: machine learning


Researchers have introduced On-Policy Representation Distillation (OPRD), a method that aligns student and teacher hidden-state representations across selected layers during training. Because it bypasses the language model head, it avoids the sampling variance of output-space KL estimates. The approach closes the student-teacher performance gap on the AIME 2024/2025 and AIMO benchmarks, where output-space distillation methods plateau below the teacher. OPRD also trains 1.44x faster and uses 54% less memory than top-k on-policy distillation.
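To make "aligning representations across selected layers" concrete, here is a minimal sketch of a per-layer hidden-state alignment loss. It is not the authors' code: the abstract does not name OPRD's distance function, so mean squared error is an assumption, as are the hypothetical `layer_map` pairing, the equal hidden sizes, and the helper names `mse` and `representation_loss`. The one property taken from the source is that both models are evaluated on the same student rollout and no LM head is involved.

```python
from typing import Dict, List, Sequence

# Hypothetical layout: layer index -> one hidden vector per rollout token.
HiddenStates = Dict[int, List[List[float]]]


def mse(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean squared error between two hidden vectors of the same size.

    Assumption: MSE as the alignment distance; the abstract does not name OPRD's loss.
    """
    if len(a) != len(b):
        # A real setup with different hidden sizes would need a learned projection.
        raise ValueError("hidden sizes differ; a projection would be required")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def representation_loss(
    student: HiddenStates,
    teacher: HiddenStates,
    layer_map: Dict[int, int],
) -> float:
    """Average per-token MSE over (student_layer -> teacher_layer) pairs.

    Assumes both models were run on the SAME student-generated rollout, so
    token positions line up one-to-one. No LM head and no vocabulary sampling
    are involved, so for a fixed rollout the loss is deterministic.
    """
    total, count = 0.0, 0
    for s_layer, t_layer in layer_map.items():
        s_tokens, t_tokens = student[s_layer], teacher[t_layer]
        if len(s_tokens) != len(t_tokens):
            raise ValueError("student and teacher must process the same rollout")
        for s_vec, t_vec in zip(s_tokens, t_tokens):
            total += mse(s_vec, t_vec)
            count += 1
    return total / count


# Toy example: 2 rollout tokens, hidden size 2, hypothetical layer pairing.
student_h = {1: [[0.0, 1.0], [1.0, 1.0]], 2: [[0.5, 0.5], [0.0, 0.0]]}
teacher_h = {2: [[0.0, 1.0], [1.0, 0.0]], 4: [[0.5, 0.5], [0.0, 2.0]]}
layer_map = {1: 2, 2: 4}  # student layer 1 <-> teacher layer 2, student 2 <-> teacher 4

loss = representation_loss(student_h, teacher_h, layer_map)
print(loss)  # 0.625
```

Calling `representation_loss` twice on the same hidden states returns the same value, which is the contrast with output-space distillation that the article draws. Cosine or normalized distances would be equally plausible choices; which one OPRD uses is not stated here.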

2 min read · Published Jun 5, 2026

[Submitted on 4 Jun 2026]


[View PDF](https://arxiv.org/pdf/2606.06021) · [HTML (experimental)](https://arxiv.org/html/2606.06021v1)

Abstract: On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen's ~150k tokens) persists throughout training, and (2) it treats the teacher as a black box, discarding all of its intermediate hidden states and keeping only the output of the LM head. We propose On-Policy Representation Distillation (OPRD), which lifts distillation into hidden-state space by aligning student and teacher representations across selected layers on the same rollouts, bypassing the LM head entirely. Theoretically, OPRD eliminates sampling variance and provides richer per-layer structural information. Empirically, OPRD closes the student-teacher gap on AIME 2024/2025 and AIMO, while output-space OPD baselines plateau below the teacher. OPRD also trains 1.44x faster and uses 54% less memory than top-k OPD. Code: [this https URL].
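The first limitation above concerns Monte Carlo KL estimates. The toy sketch here assumes a common on-policy setup that the abstract does not spell out: one token x is drawn from the student and the per-token reverse KL is estimated by the log-ratio log p_s(x) - log p_t(x). That estimator is unbiased but noisy. The three-token distributions, the sample count, and all function names are made up for illustration and stand in for a real ~150k-token vocabulary.

```python
import math
import random
from typing import List, Tuple


def exact_reverse_kl(p_s: List[float], p_t: List[float]) -> float:
    """Exact KL(student || teacher), summed over the whole (toy) vocabulary."""
    return sum(ps * math.log(ps / pt) for ps, pt in zip(p_s, p_t) if ps > 0)


def exact_estimator_variance(p_s: List[float], p_t: List[float]) -> float:
    """Variance of the one-sample log-ratio estimator: E[r^2] - (E[r])^2."""
    kl = exact_reverse_kl(p_s, p_t)
    second_moment = sum(ps * math.log(ps / pt) ** 2 for ps, pt in zip(p_s, p_t) if ps > 0)
    return second_moment - kl ** 2


def one_sample_kl(p_s: List[float], p_t: List[float], rng: random.Random) -> float:
    """Draw one token from the student and return log p_s(x) - log p_t(x).

    Assumption: OPD estimates the per-token KL from a single sampled token.
    """
    token = rng.choices(range(len(p_s)), weights=p_s, k=1)[0]
    return math.log(p_s[token] / p_t[token])


def empirical_stats(p_s: List[float], p_t: List[float], n: int, seed: int = 0) -> Tuple[float, float]:
    """Sample mean and unbiased sample variance of n one-sample estimates."""
    rng = random.Random(seed)
    draws = [one_sample_kl(p_s, p_t, rng) for _ in range(n)]
    mean = sum(draws) / n
    var = sum((d - mean) ** 2 for d in draws) / (n - 1)
    return mean, var


# Made-up 3-token distributions standing in for a ~150k-token vocabulary.
p_student = [0.5, 0.3, 0.2]
p_teacher = [0.2, 0.5, 0.3]

kl = exact_reverse_kl(p_student, p_teacher)
var = exact_estimator_variance(p_student, p_teacher)
mean_hat, var_hat = empirical_stats(p_student, p_teacher, n=20_000)
print(f"exact KL={kl:.3f}  estimator std={math.sqrt(var):.3f}")
print(f"empirical mean={mean_hat:.3f}  empirical var={var_hat:.3f}")
```

In this toy case the per-sample standard deviation (about 0.69) is roughly three times the KL value it estimates (about 0.22), so individual training signals are dominated by noise. Given a fixed rollout, a hidden-state loss involves no such draw over the vocabulary, which is one reading of the abstract's claim that OPRD eliminates sampling variance.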


source & further reading


Read original on arxiv.org → arxiv.org/abs/2606.06021

