Video Optimal Transport Enables Feedback-Efficient Reward Learning

Source: letsdatascience.com · Published: 2026-06-16T05:20Z · Topic: machine-learning

Researchers from KAIST introduced Video-based Optimal Transport Preference (VOTP), a method that uses optimal transport over Video Foundation Model embeddings to generate high-fidelity pseudo-labels from a small number of human preferences. The approach reduces required human supervision and outperforms state-of-the-art offline preference-based reinforcement learning methods on locomotion and manipulation benchmarks. The paper was accepted for oral presentation at ICML 2026.

Per the arXiv abstract (arXiv:2606.16856, submitted 15 Jun 2026), the paper "Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning" by Minh-Tung Luu, Hwanhee Kim, Younghwan Lee, and Chang D. Yoo introduces Video-based Optimal Transport Preference (VOTP). According to the abstract, VOTP applies optimal transport to Video Foundation Model (ViFM) embeddings to generate high-fidelity pseudo-labels from a small number of human preferences, which reduces the human supervision required; the authors report that it outperforms state-of-the-art offline preference-based RL (PbRL) methods on locomotion and manipulation benchmarks. The ICML 2026 program page lists the paper as an oral presentation on July 8, 2026, indicating that it was accepted for oral presentation at ICML. Editorial analysis: This work brings recent ViFM representation gains into semi-supervised preference learning, offering a practical path to lower labeling budgets for offline PbRL tasks.

What happened

Per the arXiv abstract (arXiv:2606.16856, submitted 15 Jun 2026), the paper "Video-Based Optimal Transport for Feedback-Efficient Offline Preference-Based Reinforcement Learning" introduces a method called Video-based Optimal Transport Preference (VOTP). The ICML 2026 program page lists the paper as an oral presentation scheduled for July 8, 2026, confirming acceptance to the conference. Per the abstract, the authors report two things: VOTP uses optimal transport in the representation space of Video Foundation Models to produce high-fidelity pseudo-labels from a handful of human preference labels, and the method outperforms state-of-the-art offline preference-based RL methods on locomotion and manipulation benchmarks.

Technical details

Per the arXiv abstract and accompanying OpenReview/ICML materials, VOTP aligns visual trajectories by computing an optimal transport plan over embeddings produced by a Video Foundation Model, then uses that alignment to assign pseudo-preferences to unlabeled trajectory pairs. The paper frames the offline PbRL problem as having two inputs: an offline dataset collected from an unknown policy, and a small dataset of human preference labels. It reports that VOTP uses semi-supervised pseudo-labeling to scale preference learning with minimal human queries. The authors also report experiments showing robustness to visual distractors, as well as real-robot validations, per the abstract and ICML page.
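To make the idea of pseudo-labeling from an optimal transport alignment concrete, the sketch below computes an entropic (Sinkhorn) transport plan between the frame embeddings of two trajectory segments and uses the resulting transport cost to rank an unlabeled pair. It is an illustration under stated assumptions, not the authors' implementation: the embeddings are random arrays standing in for ViFM outputs, the cosine cost, the regularization value, and the nearest-reference labeling rule in the hypothetical `pseudo_label` function are all choices made here for clarity, and the article does not describe the paper's actual labeling rule.

```python
import numpy as np

def cosine_cost(x, y):
    """Pairwise cosine distance between frame embeddings x (n, d) and y (m, d)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn_plan(cost, reg=0.1, n_iters=200):
    """Entropy-regularized OT plan between uniform weights over frames."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)   # uniform mass on frames of the first segment
    b = np.full(m, 1.0 / m)   # uniform mass on frames of the second segment
    K = np.exp(-cost / reg)   # Gibbs kernel
    u = np.ones(n)
    v = np.ones(m)
    for _ in range(n_iters):  # alternate scaling to match both marginals
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]

def ot_distance(x, y, reg=0.1):
    """Transport cost of the Sinkhorn plan: lower means more similar segments."""
    cost = cosine_cost(x, y)
    plan = sinkhorn_plan(cost, reg)
    return float(np.sum(plan * cost))

def pseudo_label(seg_a, seg_b, preferred_refs, reg=0.1):
    """Assumed rule: prefer the segment closer (in OT cost) to any segment
    that a human marked as preferred. Returns 0 for seg_a, 1 for seg_b."""
    d_a = min(ot_distance(seg_a, r, reg) for r in preferred_refs)
    d_b = min(ot_distance(seg_b, r, reg) for r in preferred_refs)
    return 0 if d_a <= d_b else 1

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(size=(8, 16))           # stand-in for a preferred segment
    seg_a = ref + 0.01 * rng.normal(size=(8, 16))
    seg_b = rng.normal(size=(8, 16))
    print("pseudo-label:", pseudo_label(seg_a, seg_b, [ref]))
```

In a semi-supervised setup of the kind the article describes, labels produced this way would be added to the small human-labeled set before training a reward model; how VOTP filters or weights them is not covered in the source.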

Industry context

Editorial analysis: Preference-based RL aims to replace manual reward engineering with human judgements, but practitioners routinely face steep labeling costs and distributional mismatch between offline logs and evaluation scenarios. Industry-pattern observations: Recent advances in Video Foundation Models provide rich trajectory-level embeddings that researchers increasingly use as a substrate for downstream supervision via similarity metrics, clustering, or retrieval. Companies and labs exploring offline PbRL are likely to watch methods that convert ViFM similarity into pseudo-supervision because they can reduce annotation budgets while leveraging large offline datasets.

Context and significance

Editorial analysis: If VOTP's reported gains hold across broader benchmarks, the approach could materially lower the practical cost of preference collection in offline settings, particularly for robotic manipulation and locomotion, where physical trials and human labeling are expensive. Using optimal transport to align entire trajectories, rather than relying on frame-by-frame heuristics, is technically notable: it can preserve temporal structure and produce more semantically consistent pseudo-labels when ViFM embeddings capture dynamics and intent.
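The contrast between trajectory-level alignment and frame-by-frame comparison can be seen in a toy example. The sketch below builds two synthetic trajectories that contain the same frames with a three-step offset, then compares a frame-by-frame cost (frame t matched only to frame t) against an entropic optimal transport cost. Everything here is an assumption for illustration: the 2-D points on a circle stand in for ViFM embeddings, and the squared Euclidean cost and `reg` value are arbitrary choices, not details from the paper.

```python
import numpy as np

def sq_dist_matrix(x, y):
    """Pairwise squared Euclidean distances between frames of x and y."""
    diff = x[:, None, :] - y[None, :, :]
    return np.sum(diff ** 2, axis=-1)

def framewise_cost(x, y):
    """Average cost when frame t of x is matched only to frame t of y."""
    return float(np.mean(np.diag(sq_dist_matrix(x, y))))

def ot_cost(x, y, reg=0.05, n_iters=300):
    """Entropic OT cost with uniform weights over frames (Sinkhorn iterations)."""
    cost = sq_dist_matrix(x, y)
    n, m = cost.shape
    a, b = np.full(n, 1.0 / n), np.full(m, 1.0 / m)
    K = np.exp(-cost / reg)
    u, v = np.ones(n), np.ones(m)
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    plan = u[:, None] * K * v[None, :]
    return float(np.sum(plan * cost))

def circle_trajectory(n_frames=10):
    """Toy trajectory: points evenly spaced on the unit circle."""
    theta = 2.0 * np.pi * np.arange(n_frames) / n_frames
    return np.stack([np.cos(theta), np.sin(theta)], axis=1)

x = circle_trajectory()
y = np.roll(x, shift=3, axis=0)  # same frames, offset by three steps

print("frame-by-frame cost:", framewise_cost(x, y))
print("optimal transport cost:", ot_cost(x, y))
```

The frame-by-frame cost is large because every frame is compared with a misaligned partner, while the transport plan is free to match each frame to its counterpart. One caveat: optimal transport with uniform weights over frame embeddings does not by itself encode frame order, so any sensitivity to temporal structure has to come from the embeddings or from the cost function.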

What to watch

Editorial analysis: Observers should look for the paper's public code release, detailed ablations on ViFM choice, and sensitivity to embedding quality, since the method's reliability depends on the representational fidelity of the underlying Video Foundation Model. Practitioners should also monitor evaluation beyond standard benchmarks, including how well pseudo-labels generalize when offline datasets contain diverse policies or when human preference signals are sparse and noisy.

Scoring rationale

The paper presents a practical technique that combines Video Foundation Model embeddings and optimal transport to reduce preference-labeling costs, a notable advance for offline PbRL and robotics. The ICML oral acceptance raises visibility, but broader adoption depends on code release and replication across datasets, so the story rates as a notable research contribution rather than a paradigm shift.
