Flash-WAM: Modality-Aware Distillation for World Action Models


[ARTICLE · art-22205] src=arxiv.org ↗ pub=2026-06-05T04:00Z topic=artificial-intelligence verified=true sentiment=↑ positive

Researchers have introduced Flash-WAM, a modality-aware step-distillation framework that compresses world-action model inference to a single denoising step per modality, cutting per-chunk latency from 8.1 seconds to 348 milliseconds on an NVIDIA L40S (roughly a 23× speedup). Jointly generated video and action streams use different noise schedules, so the method selects a separate consistency function for each modality to match its noise regime. Flash-WAM preserves task success rates of 85.5% on RoboTwin 2.0 and 95.7% on LIBERO while enabling real-time control, whereas naive consistency distillation drops to 24% at the same step budget.
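As a quick sanity check on the reported figures, the short sketch below recomputes the speedup from the two latencies quoted above. The variable and function names are illustrative, and the chunks-per-second figure is only the reciprocal of the latency; how many control steps one chunk covers is not stated in the source.

```python
# Recompute the speedup from the latencies quoted in the summary.
# Names here are illustrative; only the two latency values come from the source.

def speedup(baseline_s: float, distilled_s: float) -> float:
    """Ratio of baseline latency to distilled latency."""
    return baseline_s / distilled_s

baseline_s = 8.1      # per-chunk latency before distillation, in seconds
distilled_s = 0.348   # per-chunk latency after distillation (348 ms)

ratio = speedup(baseline_s, distilled_s)
chunks_per_second = 1.0 / distilled_s  # reciprocal of latency, not a control rate

print(f"speedup: {ratio:.1f}x")
print(f"chunks per second: {chunks_per_second:.2f}")
```

The ratio comes out slightly above 23, consistent with the 23× figure reported in the abstract.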

Published Jun 5, 2026

arXiv:2606.05254v1. Abstract: World-action models (WAMs) jointly generate future video and robot actions through iterative diffusion, achieving strong performance on manipulation benchmarks but requiring tens of denoising steps, a cost that precludes real-time control. Step distillation has emerged as the natural remedy, but off-the-shelf methods break down in the joint video-action setting because video and action streams use different SNR-shifted noise schedules and reach training with substantially different marginal noise distributions, an asymmetry that single-modality distillation methods cannot accommodate. We introduce Flash-WAM, a modality-aware step-distillation framework inspired by consistency distillation that selects the consistency function for each modality to match its noise regime: a linear-gradient-scaling parametrization for the action stream's low-noise regime, paired with a variance-preserving parametrization for the video stream's high-noise regime, grounded in a structural analysis of the consistency-function family that characterizes the achievable gradient scaling under the consistency boundary condition. Instantiated on LingBot-VA, Flash-WAM compresses inference to a single step in each modality. On RoboTwin 2.0, this reduces per-chunk latency from 8.1 seconds to 348 ms on NVIDIA L40S, a 23× speedup that enables real-time inference. Flash-WAM preserves task success on simulation benchmarks (85.5% RoboTwin 2.0, 95.7% LIBERO) and substantially recovers real-world performance (60% average on a Unitree G1 humanoid robot), while naive consistency distillation drops to 24% at the same step budget.
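The abstract refers to a family of consistency functions constrained by a boundary condition. A minimal sketch of that idea, under stated assumptions: a consistency function is commonly written as f(x, t) = c_skip(t) * x + c_out(t) * F(x, t), and the boundary condition requires f(x, 0) = x, which holds when c_skip(0) = 1 and c_out(0) = 0. The two coefficient choices below (`linear_coeffs`, `trig_coeffs`) and the stand-in `toy_network` are hypothetical illustrations; the abstract does not define Flash-WAM's parametrizations, so these should not be read as the paper's method.

```python
import math
from typing import Callable, Tuple

# Hypothetical coefficient schedules on t in [0, 1]; NOT taken from the paper.
Coeffs = Callable[[float], Tuple[float, float]]

def linear_coeffs(t: float) -> Tuple[float, float]:
    """Illustrative linear pair: c_skip = 1 - t, c_out = t."""
    return 1.0 - t, t

def trig_coeffs(t: float) -> Tuple[float, float]:
    """Illustrative pair with c_skip**2 + c_out**2 == 1 for every t."""
    angle = 0.5 * math.pi * t
    return math.cos(angle), math.sin(angle)

def toy_network(x: float, t: float) -> float:
    """Stand-in for a trained network F(x, t); an arbitrary function."""
    return -0.5 * x + t

def consistency_fn(x: float, t: float,
                   network: Callable[[float, float], float],
                   coeffs: Coeffs) -> float:
    """f(x, t) = c_skip(t) * x + c_out(t) * F(x, t)."""
    c_skip, c_out = coeffs(t)
    return c_skip * x + c_out * network(x, t)

x = 1.7
for name, coeffs in [("linear", linear_coeffs), ("trig", trig_coeffs)]:
    # Boundary condition: at t = 0 the function must return its input.
    at_zero = consistency_fn(x, 0.0, toy_network, coeffs)
    assert math.isclose(at_zero, x)
    c_skip, c_out = coeffs(0.3)
    print(f"{name}: f(x, 0) = {at_zero}, c_skip^2 + c_out^2 at t=0.3 = "
          f"{c_skip**2 + c_out**2:.3f}")
```

Both choices satisfy the boundary condition, but they weight the network output differently as t grows. That kind of difference is what makes the choice of parametrization matter when two streams sit in different noise regimes.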

Read original on arxiv.org → arxiv.org/abs/2606.05254
