Same Weights, Different Robot: A Deployment Safety View of VLA Policies




Researchers have identified a deployment-safety gap in vision-language-action (VLA) robot policies: identical model checkpoints can produce different physical actions once action unnormalization and controller conventions are applied. In replay experiments on LIBERO-Goal, substituting a plausible sibling metadata key produced a mean drift of 0.199 over six non-gripper action dimensions and cut task success from 28/28 to 2/28 under full substitution; on LIBERO-Spatial, the same substituted key cut success from 26/26 to 0/26. The authors note that these results do not certify closed-loop or hardware safety, but they support treating action-space metadata as part of the executable policy and checking it before rollout.

Published Jun 4, 2026

[Submitted on 2 Jun 2026]


[View PDF](https://arxiv.org/pdf/2606.03724)

[HTML (experimental)](https://arxiv.org/html/2606.03724v1)

Abstract: Vision-language-action (VLA) policies are often treated as checkpoint-defined objects: if the weights, prompt, and benchmark suite match, the deployment is assumed to be the same policy. Robot execution breaks this assumption because the same normalized model output can become a different physical action after action unnormalization and controller conventions are applied. This creates a deployment-safety gap: safety review can certify the checkpoint while missing the executable robot policy that reaches the controller. We formalize this gap as an executable policy specification problem: a VLA policy includes the learned model, action representation, metadata-selected unnormalizer, and controller-facing conventions. Under this view, identical checkpoints can be executable-inequivalent. For quantile-style action normalization, we derive a closed-form metadata mismatch transform and an ExecSpec certificate that measures action-space semantic drift without model inference or rollout. On LIBERO-Goal replay, substituting a plausible sibling metadata key yields mean drift 0.199 over six non-gripper action dimensions and reduces success from 28/28 to 2/28 under full substitution. On LIBERO-Spatial replay, the same substituted key reduces success from 26/26 to 0/26. The same full-substitution protocol gives 0/28 success for all four Object substitutions and 0/23 or 1/23 success on Long. Identity-key, replay-validity, no-op filtering, raw-vs-correct replay, mask/gripper, synthetic upper-bound, and OpenVLA-style unnormalizer interface checks rule out several simpler explanations. These results do not certify closed-loop or hardware safety. They support a narrower deployment-safety view: action-space metadata is part of the executable policy and should be checked before rollout.
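To make the mechanism in the abstract concrete, here is a minimal Python sketch of how one normalized output can map to different physical actions under two metadata keys. It assumes an OpenVLA-style quantile unnormalizer, y = 0.5 (z + 1)(q99 - q01) + q01, applied only to masked dimensions. Under that assumption, swapping the correct key for a wrong one is an affine map per dimension: y_wrong = scale * y_ok + offset, with scale = (q99_wrong - q01_wrong) / (q99_ok - q01_ok) and offset = q01_wrong - scale * q01_ok. The function names, the `suite_a` and `suite_b` keys, and all statistics below are hypothetical illustrations, not values from the paper, and `mean_drift` is a simple stand-in metric rather than the paper's ExecSpec certificate.

```python
import numpy as np


def unnormalize(z, low, high, mask):
    """Map normalized actions in [-1, 1] to physical units via quantile bounds.

    Dimensions where mask is False (here, the gripper) pass through unchanged.
    """
    z = np.asarray(z, dtype=float)
    return np.where(mask, 0.5 * (z + 1.0) * (high - low) + low, z)


def mismatch_transform(stats_ok, stats_wrong):
    """Closed-form affine map y_wrong = scale * y_ok + offset per dimension.

    Assumes q99 > q01 for the correct key; only meaningful on masked dims.
    """
    span_ok = stats_ok["q99"] - stats_ok["q01"]
    span_wrong = stats_wrong["q99"] - stats_wrong["q01"]
    scale = span_wrong / span_ok
    offset = stats_wrong["q01"] - scale * stats_ok["q01"]
    return scale, offset


def mean_drift(stats_ok, stats_wrong, mask, n_grid=201):
    """Hypothetical drift metric: mean |y_wrong - y_ok| over a uniform grid
    of normalized outputs, averaged over masked dims. Needs no model or rollout.
    """
    z = np.linspace(-1.0, 1.0, n_grid)[:, None] * np.ones(mask.shape)
    y_ok = unnormalize(z, stats_ok["q01"], stats_ok["q99"], mask)
    y_wrong = unnormalize(z, stats_wrong["q01"], stats_wrong["q99"], mask)
    return float(np.abs(y_wrong - y_ok)[:, mask].mean())


# Illustrative metadata: six arm dimensions plus one gripper dimension.
# These numbers are made up for the example.
mask = np.array([True] * 6 + [False])
metadata = {
    "suite_a": {"q01": np.array([-1.0] * 6 + [0.0]),
                "q99": np.array([1.0] * 6 + [1.0])},
    "suite_b": {"q01": np.array([-0.5] * 6 + [0.0]),
                "q99": np.array([1.0] * 6 + [1.0])},
}

z = np.array([0.2, -0.4, 0.0, 1.0, -1.0, 0.6, 1.0])  # one normalized output
y_ok = unnormalize(z, metadata["suite_a"]["q01"], metadata["suite_a"]["q99"], mask)
y_wrong = unnormalize(z, metadata["suite_b"]["q01"], metadata["suite_b"]["q99"], mask)

scale, offset = mismatch_transform(metadata["suite_a"], metadata["suite_b"])
drift = mean_drift(metadata["suite_a"], metadata["suite_b"], mask)

print("correct key:", y_ok)
print("wrong key:  ", y_wrong)
print("scale:", scale[mask], "offset:", offset[mask])
print("mean drift on masked dims:", drift)
```

Because the check runs on metadata alone, it could sit in a pre-rollout gate: an identity key gives zero drift, and any nonzero value flags that the checkpoint and the executable policy have diverged.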


source & further reading

arxiv.org — original article


Read original on arxiv.org → arxiv.org/abs/2606.03724
