cd /news/ai-safety/realm-a-unified-red-teaming-benchmar… · home topics ai-safety article
[ARTICLE · art-37206] src=arxiv.org ↗ pub= topic=ai-safety verified=true sentiment=· neutral

REALM: A Unified Red-Teaming Benchmark for Physical-World VLMs

Researchers introduced REALM, the first unified red-teaming benchmark for physical-world vision-language models (VLMs), integrating 12 attack methods, 3 defenses, and 13 VLMs under a black-box threat model. The benchmark enables fair comparison of diverse attacks by generating shared, physically grounded objectives, revealing that text and typographic injections cause the most failures and that model scale alone does not ensure robustness.

read1 min views2 publishedJun 24, 2026

arXiv:2606.23892v1 Announce Type: new Abstract: Vision-language models (VLMs) are increasingly used as perception-reasoning backbones for embodied intelligence in safety-critical physical systems, where perception or reasoning errors can lead to unsafe decisions or actions. Although many red-teaming methods have been developed to probe VLM vulnerabilities, their evaluation remains fragmented across datasets, metrics, and threat models, making direct comparison difficult and obscuring whether observed differences arise from stronger attacks, more vulnerable models, or incompatible evaluation settings. Existing chatbot-centric red-teaming benchmarks mainly standardize jailbreak and content-safety evaluation, but they do not systematically capture physically grounded functional failures or cover red-teaming methods that target physical-world VLMs. This raises the key challenge of comparing diverse attack methods under a unified protocol while targeting the same scenario-specific failures. We introduce REALM, to our knowledge the first unified red-teaming benchmark for physical-world VLMs. REALM integrates 12 red-teaming methods, 3 model-agnostic defenses, and 13 VLMs under a practical black-box threat model with shared datasets and metrics. To align adversarial objectives across attack families, REALM introduces an agentic target-generation pipeline that constructs shared, scenario-specific, and physically grounded attack objectives for each scene, enabling fair comparison of diverse red-teaming methods under aligned adversarial goals. Our evaluation shows that text and typographic injection attacks induce the most failures, multimodal co-optimization yields the strongest visual-perturbation transfer, single-pass attacks approach iterative methods at much lower cost, and model scale alone does not confer adversarial robustness. Code is available at https://github.com/UCF-ML-Research/REALM.

── more in #ai-safety 4 stories · sorted by recency
── more on @realm 3 stories trending now
sponsored brought to you by zahid.host 4,200+ EU-deployed projects
reading about agents? ship yours in a single git push.

Run your AI side-project on zahid.host

EU-based hosting, git-push deploys, automatic HTTPS, no cold starts. Free tier with a custom domain — perfect for shipping the agent you just read about.

$git push zahid main
Live at https://your-agent.zahid.host
Get free account → Pricing
from €0/mo · no card required
LIVE [news/realm-a-unified-red-…] indexed:0 read:1min 2026-06-24 ·