{"slug": "penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via", "title": "\"Penny Wise, Pixel Foolish\": Bypassing Price Constraints in Multimodal Agents via Visual Adversarial Perturbations", "summary": "Researchers at ACL 2026 revealed PriceBlind, a visual adversarial attack that exploits a vulnerability called Visual Dominance Hallucination in multimodal large language models used as financial agents. The attack achieves around 80% success rate in bypassing price constraints in screenshot-based evaluations, forcing agents to make irrational economic decisions. Standard robust encoders only partially reduce the attack, while a Verify-then-Act stack lowers success rates below 10% at some cost to clean accuracy.", "body_md": "##### Abstract\n\nThe rapid proliferation of Multimodal Large Language Models (MLLMs) has ushered in the era of the “Agentic Economy,” where Mobile Agents autonomously execute high-stakes financial transactions. While these agents demonstrate impressive operational capabilities, their adversarial robustness remains a glaring blind spot. In this paper, we identify a systemic vulnerability termed Visual Dominance Hallucination (VDH), where imperceptible adversarial visual cues can act as a “super-stimulus,” overriding textual price evidence in our evaluated screenshot-based price-constrained settings and forcing the agent into irrational economic decisions. We propose PriceBlind, a stealthy, white-box adversarial attack framework for controlled screenshot-based evaluation. Unlike prior works that rely on conspicuous artifacts like pop-ups, PriceBlind exploits the modality gap in CLIP-based encoders via a novel Semantic-Decoupling Loss. Rather than literally making a luxury item “look cheap,” this regularizer weakens the consistency between high-price text and visual value cues by aligning the image embedding with a low-cost/value-associated anchor region while preserving pixel-level fidelity. 
On our main E-ShopBench benchmark with clear price constraints, screenshot-based white-box evaluation yields ASRs around 80% on the evaluated agents. Under the evaluated single-turn coordinate-selection protocol in a simplified layout-aware setting, our Ensemble-DI-FGSM strategy also yields non-trivial black-box transfer, with ASR roughly 35–41% across GPT-4o, Gemini-1.5-Pro, and Claude-3.5-Sonnet. In the same screenshot-based setting, standard robust encoders reduce ASR only partially, while a Verify-then-Act stack with robust encoders lowers ASR to below 10% at some clean-accuracy cost.\n\n- Anthology ID: 2026.findings-acl.788\n- Volume: [Findings of the Association for Computational Linguistics: ACL 2026](/volumes/2026.findings-acl/)\n- Month: July\n- Year: 2026\n- Address: San Diego, California, United States\n- Editors: [Maria Liakata](/people/maria-liakata/), [Viviane P. Moreira](/people/viviane-p-moreira/unverified/), [Jiajun Zhang](/people/jiajun-zhang/unverified/), [David Jurgens](/people/david-jurgens/)\n- Venue: [Findings](/venues/findings/)\n- Publisher: Association for Computational Linguistics\n- Pages: 16059–16073\n- URL: [https://aclanthology.org/2026.findings-acl.788/](https://aclanthology.org/2026.findings-acl.788/)\n- Cite (ACL): Jiachen Qian and Zhaolu Kang. 2026. [\"Penny Wise, Pixel Foolish\": Bypassing Price Constraints in Multimodal Agents via Visual Adversarial Perturbations](https://aclanthology.org/2026.findings-acl.788/). In *Findings of the Association for Computational Linguistics: ACL 2026*, pages 16059–16073, San Diego, California, United States. Association for Computational Linguistics. 
- Cite (Informal): [“Penny Wise, Pixel Foolish”: Bypassing Price Constraints in Multimodal Agents via Visual Adversarial Perturbations](https://aclanthology.org/2026.findings-acl.788/) (Qian & Kang, Findings 2026)\n- PDF: [https://aclanthology.org/2026.findings-acl.788.pdf](https://aclanthology.org/2026.findings-acl.788.pdf)", "url": "https://wpnews.pro/news/penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via", "canonical_source": "https://aclanthology.org/2026.findings-acl.788/", "published_at": "2026-06-22 00:00:00+00:00", "updated_at": "2026-06-26 08:16:36.392808+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "ai-safety", "computer-vision", "ai-agents"], "entities": ["GPT-4o", "Gemini-1.5-Pro", "Claude-3.5-Sonnet", "CLIP", "PriceBlind", "ACL", "Jiachen Qian", "Zhaolu Kang"], "alternates": {"html": "https://wpnews.pro/news/penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via", "markdown": "https://wpnews.pro/news/penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via.md", "text": "https://wpnews.pro/news/penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via.txt", "jsonld": "https://wpnews.pro/news/penny-wise-pixel-foolish-bypassing-price-constraints-in-multimodal-agents-via.jsonld"}}