PAST2HARM: A Simple Adaptive Past Tense Attack for Jailbreaking Multimodal AI

A new jailbreak framework called PAST2HARM achieved attack success rates of 83%, 67%, and 100% on three multimodal text-to-image systems, Gemini Nano Banana Pro, GPT Image 2, and SD XL, by using past tense reformulations to bypass refusal training. The attack operates in a black-box, gradient-free setting and elicited explicit sexual content, political disinformation, historical denial narratives, hate speech, and self-harm glorification. The authors say the findings expose fundamental brittleness in current safeguards and highlight the need for stronger multimodal safety training.

read 1 min · views 14 · published May 28, 2026

arXiv:2605.27545v1. Abstract: Jailbreak attacks on multimodal AI systems remain underexplored, even though unsafe image generation can have more severe consequences than unsafe text and current defenses are relatively immature. We introduce PAST2HARM, a simple yet effective adaptive jailbreak framework that bypasses refusal training in state-of-the-art multimodal text-to-image models. Building on prior findings that past tense reformulations can evade safeguards, PAST2HARM systematically exploits this vulnerability in multimodal generative AI. We characterize the attack along two dimensions. First, breadth: through temporal deepening, the framework incrementally strengthens historical anchoring and archival cues, eroding refusal boundaries across models with varying alignment strength. Second, depth: via iterative escalation after initial compliance, we probe the upper bound of harmful generation, measuring severity using a scalar severity jailbreak metric evaluated by a language model acting as a judge. We find that mid-conversation turns form peak vulnerability windows, where harmfulness increases before plateauing and eventually undergoing semantic inversion. We evaluate PAST2HARM on three models (Gemini Nano Banana Pro, GPT Image 2, and SD XL), achieving attack success rates of 83 percent, 67 percent, and 100 percent in a black-box, gradient-free setting. Adversarial prompts also transfer across models, with cross-model success rates above 50 percent. The attack elicits diverse harmful outputs, including explicit sexual content, political disinformation, historical denial narratives, hate speech, and self-harm glorification. We further release a curated benchmark of prompts, reformulations, and outputs as a resource for red teaming and alignment. Our results expose fundamental brittleness in current safeguards and highlight the need for stronger multimodal safety training.
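
The abstract reports attack success rates and a peak vulnerability window, but it does not spell out how either is tallied. As a rough illustration only, assuming each attempt carries a boolean judge verdict and each conversation turn carries a scalar severity score, the two evaluation quantities could be computed along these lines. The function names and the toy inputs are hypothetical and are not taken from the paper or its benchmark.

```python
from typing import Sequence


def attack_success_rate(outcomes: Sequence[bool]) -> float:
    """Fraction of attempts that the judge marked as successful.

    `outcomes` is an assumed representation: one boolean per attempt.
    """
    if not outcomes:
        raise ValueError("outcomes must be non-empty")
    return sum(outcomes) / len(outcomes)


def peak_turn(severity_by_turn: Sequence[float]) -> int:
    """1-based index of the turn with the highest severity score.

    On ties, the earliest turn is returned.
    """
    if not severity_by_turn:
        raise ValueError("severity_by_turn must be non-empty")
    best = max(range(len(severity_by_turn)), key=lambda i: severity_by_turn[i])
    return best + 1


if __name__ == "__main__":
    # Toy inputs for illustration only; not results from the paper.
    print(attack_success_rate([True, False, True, True]))
    print(peak_turn([0.1, 0.4, 0.9, 0.9, 0.3]))
```

A rise, plateau, and later drop in the per-turn scores would show up here as a peak at a middle turn, which is the pattern the abstract describes in prose.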

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.27545
