New research challenges AI safety benchmark

wpnews.pro

[ARTICLE · art-15974] src=thedeepview.com ↗ pub=2026-05-27T13:00Z topic=ai-safety verified=true sentiment=↓ negative

New research from Cisco challenges the reliability of single-turn attack success rate (ASR) as a benchmark for AI safety, finding that no frontier AI model is iteratively safe against multi-turn attacks. Testing 15 models from OpenAI, Anthropic, Google, Amazon, and xAI, the study showed multi-turn ASR ranged from 7.89% to 88.30%, with some models like OpenAI GPT-5.4 exhibiting a 9x increase in vulnerability compared to single-turn evaluations. The findings underscore that current safety benchmarks fail to capture real-world adversarial behavior, urging organizations to adopt defense-in-depth measures such as runtime guardrails and continuous monitoring.

3 min read · published May 27, 2026

Nearly all new AI models are released alongside model cards that feature safety benchmarks, but new research calls into question at least one of those measurements, claiming it targets the wrong criteria.

Cisco research released Wednesday concludes that no frontier AI model is iteratively safe, a finding drawn from an evaluation of how large language models from leading labs respond to both single-turn attacks, commonly used in benchmarking, and iterative, multi-turn attacks, which more closely resemble real-world adversarial behavior.

The results made it clear: the single-turn attack success rate (ASR) is not a reliable proxy for what happens in the real world when an attacker iterates in real time.

“Single-turn ASR has been the default because it is a simple and easily reproducible metric that matched early prompt injection and jailbreak threat models,” Amy Chang, head of AI threat and security research at Cisco, told The Deep View. “While still a useful metric, it is no longer adequate on its own — as these considerations break down in a multi-turn scenario — and single-turn ASR does not serve as a proxy for a model’s multi-turn resilience.”

The results draw from a paired-regime evaluation of 15 closed/proprietary models from OpenAI, Anthropic, Google, Amazon, and xAI. The evaluation comprised 30,090 single-turn prompts (2,006 per model) and 6,986 multi-turn attacks distributed across 1,456 conversations, using a shared harness, prompt bank, and the Cisco Integrated AI Security and Safety Framework taxonomy.
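To make the two metrics concrete, here is a minimal Python sketch of how single-turn and multi-turn ASR could be computed. It is not Cisco's harness: the `Attempt` record, the function names, and the toy data are all hypothetical, and the rule that a conversation counts as a success if any of its turns succeeds is an assumption, since the article does not describe the scoring rule.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One adversarial prompt and its outcome (hypothetical record layout)."""
    conversation_id: str
    turn: int
    succeeded: bool  # True if the model produced the disallowed output


def single_turn_asr(attempts: list[Attempt]) -> float:
    """Share of independent, one-shot prompts that succeeded."""
    if not attempts:
        return 0.0
    return sum(a.succeeded for a in attempts) / len(attempts)


def multi_turn_asr(attempts: list[Attempt]) -> float:
    """Share of conversations with at least one successful turn.

    Assumption: a conversation is scored as a success as soon as any
    turn in it succeeds. The real study may score this differently.
    """
    outcome_by_conversation: dict[str, bool] = {}
    for a in attempts:
        previous = outcome_by_conversation.get(a.conversation_id, False)
        outcome_by_conversation[a.conversation_id] = previous or a.succeeded
    if not outcome_by_conversation:
        return 0.0
    return sum(outcome_by_conversation.values()) / len(outcome_by_conversation)


# Toy data, invented purely for illustration.
single = [
    Attempt("s1", 1, False),
    Attempt("s2", 1, False),
    Attempt("s3", 1, True),
    Attempt("s4", 1, False),
]
multi = [
    Attempt("a", 1, False), Attempt("a", 2, False), Attempt("a", 3, True),
    Attempt("b", 1, False), Attempt("b", 2, False), Attempt("b", 3, False),
]

print(single_turn_asr(single))
print(multi_turn_asr(multi))
```

In the toy data, conversation "a" is refused twice before the third turn gets through. A one-shot test would never see that turn, which is the kind of gap the paired evaluation is designed to expose.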

The findings were consistent across all models tested: multi-turn ASR ranged from 7.89% to 88.30% (lower is better) across the cohort, while single-turn ASR ranged from 2.19% to 64.91%. Some standalone model performance highlights included:

Amazon Nova 2 Lite: Had the lowest multi-turn ASR at 7.89%

Anthropic Claude family: Despite posting the lowest single-turn ASR (2.19% to 3.64%), these models reached 11.16% to 16.20% under multi-turn attacks

OpenAI GPT-5.4: This model showed a 9x increase in ASR under multi-turn attacks, rising from 2.74% single-turn to 24.68% multi-turn
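The 9x figure for GPT-5.4 follows directly from the two reported rates. A quick Python check, using only the numbers quoted in the article (the variable names are illustrative):

```python
# Reported ASR values for OpenAI GPT-5.4, in percent.
gpt54_single_turn = 2.74
gpt54_multi_turn = 24.68

# Relative increase (how many times larger) and absolute increase
# (percentage points) when moving from single-turn to multi-turn.
ratio = gpt54_multi_turn / gpt54_single_turn
delta_points = gpt54_multi_turn - gpt54_single_turn

print(f"{ratio:.2f}x increase, +{delta_points:.2f} percentage points")
# prints "9.01x increase, +21.94 percentage points"
```

The relative jump is the headline number, but the absolute one matters too: a model that looks close to airtight at under 3% still yields to roughly one in four multi-turn attacks.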

While some results sit at the lower end of the spectrum, Amazon Nova 2 Lite's performance being a notable example, they still represent meaningful residual risk, reinforcing the report's central conclusion that no model is inherently safe. This finding also aligns with Cisco's recent research. A multi-turn red-teaming study found that vulnerability rates rose 71% after five-turn conversations compared with single-turn evaluations. The call to action for users: be as aware as possible of potential hidden risks and take adequate precautions.

“No base model is iteratively safe, which means defense-in-depth is the price of deploying AI securely,” added Chang. “Depending on your organization’s use case and AI strategy, this may mean: the use of runtime guardrails; additional input/output monitoring; red-teaming models, applications, and agents; and application-layer policies.”

Our Deeper View

The broader takeaway of this paper is one I keep returning to in this space: current benchmarks test only narrow, highly specific tasks that don't reflect how models are actually used in the real world. The implications are significant, ranging from overstating a model's intelligence (a model that excels at physics problems, for instance, may struggle with something as basic as a natural conversation) to creating genuine security risks, as outlined above. This isn't an argument against benchmarks, as they remain valuable. However, it does expose a gap in the industry: the need for more standardized, representative evaluation frameworks. Whether that comes through regulation or expanded third-party testing, the status quo isn't enough. This area needs to expand and focus on more real-world scenarios.
