Nearly all new AI models are released alongside model cards that feature safety benchmarks, but new research calls into question at least one of those measurements, claiming it targets the wrong criteria.
Cisco research released Wednesday concludes that no frontier AI model is iteratively safe, a finding drawn from an evaluation of how large language models from leading labs respond to both single-turn attacks, commonly used in benchmarking, and iterative, multi-turn attacks, which more closely resemble real-world adversarial behavior.
The results made it clear: the single-turn attack success rate (ASR) is not a reliable proxy for what happens in the real world when an attacker iterates in real time.
“Single-turn ASR has been the default because it is a simple and easily reproducible metric that matched early prompt injection and jailbreak threat models,” Amy Chang, head of AI threat and security research at Cisco, told The Deep View. “While still a useful metric, it is no longer adequate on its own — as these considerations break down in a multi-turn scenario — and single-turn ASR does not serve as a proxy for a model’s multi-turn resilience.”
The results draw from a paired-regime evaluation of 15 proprietary models from OpenAI, Anthropic, Google, Amazon, and xAI. The evaluation comprised 30,090 single-turn prompts (2,006 per model) and 6,986 multi-turn attacks distributed across 1,456 conversations, using a shared harness, prompt bank, and the Cisco Integrated AI Security and Safety Framework taxonomy.
The pattern held across all models tested: multi-turn ASR ranged from 7.89% to 88.30% (lower is better) across the cohort, while single-turn ASR ranged from 2.19% to 64.91%. Model-level highlights included:
Amazon Nova 2 Lite: Had the lowest multi-turn ASR at 7.89%
Anthropic Claude family: Despite having the lowest single-turn ASR (2.19% to 3.64%), the family reached 11.16% to 16.20% under multi-turn attacks
OpenAI GPT-5.4: This model showed a 9x increase in ASR under multi-turn attacks, moving from 2.74% single-turn to 24.68% multi-turn
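To make the arithmetic behind these figures concrete, here is a minimal Python sketch of how an attack success rate and a single-turn-to-multi-turn ratio could be computed. The function names are hypothetical, the sample conversations are invented for illustration, and the rule that a conversation counts as a successful attack if any turn succeeds is an assumption; the article does not describe how Cisco scores multi-turn attacks. Only the GPT-5.4 percentages come from the reported results.

```python
def attack_success_rate(successes: int, attempts: int) -> float:
    """Return ASR as a percentage: successful attacks / attempted attacks."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return 100.0 * successes / attempts


def conversation_succeeded(turn_outcomes: list[bool]) -> bool:
    """Assumption (not from the report): a multi-turn attack succeeds
    if at least one turn in the conversation elicits a harmful response."""
    return any(turn_outcomes)


def uplift(single_turn_asr: float, multi_turn_asr: float) -> float:
    """Ratio of multi-turn ASR to single-turn ASR."""
    return multi_turn_asr / single_turn_asr


# Reported figures for OpenAI GPT-5.4: 2.74% single-turn, 24.68% multi-turn.
gpt_uplift = uplift(2.74, 24.68)
print(f"GPT-5.4 multi-turn uplift: {gpt_uplift:.1f}x")

# Invented example: four conversations, each a list of per-turn outcomes
# (True means the attack succeeded on that turn).
conversations = [
    [False, False, True],
    [False, False, False],
    [False, True, False],
    [False, False, False],
]
wins = sum(conversation_succeeded(c) for c in conversations)
example_asr = attack_success_rate(wins, len(conversations))
print(f"Example multi-turn ASR: {example_asr:.1f}%")
```

Under that any-turn assumption, every additional turn gives an attacker another chance to succeed, which is one way to see why a low single-turn ASR can coexist with a much higher multi-turn ASR.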
While some results sit at the lower end of the spectrum, Amazon Nova 2 Lite's performance being a notable example, they still represent meaningful residual risk, reinforcing the report's central conclusion that no model is inherently safe. This finding also aligns with Cisco's recent research. A multi-turn red-teaming study found that vulnerability rates rose 71% after five-turn conversations compared with single-turn evaluations. The call to action for users: be as aware as possible of potential hidden risks and take adequate precautions.
“No base model is iteratively safe, which means defense-in-depth is the price of deploying AI securely,” added Chang. “Depending on your organization’s use case and AI strategy, this may mean: the use of runtime guardrails; additional input/output monitoring; red-teaming models, applications, and agents; and application-layer policies.”
Our Deeper View
The broader takeaway of this paper is one I keep returning to in this space: current benchmarks test only narrow, highly specific tasks that don't reflect how models are actually used in the real world. The implications are significant, ranging from overstating a model's intelligence (a model that excels at physics problems, for instance, may struggle with something as basic as a natural conversation) to creating genuine security risks, as outlined above. This isn't an argument against benchmarks, as they remain valuable. However, it does expose a gap in the industry: the need for more standardized, representative evaluation frameworks. Whether that comes through regulation or expanded third-party testing, the status quo isn't enough. This area needs to expand and focus on more real-world scenarios.