Frontier AI models collapse under multi-turn AI attacks, Cisco finds

Cisco's AI threat intelligence team found that frontier AI models are vulnerable to multi-turn attacks, in which attackers reframe queries, build context, and adopt personas across multiple interactions. Industry safety benchmarks, which primarily test single-turn attacks, miss nearly all of this behavior, leaving a gap between published safety scores and actual model resilience that is wide enough to misrank leading models. The result is a critical blind spot in current AI safety evaluations.
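
To make the single-turn versus multi-turn comparison concrete, the sketch below computes an attack success rate (ASR, taken here as the fraction of attack attempts that succeed) and an approximate 95% confidence half-width using the normal approximation to the binomial. This is an illustration only, not Cisco's methodology: the function names, the choice of interval, and the counts are all hypothetical and do not come from the research.

```python
import math


def attack_success_rate(successes: int, attempts: int) -> float:
    """Fraction of attack attempts that produced a harmful response."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    if not 0 <= successes <= attempts:
        raise ValueError("successes must be between 0 and attempts")
    return successes / attempts


def ci_half_width(successes: int, attempts: int, z: float = 1.96) -> float:
    """Approximate 95% half-width (normal approximation; an assumption here)."""
    p = attack_success_rate(successes, attempts)
    return z * math.sqrt(p * (1.0 - p) / attempts)


# Hypothetical counts for one model; NOT figures from the Cisco research.
single_turn = (20, 200)   # (successful attacks, attempts)
multi_turn = (120, 200)

single_asr = attack_success_rate(*single_turn)
multi_asr = attack_success_rate(*multi_turn)
gap = multi_asr - single_asr

print(f"single-turn ASR: {single_asr:.3f} +/- {ci_half_width(*single_turn):.3f}")
print(f"multi-turn ASR:  {multi_asr:.3f} +/- {ci_half_width(*multi_turn):.3f}")
print(f"gap:             {gap:.3f}")
```

If two models have similar single-turn ASR but very different multi-turn ASR, ranking them on the single-turn number alone would hide the difference, which is the kind of misranking the research describes.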

Attackers who probe large language models rarely give up after one refusal. They reframe, build context across turns, adopt personas, and escalate gradually. New research from Cisco’s AI threat intelligence team finds that the safety benchmarks used across the industry miss almost all of this behavior, and the gap between published scores and observed resilience is wide enough to misrank leading models.

Figure: Single-turn versus multi-turn ASR by model, with approximate 95% confidence half-widths on single-turn …

Source: Help Net Security, https://www.helpnetsecurity.com/2026/05/28/cisco-multi-turn-ai-attacks/