When the guardrail becomes the target: reasoning-extension DoS against LLM safety layers
Researchers at HKUST have identified a new denial-of-service attack against LLM safety layers, called reasoning-extension DoS, in which crafted inputs cause guardrails to enter infinite reasoning loops. The attack amplifies…