{"slug": "securing-llm-agent-teams-inside-nrt-defense-v0-4-0", "title": "Securing LLM Agent Teams: Inside NRT-Defense v0.4.0", "summary": "A developer open-sourced NRT-Defense v0.4.0, an adaptive multi-turn defense framework for LLM agent teams that reduces attack success rates to under 1%. The framework addresses vulnerabilities exposed by Lee et al. (2026) in the NRT-Bench paper, which showed that adaptive multi-turn attacks cause 8.7% to 12.1% loss of Critical Safety Functions in safety-critical systems.", "body_md": "Multi-turn autonomous LLM agents are expanding rapidly in safety-critical systems. However, a major vulnerability has been exposed by **Lee et al. (2026) in the NRT-Bench paper**: adaptive multi-turn attacks can exploit disjoint model vulnerabilities, causing an **8.7% to 12.1% loss of Critical Safety Functions (CSFs)**.\n\nTo address this, I am open-sourcing **NRT-Defense**, an adaptive multi-turn defense framework designed to monitor agent sessions and reduce the attack success rate to **<1%**.\n\nStandard guardrails evaluate each prompt in isolation (single-turn). Attackers take advantage of this by spreading an exploit across multiple conversational turns. Turn by turn, the context drifts until the agent team completely bypasses its safety containment.\n\nThe NRT-Bench paper demonstrated this in a simulated nuclear power plant control room with 5 operator roles, 4 attack channels, and 6 critical safety functions. The results were alarming:\n\n| Metric | Value |\n|---|---|\n| Attack success rate | 8.7% to 12.1% |\n| Sessions analyzed | 149 |\n| Models tested | 4 frontier LLMs |\n| Vulnerability overlap | Nearly disjoint |\n\nThe key finding: **vulnerabilities are nearly disjoint across models**. An attack that works against GPT-4 may not work against Claude. 
This means model diversity is itself a defense, but only if you can detect and respond to attacks in real time.\n\n`nrt-defense` neutralizes this threat through a continuous, multi-component pipeline:\n\n**Per-Turn Message Analysis:** Evaluates channel risk and turn-escalation metrics. Each message is scored for adversarial content using keyword detection, pattern matching, and channel-specific risk weights.\n\n**Real-Time CSF Monitoring:** Tracks 6 operational critical safety functions simultaneously. Risk accumulates over turns and triggers alerts when thresholds are breached.\n\n**Context-Aware Misdirection Prompt Engineering (CMPE):** When an anomaly is detected, instead of issuing a blunt rejection that alerts the attacker, the engine reshapes the context dynamically using a 3-step matrix.\n\nThe project comes with an automated evaluation engine. You can audit logs or run the integrated benchmark directly from your terminal:\n\n```\nnrt-audit --benchmark\n```\n\nThis outputs an evaluation table comparing the initial Attack Success Rate (ASR) with the mitigated rate (<1%).\n\nYou can also audit specific session files:\n\n```\nnrt-audit --session-path /path/to/session.json --output report.json\n```\n\nOr run in interactive mode for real-time testing:\n\n```\nnrt-audit --interactive\n```\n\nNRT-Defense is part of a comprehensive AI security suite:\n\n| Project | Focus | Tests |\n|---|---|---|\n| misdirection-proxy | Runtime defense for autonomous agents | 147 |\n| neuroimprint-detector | Forensic audit of PEFT adapters | 43 |\n| nrt-defense | Multi-turn session defense | 57 |\n\n**247 total tests** across all projects, all running via GitHub Actions on Python 3.10 and 3.11.\n\n```\npip install nrt-defense\nnrt-audit --benchmark\n```\n\nBacked by **57 unit and integration tests** running via GitHub Actions, this project stands alongside `misdirection-proxy` and `neuroimprint-detector` as part of a comprehensive AI security suite under 
the **AGPL-3.0-or-later** license.", "url": "https://wpnews.pro/news/securing-llm-agent-teams-inside-nrt-defense-v0-4-0", "canonical_source": "https://dev.to/magopredator/securing-llm-agent-teams-inside-nrt-defense-v040-oh", "published_at": "2026-06-20 21:19:36+00:00", "updated_at": "2026-06-20 21:39:30.880654+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "ai-agents", "ai-research", "developer-tools"], "entities": ["Lee et al.", "NRT-Defense", "NRT-Bench", "GPT-4", "Claude", "GitHub Actions", "Python", "AGPL-3.0-or-later"], "alternates": {"html": "https://wpnews.pro/news/securing-llm-agent-teams-inside-nrt-defense-v0-4-0", "markdown": "https://wpnews.pro/news/securing-llm-agent-teams-inside-nrt-defense-v0-4-0.md", "text": "https://wpnews.pro/news/securing-llm-agent-teams-inside-nrt-defense-v0-4-0.txt", "jsonld": "https://wpnews.pro/news/securing-llm-agent-teams-inside-nrt-defense-v0-4-0.jsonld"}}