{"slug": "gdm-ai-control-roadmap", "title": "GDM AI Control Roadmap", "summary": "GDM published an AI Control Roadmap outlining internal guardrails to detect and prevent adversarial behavior by AI agents. The roadmap includes threat modeling, control invariants, capability-based mitigation ladders, and 15 practical defenses. It aims to maintain detection and prevention capabilities as AI models become more powerful.", "body_md": "GDM has published an AI Control Roadmap! From the executive summary:\n\nWe present the GDM AI Control Roadmap (v0.1) – our plan for implementing and adopting internal guardrails designed to catch potential adversarial behaviour by AI agents, even as they become increasingly harder to oversee and contain. We focus on system-level mitigations that limit the harm a misaligned AI system could cause. Specifically, this report provides:\n\n- Threat modelling: Taking inspiration from cybersecurity, we adopt a conservative, worst-case approach to threat modelling throughout this paper, and assume a hypothetical AI adversary pursuing undesirable goals in internal deployment. We introduce TRAIT&R, a taxonomy of tactics and techniques available to such a hypothetical AI adversary, building on the established security framework MITRE ATT&CK. We categorise new hypothetical threats into three core areas: loss of control (e.g., rogue internal deployments), work sabotage (e.g., intentionally flawed alignment and safety research), and direct harm (e.g., critical asset destruction or exfiltration).\n\n- Control invariants: We establish two defensive invariants that we aim to maintain, even as the hypothetical AI adversary becomes increasingly capable: i) reliable detection of misaligned intent or actions, and ii) effective prevention and response to attack attempts. 
For control to serve as an effective line of defence, our ability to detect and prevent attacks should exceed AI agents’ ability to evade detection and execute attacks.\n\n- Capability-based mitigation ladders: Because controlling more capable models requires increasingly costly interventions, we link specific defensive measures to evolving model capabilities (such as the ability to reason opaquely or execute complex cyberattacks). As models get more powerful, our defences should escalate accordingly. We outline four Detection tiers (D1–D4) and three Prevention and Response tiers (R1–R3).\n\n- A portfolio of practical defences: We suggest 15 concrete, tiered mitigations. These range from low-cost interventions for current models (e.g., chain-of-thought monitoring, asynchronous alerts) to advanced safeguards for future models (e.g., real-time access control, system-level anomaly detection, internal activations monitoring, and shutdown infrastructure). Many of these mitigations build on existing, well-established security practices, but may need adjustment to handle adversarial AI.", "url": "https://wpnews.pro/news/gdm-ai-control-roadmap", "canonical_source": "https://www.lesswrong.com/posts/hEtkyKGoPpFeWnKkX/gdm-ai-control-roadmap", "published_at": "2026-06-18 16:50:59+00:00", "updated_at": "2026-06-18 17:02:23.422247+00:00", "lang": "en", "topics": ["ai-safety", "ai-agents", "ai-research", "ai-policy"], "entities": ["GDM", "MITRE ATT&CK"], "alternates": {"html": "https://wpnews.pro/news/gdm-ai-control-roadmap", "markdown": "https://wpnews.pro/news/gdm-ai-control-roadmap.md", "text": "https://wpnews.pro/news/gdm-ai-control-roadmap.txt", "jsonld": "https://wpnews.pro/news/gdm-ai-control-roadmap.jsonld"}}