Microsoft identifies seven new ways AI agents can be hacked

Microsoft has identified seven new failure modes in agentic AI systems, including supply chain compromise, goal hijacking, and visual attacks on computer-use agents. The company attributed the growing list to the technology's rapid mainstream adoption, the maturation of the Model Context Protocol ecosystem, and increased empirical evidence from real-world deployments. Microsoft urged security teams to inventory agent supply chains, cryptographically verify agent identities, and add the new failure modes to red-team testing.

Microsoft has identified seven new failure modes in agentic AI systems, in addition to those it identified last year in its first Taxonomy of Failure Modes in Agentic AI Systems https://cdn-dynmedia-1.microsoft.com/is/content/microsoftcorp/microsoft/final/en-us/microsoft-brand/documents/Taxonomy-of-Failure-Mode-in-Agentic-AI-Systems-Whitepaper.pdf .

Four things contributed to the growing list of ways agentic AI can go wrong https://www.infoworld.com/article/4040909/why-ai-fails-at-business-context-and-what-to-do-about-it.html : the speed at which the technology went mainstream, the growing maturity of the Model Context Protocol (MCP) ecosystem, the rise of computer-use agents, and the accumulation of empirical evidence as researchers gathered more real-world findings.

The seven new failure modes https://www.microsoft.com/en-us/security/blog/2026/06/04/updating-taxonomy-failure-modes-agentic-ai-systems-year-red-teaming-taught-us/ it has identified are:

- Agentic Supply Chain Compromise: agent behavior can be affected by natural language rather than malicious code;
- Goal Hijacking: adversarial instructions appear aligned with legitimate task completion, while silently redirecting the agent’s terminal goal;
- Inter-Agent Trust Escalation: a compromised agent asserts a false identity or inflates its claimed permissions to an orchestrator;
- Computer Use Agent (CUA) Visual Attack: agents operating through graphical interfaces can be manipulated through on-screen content that carries adversarial instructions for the agent;
- Session Context Contamination: an adversary introduces data that biases the agent’s reasoning in subsequent steps, without triggering safety controls at any individual step;
- MCP / Plugin Abuse: an update to the original taxonomy’s coverage of function compromise, focused on the attack surfaces specific to MCP and plugin protocols;
- Capability / Architecture Disclosure: an agent reveals internal implementation details such as tool names and schemas, system-prompt structure, memory interfaces, or consent/human-in-the-loop trigger logic.

Microsoft advises security teams that use these definitions in their planning to inventory their supply chain, generating a software bill of materials (SBOM) for every deployed agent; to verify agent identity cryptographically, not positionally, by issuing attestable credentials https://www.csoonline.com/article/4163365/what-cisos-need-to-get-right-as-identity-enters-the-agentic-era.html at provisioning; to add the seven new failure modes to their red-team coverage matrix; and to audit the human-in-the-loop user experience as a security control.
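
To make the advice on verifying agent identity "cryptographically, not positionally" more concrete, here is a minimal Python sketch of how an orchestrator might check a credential issued at provisioning instead of trusting what an agent says about itself. It is an illustration under stated assumptions, not Microsoft's implementation: the function names, the credential layout, and the shared `PROVISIONING_KEY` are all hypothetical, and the HMAC tag stands in for the asymmetric signatures or hardware-backed attestation a production system would more likely use.

```python
import hashlib
import hmac
import json

# Hypothetical provisioning secret, for illustration only. A real deployment
# would more likely use asymmetric keys held in a KMS or HSM.
PROVISIONING_KEY = b"demo-only-secret"


def _canonical(claims):
    # Stable serialization so issuer and verifier hash identical bytes.
    return json.dumps(claims, sort_keys=True, separators=(",", ":")).encode("utf-8")


def issue_credential(agent_id, permissions):
    """Issue a tagged credential at provisioning time (hypothetical helper)."""
    claims = {"agent_id": agent_id, "permissions": sorted(permissions)}
    tag = hmac.new(PROVISIONING_KEY, _canonical(claims), hashlib.sha256).hexdigest()
    return {"claims": claims, "tag": tag}


def verify_credential(credential):
    """Return True only if the claims match what provisioning actually signed."""
    expected = hmac.new(
        PROVISIONING_KEY, _canonical(credential["claims"]), hashlib.sha256
    ).hexdigest()
    # Constant-time comparison avoids leaking tag bytes through timing.
    return hmac.compare_digest(expected, credential["tag"])


def is_allowed(credential, action):
    """Orchestrator-side check: verify the credential first, then the permission."""
    return verify_credential(credential) and action in credential["claims"]["permissions"]


# A legitimately provisioned agent.
cred = issue_credential("summarizer-01", ["read:docs"])

# A compromised agent that inflates its claimed permissions but reuses the old tag.
forged = {
    "claims": {"agent_id": "summarizer-01", "permissions": ["read:docs", "write:payments"]},
    "tag": cred["tag"],
}
```

In this sketch, `is_allowed(cred, "read:docs")` succeeds, while the forged credential fails verification because its inflated claims no longer match the tag. This is the kind of check that would blunt the Inter-Agent Trust Escalation failure mode described above.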