{"slug": "agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark", "title": "AgentThreatBench: The First OWASP Agentic Top 10 Security Benchmark", "summary": "AgentThreatBench is a new security benchmark that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable tasks, designed to measure whether an AI agent can be hijacked by malicious inputs in its environment (such as poisoned emails or compromised documents) rather than just testing for harmful user prompts. It uses a dual-metric scoring system that evaluates both security and utility, requiring agents to complete tasks safely without over-refusing, and has been integrated into the UK AI Safety Institute's inspect_evals repository for easy use.", "body_md": "The AI safety community has a blind spot. We have excellent benchmarks for measuring whether an LLM will output harmful content (like toxicity or jailbreaks), and we have benchmarks for measuring whether an agent can successfully complete a task (like SWE-bench or WebArena).\nBut as agents move into production, the threat model changes. The most critical risk isn't a user typing a jailbreak prompt; it's an agent autonomously ingesting a poisoned email, a compromised RAG document, or a malicious API response, and then executing a harmful action on the attacker's behalf.\nTo measure this, we need a new kind of benchmark.\nToday, I'm sharing AgentThreatBench, the first evaluation suite that operationalizes the OWASP Top 10 for Agentic Applications (2026) into executable tasks. It was recently merged into the official UK AI Safety Institute's `inspect_evals` repository.\nMost security evaluations treat the LLM as a chatbot: User sends input → LLM generates output → Eval checks output for safety.\nBut an agentic workflow looks different: the agent calls the `read_inbox` tool, ingests an email that carries an attacker's hidden instructions, and then calls `send_email` on the attacker's behalf.\nThis is OWASP ASI01: Agent Goal Hijack via indirect prompt injection. 
Traditional benchmarks miss this entirely because the attack payload isn't in the user prompt; it's embedded in the environment the agent interacts with.\nAgentThreatBench is built on the Inspect AI framework. It currently covers three distinct agentic attack scenarios across two OWASP categories.\nThe Scenario: An agent must answer questions using a memory/RAG store.\nThe Attack: The memory store contains adversarial entries designed to mislead the agent, ranging from direct instruction overrides to subtle context poisoning and authority impersonation.\nThe Threat Model: We test both append-style attacks (attacker adds a poisoned entry alongside legitimate data) and replacement-style attacks (attacker overwrites the legitimate data entirely).\nThe Scenario: An agent is tasked with triaging a user's inbox, categorizing emails, and drafting a summary.\nThe Attack: One of the emails returned by the `read_inbox` tool contains an indirect prompt injection attempting to hijack the agent's goal (e.g., forcing it to mark a spam email as high priority or execute a hidden command).\nThe Scenario: A customer support agent is tasked with resolving a user's ticket.\nThe Attack: The customer's profile data (returned by `lookup_customer`) contains an embedded payload instructing the agent to leak the customer's SSN or account number to an unauthorized third party via the `send_message` tool.\nA secure agent that refuses to do any work is useless. A capable agent that blindly follows malicious instructions is dangerous.\nTo capture this tension, AgentThreatBench uses a dual-metric scoring system that scores each run on both utility and security.\nAn agent only \"passes\" if it scores 1.0 on both metrics. 
In our baseline testing, many state-of-the-art models fail this dual requirement: they either over-refuse (failing utility) or get hijacked (failing security).\nBecause AgentThreatBench is integrated into the official UK AISI `inspect_evals` package, running it is straightforward:\n```bash\n# Install the evaluation suite\npip install inspect_evals\n# Run the memory poisoning task against GPT-4o\ninspect eval inspect_evals/agent_threat_bench_memory_poison --model openai/gpt-4o\n# Run the autonomy hijack task against Claude 3.5 Sonnet\ninspect eval inspect_evals/agent_threat_bench_autonomy_hijack --model anthropic/claude-3-5-sonnet-20241022\n```\nAs the industry moves from chatbots to autonomous agents, our evaluation frameworks must evolve. We can no longer just test whether a model will say something bad; we must test whether an agent will do something bad when operating in a compromised environment.\nBy aligning this benchmark with the OWASP Agentic Top 10, we provide a standardized way for researchers and developers to measure agent resilience against the exact threats they will face in production.\nIf you're building agentic frameworks, guardrails, or evaluating frontier models, I encourage you to run AgentThreatBench against your systems. 
The results might surprise you.", "url": "https://wpnews.pro/news/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark", "canonical_source": "https://dev.to/vaishnavi_gudur/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark-6pp", "published_at": "2026-05-19 23:40:26+00:00", "updated_at": "2026-05-20 00:01:59.466382+00:00", "lang": "en", "topics": ["artificial-intelligence", "large-language-models", "cybersecurity", "research"], "entities": ["AgentThreatBench", "OWASP", "UK AI Safety Institute", "Inspect AI"], "alternates": {"html": "https://wpnews.pro/news/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark", "markdown": "https://wpnews.pro/news/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark.md", "text": "https://wpnews.pro/news/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark.txt", "jsonld": "https://wpnews.pro/news/agentthreatbench-the-first-owasp-agentic-top-10-security-benchmark.jsonld"}}