{"slug": "the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their", "title": "The UK Government Just Merged This Open-Source AI Security Benchmark Into Their National Evaluation Framework", "summary": "The UK Government's AI Safety Institute has merged the open-source AgentThreatBench benchmark into its official inspect_evals framework, which is used to evaluate frontier AI models from OpenAI, Anthropic, and Google DeepMind. AgentThreatBench contains over 200 attack payloads designed to test AI agents' resistance to memory poisoning attacks, covering five categories including prompt injection and sensitive data leakage. The benchmark is now part of the government's standard toolkit for AI safety evaluation.", "body_md": "Last month, the UK Government's AI Safety Institute merged [AgentThreatBench](https://github.com/vgudur-dev/AgentThreatBench) into their official [inspect_evals](https://github.com/UKGovernmentBEIS/inspect_evals) framework — the same framework they use to evaluate frontier AI models from OpenAI, Anthropic, and Google DeepMind.\n\nAgentThreatBench is an open-source adversarial benchmark I built that contains **200+ attack payloads** specifically designed to test whether AI agents can resist memory poisoning attacks.\n\nAI agents are increasingly being deployed with persistent memory — they remember past conversations, user preferences, and context across sessions. 
This creates a new attack surface: **memory poisoning**.\n\nAn attacker who can inject malicious content into an agent's memory can steer how the agent behaves in later sessions.\n\nThe OWASP Agentic Security Initiative identified this as [ASI06: Agent Memory Poisoning](https://genai.owasp.org/resource/agentic-security-initiative/).\n\nThe benchmark covers five attack categories:\n\n| Category | Payloads | Description |\n|---|---|---|\n| Prompt Injection | 40+ | Instructions disguised as memory content |\n| Protected Key Tampering | 40+ | Attempts to overwrite system-level keys |\n| Sensitive Data Leakage | 40+ | PII/credential exfiltration via memory |\n| Size Anomaly | 40+ | Memory inflation / resource exhaustion |\n| Behavioral Drift | 40+ | Gradual personality/instruction shifts |\n\nTo run the benchmark yourself:\n\n```\npip install agentthreatbench\n\n# Run the full benchmark against your agent\natb run --target your_agent_endpoint --output results.json\n\n# Or run a single attack category\natb run --category prompt_injection --target your_agent_endpoint\n```\n\nThe UK Government's AI Safety Institute uses inspect_evals to evaluate frontier AI models, including those from OpenAI, Anthropic, and Google DeepMind.\n\nHaving AgentThreatBench merged into this framework means it's now part of the **official government toolkit** for AI safety evaluation.\n\nIf you're building AI agents with persistent memory, I'd love to hear how you're thinking about memory security. 
What attack vectors concern you most?", "url": "https://wpnews.pro/news/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their", "canonical_source": "https://dev.to/vaishnavi_gudur/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their-national-evaluation-138o", "published_at": "2026-05-29 15:26:48+00:00", "updated_at": "2026-05-29 15:42:48.081688+00:00", "lang": "en", "topics": ["ai-safety", "ai-agents", "ai-policy", "ai-research", "large-language-models"], "entities": ["UK Government", "AI Safety Institute", "AgentThreatBench", "inspect_evals", "OpenAI", "Anthropic", "Google DeepMind", "OWASP Agentic Security Initiative"], "alternates": {"html": "https://wpnews.pro/news/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their", "markdown": "https://wpnews.pro/news/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their.md", "text": "https://wpnews.pro/news/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their.txt", "jsonld": "https://wpnews.pro/news/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their.jsonld"}}