# The UK Government Just Merged This Open-Source AI Security Benchmark Into Their National Evaluation Framework

> Source: <https://dev.to/vaishnavi_gudur/the-uk-government-just-merged-this-open-source-ai-security-benchmark-into-their-national-evaluation-138o>
> Published: 2026-05-29 15:26:48+00:00

Last month, the UK Government's AI Safety Institute merged [AgentThreatBench](https://github.com/vgudur-dev/AgentThreatBench) into their official [inspect_evals](https://github.com/UKGovernmentBEIS/inspect_evals) framework — the same framework they use to evaluate frontier AI models from OpenAI, Anthropic, and Google DeepMind.

AgentThreatBench is an open-source adversarial benchmark I built that contains **200+ attack payloads** specifically designed to test whether AI agents can resist memory poisoning attacks.

AI agents are increasingly being deployed with persistent memory — they remember past conversations, user preferences, and context across sessions. This creates a new attack surface: **memory poisoning**.

An attacker who can inject malicious content into an agent's memory can:

The OWASP Agentic Security Initiative identified this as [ASI06 — Agent Memory Poisoning](https://genai.owasp.org/resource/agentic-security-initiative/).

The benchmark covers 5 attack categories:

| Category | Payloads | Description |
|---|---|---|
| Prompt Injection | 40+ | Instructions disguised as memory content |
| Protected Key Tampering | 40+ | Attempts to overwrite system-level keys |
| Sensitive Data Leakage | 40+ | PII/credential exfiltration via memory |
| Size Anomaly | 40+ | Memory inflation / resource exhaustion |
| Behavioral Drift | 40+ | Gradual personality/instruction shifts |

```
pip install agentthreatbench

# Run the full benchmark against your agent
atb run --target your_agent_endpoint --output results.json

# Or use individual attack categories
atb run --category prompt_injection --target your_agent_endpoint
```

The UK Government's AI Safety Institute uses inspect_evals to:

Having AgentThreatBench merged into this framework means it's now part of the **official government toolkit** for AI safety evaluation.

If you're building AI agents with persistent memory, I'd love to hear how you're thinking about memory security. What attack vectors concern you most?
