CVE-Bench

mentions: 1, type: Organization

// recent coverage: 1 mention

2026-05-29 19:28

giovannigatti.github.io

ai-safety

CVE-Bench: testing LLM agents on real-world vulnerability patches

Researchers evaluated five frontier AI models (three from OpenAI, two from Poolside) on fixing 20 real-world Common Vulnerabilities and Exposures (CVEs) across three prompt conditions, finding that no…

// co-occurs with top 5 entities

OpenAI (1), Poolside (1), Anthropic (1), Mythos (1), McNemar (1)