19:28
2026-05-29
giovannigatti.github.io
ai-safety
CVE-Bench: testing LLM agents on real-world vulnerability patches
Researchers evaluated five frontier AI models (three from OpenAI, two from Poolside) on fixing 20 real-world Common Vulnerabilities and Exposures (CVEs) across three prompt conditions, finding that no…