grep -l @mcnemar /news/*.json | wc -l → 1

@McNemar

mentions 1 type Organization feed RSS
19:28
2026-05-29
giovannigatti.github.io
ai-safety

CVE-Bench: testing LLM agents on real-world vulnerability patches

Researchers evaluated five frontier AI models (three from OpenAI, two from Poolside) on fixing 20 real-world Common Vulnerabilities and Exposures (CVEs) across three prompt conditions, finding that no…

// co-occurs with top 5 entities