Scoring AI hackers when there is no answer key

AI models are saturating existing offensive-cyber benchmarks, which leaves those benchmarks little room to differentiate top systems, and many of them rely on bugs with public writeups. The AI security lab Irregular has introduced FrontierCyber, a benchmark designed to score AI hackers without a fixed answer key, to address the problem of benchmarks becoming obsolete once models solve most of their tests.

AI models are solving more and more of the offensive-cyber tests built to measure them. Once a model solves most of a benchmark, that benchmark runs out of room and says little about the best systems anymore. Many of those tests also lean on bugs that already have public writeups, so a strong score can come partly from a model repeating something it has read. FrontierCyber, a benchmark from the AI security lab Irregular, goes …

The post Scoring AI hackers when there is no answer key (https://www.helpnetsecurity.com/2026/06/25/ai-offensive-cyber-evaluations-benchmark/) appeared first on Help Net Security (https://www.helpnetsecurity.com).