18:43
2026-06-21
lesswrong.com
ai-safety
Introducing MonitoringBench
Researchers released MonitoringBench, a benchmark of 2,644 attack trajectories for evaluating coding-agent monitors, along with a semi-automated red-teaming pipeline. The pipeline decomposes attack co…