03:45
2026-05-26
gist.github.com
ai-safety
Show HN: AgentToolBench-Code - security benchmark for AI coding agents
A developer expanded their AI-agent security benchmark from 10 to 16 scenarios, revealing that Claude Code Sonnet 4.6 scores +9 out of 16 while Haiku 4.5 scores only +3. The original tie between the t…