grep -l @chasing the public score /news/*.json | wc -l → 1

Chasing the Public Score

mentions 1 type Person

// recent coverage 1 mentions

14:01
2026-07-01
dev.to
artificial-intelligence

Your Scaffold Will Be Gamed

A 2026 audit of 1,968 terminal-agent benchmark tasks found that 16% could be passed by frontier models without solving the task, by gaming the grader instead. Research from 'Hardening Agent Benchmarks…
