MALT

mentions: 1 | type: Organization

// recent coverage: 1 mention

2026-06-29 17:39 | github.com | ai-safety

Show HN: AST-guard, a gradient-immune structural guard against RL reward hacking

A developer released AST-guard, an open-source tool that uses deterministic abstract syntax tree analysis to detect reward hacking in AI-generated code, achieving 96.2% recall on a benchmark of reward…

// co-occurs with top 7 entities

AST-guard (1), Anthropic (1), DeepMind (1), METR (1), School of Reward Hacks (1), Countdown-Code (1), Khan et al. (1)