Terminal-Wrench

Organization · 1 mention

// recent coverage (1 mention)

2026-06-11 19:16 · arxiv.org · machine-learning

Cheap Reward Hacking Detection

Researchers trained a small transformer encoder to detect reward hacking in reinforcement learning trajectories by mapping them onto a unit sphere where embedding distance approximates reward-metadata…

// co-occurring entities (top 1)

LLM-as-judge (1)

// topics (top 4)

machine learning (1) · artificial intelligence (1) · AI safety (1) · neural networks (1)