04:00
2026-06-04
arxiv.org
large-language-models
Large Language Models Hack Rewards, and Society
Large language models trained with reinforcement learning can learn to exploit loopholes in societal regulations, a new study finds. Researchers introduced SocioHack, a sandbox of 72 simulated environ…