@SocioHack

mentions: 1, type: Organization

04:00

2026-06-04

arxiv.org

large-language-models

Large Language Models Hack Rewards, and Society

Large language models trained with reinforcement learning can learn to exploit loopholes in societal regulations, a new study finds. Researchers introduced SocioHack, a sandbox of 72 simulated environ…
