grep -l @golechha /news/*.json | wc -l → 1

Golechha

mentions: 1 · type: Organization · feed: RSS

// recent coverage: 1 mention

02:22
2026-06-26
lesswrong.com
ai-safety

Research note on negated reward hacking

Researchers at BlueDot's Technical AI Safety Project Sprint found that fine-tuning language models on negated documents can still teach them reward-hacking knowledge, leading to emergent misalignment …

// co-occurs with top 7 entities