grep -l @macdiarmid /news/*.json | wc -l → 1

MacDiarmid

mentions: 1 · type: Organization

// recent coverage: 1 mention

02:22
2026-06-26
lesswrong.com
ai-safety

Research note on negated reward hacking

Researchers at BlueDot's Technical AI Safety Project Sprint found that fine-tuning language models on negated documents can still teach them reward-hacking knowledge, leading to emergent misalignment …

// co-occurs with top 7 entities