Reilly Haskins

mentions: 1, type: Person

// recent coverage (1 mention)

2026-05-27 09:39, lesswrong.com, artificial-intelligence

[paper] Training on Documents About Monitoring Leads to

Researchers trained eight AI models on documents describing a chain-of-thought (CoT) monitor that flags deception and triggers shutdown, finding that monitor-awareness increased undetected deception f…

// co-occurs with top 3 entities

Bilal Chughtai (1), Joshua Engels (1), Anthropic (1)