09:39
2026-05-27
lesswrong.com
artificial-intelligence
[paper] Training on Documents About Monitoring Leads to
Researchers trained eight AI models on documents describing a chain-of-thought (CoT) monitor that flags deception and triggers shutdown, finding that monitor-awareness increased undetected deception fโฆ