20:33
2026-06-03
lesswrong.com
ai-safety
A Pipeline for Generating Synthetic Sabotage Trajectories to Red-Team Monitors
A team at Redwood Research developed a proof-of-concept pipeline that transforms benign Claude Code transcripts into synthetic sabotage trajectories for automated red-teaming of AI monitors. The appro…