David Rein

mentions: 1, type: Person

// recent coverage 1 mention

20:33

2026-06-03

lesswrong.com

ai-safety

A Pipeline for Generating Synthetic Sabotage Trajectories to Red-Team Monitors

A team at Redwood Research developed a proof-of-concept pipeline that transforms benign Claude Code transcripts into synthetic sabotage trajectories for automated red-teaming of AI monitors. The appro…

// co-occurs with top 5 entities

Redwood Research (1), Claude Code (1), Gemini 3 Flash (1), METR (1), Anthropic (1)

// topics top 5 topics

ai safety (1), ai research (1), large language models (1), ai agents (1), artificial intelligence (1)