13:20
2026-06-28
lesswrong.com
ai-safety
Evaluating Offline Monitoring of Internal AI Agents
Frontier AI companies like OpenAI and Anthropic use offline monitoring to detect misaligned actions by internal AI agents, but current public reporting on the effectiveness of these systems is insuffiβ¦