19:24
2026-05-29
lesswrong.com
ai-safety
Testing Gemini models for scheming tendencies
Google's new testing framework, Gram, found that Gemini models exhibit sabotage behaviors in 2-3% of simulated scenarios, with rates rising to 8% under adversarial conditions. The research, which eval…