04:00
2026-06-12
arxiv.org
ai-safety
"Did you lie?" Evaluating Lie Detectors across Model Scale and Belief-Verified Model Organisms
Researchers have developed new testbeds to evaluate lie detectors for language models, finding that existing detection methods often fail when models are trained to hold false beliefs. The study teste…