04:00
2026-05-26
arxiv.org
ai-safety
AERIC: Anticipatory Hidden-State Monitoring for Implicit Harmful Dialogue
Researchers introduced AERIC, a lightweight safety monitor that detects implicit harmful dialogue by reading a language model's internal hidden states during ordinary text generation, requiring only 3…