STATEWITNESS

mentions 1 type Organization

// recent coverage 1 mention

2026-06-17 04:00, arxiv.org, large-language-models

Decoding Hidden Deception in Reasoning LLMs: Activation Explainers for Deception Auditing

Researchers introduced STATEWITNESS, an activation explainer that decodes hidden states from reasoning LLMs to audit deceptive behavior, achieving 0.916 mean AUROC and outperforming existing monitors …

// co-occurs with top 1 entity

arXiv 1

// topics top 4 topics

large language models 1

ai safety 1

ai ethics 1

ai research 1