04:00
2026-06-25
arxiv.org
large-language-models
Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models
Researchers at arXiv found that in language models, the direction that best detects a behavior and the one that best controls it are often geometrically distinct, with cosine similarities as low as 0.โฆ