Gemma 2-2B-it

mentions 1 type Person

// recent coverage 1 mentions

2026-06-25 04:00

arxiv.org

large-language-models

Perfect Detection, Failed Control: The Geometry of Knowing vs. Steering in Language Models

A paper posted to arXiv reports that, in language models, the direction that best detects a behavior and the direction that best controls it are often geometrically distinct, with cosine similarities as low as 0.…

// co-occurs with top 1 entities

arXiv 1