04:39
2026-06-24
lesswrong.com
large-language-models
Reasoning and learning about injected concepts in language models
Researchers at SPAR argue that language models can report on their own internal activations, a method called self-report, which is underexploited compared to external analysis tools. They test five op…