16:54
2026-05-27
lesswrong.com
large-language-models
Leveraging Introspection for Alignment
Anthropic's Model Psych team published three papers exploring how large language models can introspect on their own emotional states, finding that models like Claude activate emotion vectors that infl…