Euan Ong

mentions: 1, type: Person

// recent coverage (1 mention)

2026-06-15 12:55

transformer-circuits.pub

large-language-models

Natural Language Autoencoders Produce Explanations of LLM Activations

Anthropic researchers introduced Natural Language Autoencoders (NLAs), an unsupervised method that generates natural language explanations of LLM activations by training two modules to reconstruct act…

// co-occurs with (top 7 entities)

Anthropic (1), Claude Opus 4.6 (1), Claude Haiku 3.5 (1), Haiku 4.5 (1), Kit Fraser-Taliente (1), Subhash Kantamneni (1), Dan Mossing (1)

// topics (top 4 topics)

large language models (1), ai safety (1), ai research (1), natural language processing (1)