02:32
2026-06-15
lesswrong.com
machine-learning
Do k-Sparse Autoencoders Reveal Thinking Patterns? Interpretable Features in a Small Reasoning Model
A research project using k-sparse autoencoders found interpretable features in a small reasoning model, including features that appear to correspond to the model's reasoning process. The experiment an…