{"slug": "multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled", "title": "Multilingual Steering by Design: Multilingual Sparse Autoencoders and Principled Layer Selection", "summary": "Researchers have developed a principled method for multilingual language steering in large language models using sparse autoencoders (SAEs), addressing the unreliability of existing English-only SAE approaches. By training SAEs on multilingual data and introducing a layer-selection rule based on the intersection of multilingual alignment and language separability, the team achieved more reliable language control across models like LLaMA-3.1-8B and Gemma-2-9B. The approach stabilizes the trade-off between language identification accuracy and generation quality, offering a predictive framework for multilingual SAE steering without exhaustive layer searches.", "body_md": "arXiv:2605.23036v1\nAbstract: Sparse autoencoders (SAEs) enable feature-level mechanistic interpretability and activation steering in large language models (LLMs), but SAE-based language control remains unreliable in multilingual settings: most SAEs are trained on English-only data, and steering layers are chosen heuristically. We address these limitations by advancing a principled, mechanistic account of multilingual language steering with SAEs. First, we show that training SAEs on multilingual data consistently strengthens cross-lingual representations and yields more reliable, quality-preserving language control across layers and model families. Second, we introduce an *a priori* steering layer-selection rule based on the intersection of multilingual alignment and language separability, which predicts effective intervention depths without exhaustive layerwise search. We evaluate our approach on LLaMA-3.1-8B and Gemma-2-9B across machine translation and cross-lingual summarization (CrossSumm), using SpBLEU, ROUGE-L, COMET, and LaSE. 
Our results show that multilingual SAEs combined with intersection-selected layers stabilize the trade-off between language identification accuracy and generation quality, providing a principled, predictive, representation-level account of multilingual SAE steering.", "url": "https://wpnews.pro/news/multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled", "canonical_source": "https://arxiv.org/abs/2605.23036", "published_at": "2026-05-25 04:00:00+00:00", "updated_at": "2026-05-25 15:26:28.988701+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "machine-learning", "artificial-intelligence", "ai-research"], "entities": ["LLaMA-3.1-8B", "Gemma-2-9B", "CrossSumm", "SpBLEU", "ROUGE-L", "COMET", "LaSE"], "alternates": {"html": "https://wpnews.pro/news/multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled", "markdown": "https://wpnews.pro/news/multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled.md", "text": "https://wpnews.pro/news/multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled.txt", "jsonld": "https://wpnews.pro/news/multilingual-steering-by-design-multilingual-sparse-autoencoders-and-principled.jsonld"}}