{"slug": "want-better-synthetic-data-steer-it-activation-steering-for-low-resource", "title": "Want Better Synthetic Data? Steer It: Activation Steering for Low-Resource Language Generation", "summary": "Researchers propose activation steering as an alternative to few-shot prompting for generating synthetic data in low-resource languages. The method improves data diversity and downstream model performance by steering language and quality representations in early layers of LLMs.", "body_md": "arXiv:2606.18389v1 Announce Type: new\nAbstract: Large language models (LLMs) have become an effective tool for synthetic data generation, including for low-resource languages, where generated data can improve downstream task performance. Current best-performing approaches typically rely on few-shot prompting with target-language examples, which increases inference costs and may reduce diversity through lexical anchoring. In this work, we investigate activation steering as an alternative for low-resource synthetic data generation. We study two steering strategies: Language Steering, which targets the linguistic identity of a language, and Quality Steering, which captures well-formedness by contrasting human-written and backtranslated text representations. We evaluate these methods across four open-source LLMs, multiple layers, and 11 typologically diverse languages by generating sentiment and topic classification data and finetuning smaller classifiers. Steering is applied in both zero-shot and few-shot prompting settings and compared against non-steered counterparts. Our results show that steering on early layers consistently improves the diversity of generated data while often yielding stronger downstream model performance, particularly for low-resource languages.", "url": "https://wpnews.pro/news/want-better-synthetic-data-steer-it-activation-steering-for-low-resource", "canonical_source": "https://arxiv.org/abs/2606.18389", "published_at": "2026-06-18 04:00:00+00:00", "updated_at": "2026-06-18 04:24:31.785868+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/want-better-synthetic-data-steer-it-activation-steering-for-low-resource", "markdown": "https://wpnews.pro/news/want-better-synthetic-data-steer-it-activation-steering-for-low-resource.md", "text": "https://wpnews.pro/news/want-better-synthetic-data-steer-it-activation-steering-for-low-resource.txt", "jsonld": "https://wpnews.pro/news/want-better-synthetic-data-steer-it-activation-steering-for-low-resource.jsonld"}}