{"slug": "repselect-robust-llm-unlearning-via-representation-selectivity", "title": "RepSelect: Robust LLM Unlearning via Representation Selectivity", "summary": "Researchers propose RepSelect, a method for robust LLM unlearning that isolates forget-set-specific representations by collapsing top principal components of weight gradients, achieving 4-50x larger reduction in post-relearning answer accuracy than existing baselines across multiple model families and forget categories.", "body_md": "arXiv:2606.17168v1 Announce Type: new\nAbstract: Making large language models (LLMs) deeply forget specific knowledge and values without sacrificing general capabilities remains a central challenge in unlearning. However, current methods are easily reversed by fine-tuning or few-shot prompting, suggesting their forgetting is only shallow. We identify the root cause. Existing methods target representations shared with both the retain set and the subspace recovered by a fine-tuning attacker, making unlearning both disruptive to general capabilities and easy to reverse. We propose RepSelect (Representation Selectivity), which isolates forget-set-specific representations by collapsing top principal components of weight gradients before each update, leaving general capabilities intact while limiting what fine-tuning can recover. We evaluate across two forget categories, biohazardous knowledge and abusive tendencies, and four model families spanning dense and Mixture-of-Experts architectures (Llama 3, Qwen 3.5, Gemma 4 E4B, DeepSeek V2 Lite). Compared to five popular baselines (GradDiff, NPO, SimNPO, RMU, UNDIAL), RepSelect achieves a 4-50x larger reduction in post-relearning answer accuracy than the strongest baseline, and is near-perfectly robust to few-shot prompting attacks. 
Targeting selective representations is thus an important step towards deep and robust LLM forgetting.", "url": "https://wpnews.pro/news/repselect-robust-llm-unlearning-via-representation-selectivity", "canonical_source": "https://arxiv.org/abs/2606.17168", "published_at": "2026-06-17 04:00:00+00:00", "updated_at": "2026-06-17 04:26:44.148560+00:00", "lang": "en", "topics": ["large-language-models", "ai-safety", "machine-learning"], "entities": ["RepSelect", "Llama 3", "Qwen 3.5", "Gemma 4 E4B", "DeepSeek V2 Lite", "GradDiff", "NPO", "SimNPO"], "alternates": {"html": "https://wpnews.pro/news/repselect-robust-llm-unlearning-via-representation-selectivity", "markdown": "https://wpnews.pro/news/repselect-robust-llm-unlearning-via-representation-selectivity.md", "text": "https://wpnews.pro/news/repselect-robust-llm-unlearning-via-representation-selectivity.txt", "jsonld": "https://wpnews.pro/news/repselect-robust-llm-unlearning-via-representation-selectivity.jsonld"}}