{"slug": "pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language", "title": "Pruning via Causal Attribution Preserves Reasoning Performance in Large Language Models", "summary": "Researchers introduced Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads in large language models by measuring their causal impact on reasoning tasks. CAP achieved relative accuracy gains of up to 61% over Wanda on ARC-Challenge at 20% sparsity, and the results suggest it preserves reasoning performance better than correlational pruning criteria at moderate sparsity, though it remains limited by coarse MLP attribution at 50% sparsity. The method was evaluated on Llama-3-8B-Instruct and Mistral-7B-Instruct across GSM8K, StrategyQA, and ARC-Challenge at 10%, 20%, and 50% sparsity.", "body_md": "arXiv:2606.19350v1 Announce Type: new\nAbstract: Large language models (LLMs) excel at multi-step reasoning but incur substantial inference cost. We introduce Causal Attribution Pruning (CAP), a training-free method that identifies critical attention heads by measuring their causal impact on reasoning tasks and uses these head-level scores to guide fine-grained weight pruning. For each attention head, CAP estimates the expected performance degradation when the head is masked during forward passes on a small calibration set of reasoning problems. These causal scores are then converted into weight-level importance values for the corresponding projection matrices. Unlike magnitude-only or activation-based criteria, CAP's interventional measurement directly captures each head's functional contribution, yielding relative accuracy gains of up to 61% over Wanda on ARC-Challenge at 20% sparsity. We evaluate CAP on GSM8K, StrategyQA, and ARC-Challenge using Llama-3-8B-Instruct and Mistral-7B-Instruct at 10%, 20%, and 50% sparsity. At moderate sparsity (10-20%), CAP improves over Wanda in most model-benchmark configurations, with especially large gains on ARC-Challenge for Llama-3. Our results suggest that attention-head-level causal attribution can better preserve reasoning performance on downstream benchmarks than correlational pruning criteria at equivalent sparsity, while remaining limited by coarse MLP attribution at 50% sparsity.", "url": "https://wpnews.pro/news/pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language", "canonical_source": "https://arxiv.org/abs/2606.19350", "published_at": "2026-06-19 04:00:00+00:00", "updated_at": "2026-06-19 04:05:03.831237+00:00", "lang": "en", "topics": ["large-language-models", "ai-research", "ai-products"], "entities": ["Causal Attribution Pruning", "Wanda", "Llama-3-8B-Instruct", "Mistral-7B-Instruct", "GSM8K", "StrategyQA", "ARC-Challenge"], "alternates": {"html": "https://wpnews.pro/news/pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language", "markdown": "https://wpnews.pro/news/pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language.md", "text": "https://wpnews.pro/news/pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language.txt", "jsonld": "https://wpnews.pro/news/pruning-via-causal-attribution-preserves-reasoning-performance-in-large-language.jsonld"}}