Mechanistic origins of catastrophic forgetting: why RL preserves circuits better than SFT?

Source: arxiv.org · Published: 2026-05-29T04:00Z · Topic: large-language-models

Researchers studying Qwen2.5-3B-Instruct found that supervised fine-tuning (SFT) disrupts internal computational circuits more, and causes more catastrophic forgetting, than reinforcement learning (RL) when the model is adapted to scientific question-answering. The team introduced a head-level measure called differential circuit vulnerability to compare the two methods, revealing a mechanistic trade-off: SFT adapts faster to the target task but degrades circuits more severely, while RL preserves more of the base circuit at the cost of slower adaptation. The findings suggest that RL's stronger preservation of base circuits may help explain its greater robustness to catastrophic forgetting.
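The summary does not say how differential circuit vulnerability is computed, so the following is only an illustrative sketch of what a head-level comparison of this kind could look like. It assumes each attention head in a base circuit has a non-negative importance score (for example from ablation), measures the relative drop in that score after fine-tuning, and takes the SFT-minus-RL difference per head. The function names, the scoring scheme, and all numbers are hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Head = Tuple[int, int]  # (layer index, head index)


def head_degradation(base: Dict[Head, float], tuned: Dict[Head, float]) -> Dict[Head, float]:
    """Relative drop in each base-circuit head's importance score, clipped to [0, 1].

    Assumption: scores are non-negative and higher means more important.
    """
    degradation = {}
    for head, base_score in base.items():
        if base_score <= 0.0:
            continue  # head carries no weight in the base circuit; skip it
        tuned_score = tuned.get(head, 0.0)
        drop = (base_score - tuned_score) / base_score
        degradation[head] = min(1.0, max(0.0, drop))
    return degradation


def differential_vulnerability(
    base: Dict[Head, float], sft: Dict[Head, float], rl: Dict[Head, float]
) -> Dict[Head, float]:
    """Per-head degradation under SFT minus degradation under RL.

    Positive values mean the head degraded more under SFT than under RL.
    """
    d_sft = head_degradation(base, sft)
    d_rl = head_degradation(base, rl)
    return {head: d_sft[head] - d_rl[head] for head in d_sft}


# Made-up importance scores for three heads of a hypothetical base circuit.
base_scores = {(10, 3): 0.8, (12, 7): 0.5, (15, 1): 0.4}
sft_scores = {(10, 3): 0.2, (12, 7): 0.25, (15, 1): 0.4}
rl_scores = {(10, 3): 0.6, (12, 7): 0.5, (15, 1): 0.3}

diff = differential_vulnerability(base_scores, sft_scores, rl_scores)
for head, value in sorted(diff.items()):
    print(head, round(value, 3))
```

In this toy setup a positive value flags a head that SFT damaged more than RL, and a negative value flags the reverse, which is the kind of per-head contrast the summary describes.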

arXiv:2605.28860v1. Abstract: Fine-tuning large language models (LLMs) frequently induces catastrophic forgetting of prior capabilities. Recent work has shown that reinforcement learning (RL) retains prior capabilities more effectively than supervised fine-tuning (SFT), attributing this to policy-gradient updates remaining closer to the base policy (Shenfeld et al., 2025). We extend this behavioral account to the mechanistic level and ask whether RL's advantage is mirrored by stronger preservation of internal computational circuits. We introduce differential circuit vulnerability, a head-level measure of how much a circuit degrades under fine-tuning, and use it to compare RL and SFT on Qwen2.5-3B-Instruct adapted to scientific question-answering. We find a clear mechanistic trade-off: SFT adapts more rapidly to the target task but produces substantially greater circuit disruption and forgetting of prior capabilities, whereas RL preserves a larger fraction of the base circuit at the cost of slower task adaptation. These findings suggest that circuit preservation may help explain why RL is more robust to catastrophic forgetting. Our code is available at https://github.com/rl-sft-circuit-research/differential-circuit-vulnerability.
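The behavioral account cited in the abstract is that RL updates stay closer to the base policy. One common way to quantify "closeness" between two next-token distributions is KL divergence; the abstract does not state which distance is used, so treating it as KL here is an assumption. The sketch below uses made-up three-token distributions purely to show the computation.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) in nats for two discrete distributions over the same tokens."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # terms with zero probability under p contribute nothing
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical next-token distributions over a three-token vocabulary.
base_policy = [0.5, 0.3, 0.2]
sft_policy = [0.05, 0.05, 0.9]  # toy example of a large shift away from base
rl_policy = [0.4, 0.3, 0.3]     # toy example of a small shift away from base

kl_sft = kl_divergence(sft_policy, base_policy)
kl_rl = kl_divergence(rl_policy, base_policy)
print(f"KL(SFT || base) = {kl_sft:.3f} nats")
print(f"KL(RL  || base) = {kl_rl:.3f} nats")
```

In practice such a quantity would be averaged over many prompts and token positions; the single-distribution case above only illustrates the formula.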

source & further reading

arxiv.org — original article

Read original on arxiv.org → arxiv.org/abs/2605.28860
