How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

wpnews.pro

Source: arxiv.org · Published: 2026-06-19 04:00 UTC · Topic: machine learning

A new study measures how linear the feed-forward blocks of trained transformers are, finding that linear recoverability (R²_lin) varies widely across blocks and is a learned property, not an architectural one. An analysis of all twelve blocks of GPT-2, Pythia-160m, and llama-160m shows that some blocks are nearly linear (R²_lin above 0.99) while others are strongly nonlinear (R²_lin below 0.3), even between adjacent blocks, with implications for model compression and for understanding transformer computation.

arXiv:2606.19379v1

Abstract: Transformer feed-forward networks (FFNs) are often treated as nonlinear stores of computation, yet how nonlinear a trained FFN block actually is has rarely been measured. We treat each FFN as a position-wise input-to-output map and split it into the exact least-squares linear approximation plus a residual. The held-out variance the closed-form linear map explains defines a block's linear recoverability (R²_lin), an optimiser-free measure of its linearity. Across all twelve blocks of GPT-2, Pythia-160m, and llama-160m, R²_lin is highly heterogeneous and non-monotone with depth, ranging from near-linear (>0.99) to strongly nonlinear (<0.3) between adjacent blocks, and is not set by the activation function: same-width GELU models GPT-2 and Pythia-160m have sharply different profiles, so recoverability is a learned property of individual trained blocks, not an architectural one. A low-rank bilinear probe of the residual recovers only a few points of R², with gain uncorrelated with residual nonlinearity: the unrecovered computation is not a single position-wise product but higher-order or distributed structure. The measurement also serves as a targeted compression signal: recoverable blocks admit large single-layer replacements (GPT-2's early FFN at 8x fewer parameters for +0.77 perplexity), while low-recoverability blocks flag where this is unsafe. It further exposes a methodological pitfall: trained linear baselines can badly under-converge on ill-conditioned transformer activations, so we report the exact closed-form least-squares ceiling throughout.
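The core measurement in the abstract, fitting the exact least-squares linear map from an FFN block's inputs to its outputs and reporting the held-out variance it explains, can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the function name `linear_recoverability`, the intercept column, the 50/50 train/test split, and pooling R² over all output dimensions are our own choices, and the two toy "blocks" are synthetic stand-ins for real FFN activations.

```python
import numpy as np


def linear_recoverability(x_train, y_train, x_test, y_test):
    """Held-out R^2 of the exact least-squares affine map from block inputs to outputs.

    x_*: (n, d_in) block inputs, one row per token position.
    y_*: (n, d_out) block outputs at the same positions.
    Returns (r2, residual), where residual is the part of y_test the map misses.
    """
    def with_bias(x):
        # Append a ones column so the fitted map has an intercept (our assumption).
        return np.hstack([x, np.ones((x.shape[0], 1))])

    # Closed-form least squares (SVD-based solver): no optimiser, learning rate, or epochs.
    w, *_ = np.linalg.lstsq(with_bias(x_train), y_train, rcond=None)

    residual = y_test - with_bias(x_test) @ w
    # Pooled R^2: residual energy over total variance of the held-out outputs.
    sse = np.sum(residual ** 2)
    sst = np.sum((y_test - y_test.mean(axis=0)) ** 2)
    return 1.0 - sse / sst, residual


rng = np.random.default_rng(0)
n, d = 4000, 16
x = rng.standard_normal((n, d))  # synthetic stand-in for FFN inputs

# Toy near-linear block: a linear map plus a little noise.
a = rng.standard_normal((d, d))
y_lin = x @ a + 0.01 * rng.standard_normal((n, d))

# Toy nonlinear block: squared hidden activations, an even function of x.
w1 = rng.standard_normal((d, 2 * d)) / np.sqrt(d)
w2 = rng.standard_normal((2 * d, d))
y_sq = ((x @ w1) ** 2) @ w2

half = n // 2  # first half to fit, second half held out
r2_lin, _ = linear_recoverability(x[:half], y_lin[:half], x[half:], y_lin[half:])
r2_sq, _ = linear_recoverability(x[:half], y_sq[:half], x[half:], y_sq[half:])
print(f"near-linear block: R2_lin = {r2_lin:.4f}")
print(f"squared block:     R2_lin = {r2_sq:.4f}")
```

On the near-linear toy block the held-out R²_lin should come out close to 1. The squared block is an even function of zero-mean Gaussian inputs, so its best linear predictor is essentially a constant and its R²_lin should sit near 0; the returned residual is what a follow-up probe, such as the paper's low-rank bilinear one, would then try to explain.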
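The reported 8x saving for GPT-2's early FFN is consistent with a simple parameter count, assuming the single-layer replacement is one dense d_model × d_model affine map; the abstract does not state the replacement's exact shape, so treat this as a plausibility check only. GPT-2 small uses d_model = 768 and an FFN hidden width of 3072. The helper names below are hypothetical.

```python
def ffn_params(d_model: int, d_ff: int) -> int:
    # Up-projection (weights + bias) followed by down-projection (weights + bias).
    return d_model * d_ff + d_ff + d_ff * d_model + d_model


def affine_params(d_model: int) -> int:
    # A single dense d_model -> d_model map with bias (assumed replacement shape).
    return d_model * d_model + d_model


d_model, d_ff = 768, 3072  # GPT-2 small dimensions
ffn = ffn_params(d_model, d_ff)
lin = affine_params(d_model)
print(ffn, lin, round(ffn / lin, 2))  # 4722432 590592 8.0
```

Ignoring biases, the ratio reduces to 2 · d_ff / d_model = 2 · 3072 / 768 = 8. A low-rank or otherwise differently shaped replacement would give a different ratio.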
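The methodological pitfall the abstract flags, trained linear baselines under-converging on ill-conditioned activations, is easy to reproduce on a toy problem. The sketch below is an illustration under assumptions, not the paper's experiment: it uses synthetic inputs whose feature scales span three orders of magnitude (a hypothetical stand-in for transformer activations) and plain full-batch gradient descent (not necessarily the optimiser the authors examined), then compares its R² with the closed-form least-squares fit on the same data.

```python
import numpy as np


def r2(y, y_hat):
    # Fraction of variance explained, pooled over output dimensions.
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean(axis=0)) ** 2)


rng = np.random.default_rng(0)
n, d = 5000, 8
scales = np.logspace(0, -3, d)             # feature scales from 1 down to 0.001
x = rng.standard_normal((n, d)) * scales   # ill-conditioned toy inputs
w_true = 1.0 / scales                      # each feature contributes unit variance to y
y = x @ w_true

# Exact least-squares solution via an SVD-based solver.
w_ls, *_ = np.linalg.lstsq(x, y, rcond=None)

# Plain full-batch gradient descent on 0.5 * mean squared error, starting from zero.
cov = x.T @ x / n
xty = x.T @ y / n
step = 1.0 / np.linalg.eigvalsh(cov).max()  # 1/L, safely below the 2/L stability limit
w_gd = np.zeros(d)
for _ in range(1000):
    grad = cov @ w_gd - xty
    w_gd -= step * grad

r2_closed = r2(y, x @ w_ls)
r2_gd = r2(y, x @ w_gd)
print(f"closed-form R2 = {r2_closed:.4f}, gradient-descent R2 = {r2_gd:.4f}")
```

Each gradient step shrinks the error along an eigendirection of the input covariance by a factor of (1 - step * λ), so directions with tiny eigenvalues are barely fit after 1000 iterations, while the closed-form solver recovers them exactly. That gap is why a trained linear baseline can understate how linear a block really is, and why the closed-form value serves as the ceiling.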

Read the original on arxiv.org: arxiv.org/abs/2606.19379
