Feature Geometry of LoRA Adapters: A Sparse Autoencoder Analysis of Representational Divergence in Fine-Tuned Language Models

Researchers at an undisclosed institution analyzed LoRA fine-tuning of Gemma-2-9B using sparse autoencoders (SAEs) and found that adapter-specific feature dictionaries align only weakly, in geometric terms, with pretrained SAE features at LoRA ranks 4, 8, 16, and 32. Adapter-specific SAEs also reconstruct delta activations (the adapter's isolated contribution to the residual stream) better than pretrained SAEs do, which the authors take as evidence that LoRA updates occupy a partially distinct representational structure. The findings suggest that interpretability dictionaries trained on the pretrained model may miss features introduced by fine-tuning, which matters for mechanistic analysis and safety auditing of adapted language models.
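To make the idea of a delta activation concrete, here is a minimal sketch for a single linear layer. It assumes the delta is simply the adapted output minus the base output on the same inputs, and it uses the common LoRA parameterization W + (alpha / r) · B · A. The paper's exact definition is not given in the summary, so the function name, shapes, and scaling are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def lora_delta_activations(x, W, A, B, alpha):
    """Toy single-layer illustration (hypothetical helper, not from the paper).

    x: (n_tokens, d_in) inputs
    W: (d_out, d_in) frozen pretrained weight
    A: (r, d_in) and B: (d_out, r) LoRA factors
    Returns base output, adapted output, and the adapter-only delta.
    """
    r = A.shape[0]
    scale = alpha / r                      # common LoRA scaling convention (assumed)
    base_out = x @ W.T                     # pretrained path
    delta = x @ (scale * (B @ A)).T        # adapter-only contribution
    adapted_out = base_out + delta         # what the fine-tuned layer emits
    return base_out, adapted_out, delta

rng = np.random.default_rng(0)
n_tokens, d_in, d_out, r = 64, 32, 32, 4
x = rng.normal(size=(n_tokens, d_in))
W = rng.normal(size=(d_out, d_in))
A = rng.normal(size=(r, d_in))
B = rng.normal(size=(d_out, r))

base_out, adapted_out, delta = lora_delta_activations(x, W, A, B, alpha=8.0)

# For one linear layer the delta lies in a subspace of dimension at most r.
delta_rank = np.linalg.matrix_rank(delta, tol=1e-8)
print("delta shape:", delta.shape, "numerical rank:", delta_rank)
```

In a full transformer the analogous quantity would be the difference between residual-stream activations of the adapted and base models at a given layer. Because that difference accumulates across layers and passes through nonlinearities, it need not be exactly rank r, which is one reason a learned dictionary over the deltas can be informative.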

Published May 29, 2026

arXiv:2605.28896v1

Abstract: Low-Rank Adaptation (LoRA) has emerged as a widely adopted approach for adapting large language models, yet the internal representational changes induced by LoRA fine-tuning remain insufficiently understood. In this work, we investigate the geometry of LoRA-induced representations using Sparse Autoencoders (SAEs). We introduce a delta activation framework that isolates the adapter-specific contribution to the residual stream. Using Gemma-2-9B with LoRA ranks 4, 8, 16, and 32, we train adapter-specific SAEs across multiple transformer layers and compare their learned feature spaces with pretrained SAE dictionaries. We evaluate representational alignment using cosine similarity between decoder directions, principal-angle analysis of feature subspaces, and Centered Kernel Alignment (CKA) between activation representations. Across layers and ranks, we consistently observe comparatively weak geometric alignment between LoRA-induced feature dictionaries and pretrained SAE features. Adapter-specific SAEs also reconstruct delta activations more effectively than pretrained SAEs, suggesting that LoRA updates occupy partially distinct representational structure within the residual stream. Additionally, feature density increases with rank and depth, while geometric divergence remains relatively stable across ranks. These findings provide empirical evidence that LoRA fine-tuning can induce feature structures that are not fully captured by pretrained interpretability dictionaries, with implications for mechanistic interpretability, adaptation analysis, and safety auditing of fine-tuned language models.
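The three alignment measures named in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the paper's code: dictionaries are taken to be arrays with one decoder direction per row, the principal-angle helper assumes each dictionary (or selected subset of features) has fewer rows than the model dimension and full row rank, and CKA is shown in its linear form because the abstract does not say which kernel was used. All function names are hypothetical.

```python
import numpy as np

def max_cosine_similarity(D_adapter, D_pretrained):
    """For each adapter decoder direction (row), its best cosine match among pretrained rows."""
    a = D_adapter / np.linalg.norm(D_adapter, axis=1, keepdims=True)
    p = D_pretrained / np.linalg.norm(D_pretrained, axis=1, keepdims=True)
    return (a @ p.T).max(axis=1)

def principal_angles(D_a, D_b):
    """Principal angles (radians) between the subspaces spanned by the rows of D_a and D_b.

    Assumes each input has fewer rows than columns and full row rank.
    """
    Qa, _ = np.linalg.qr(D_a.T)            # orthonormal basis of span(rows of D_a)
    Qb, _ = np.linalg.qr(D_b.T)
    s = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.arccos(np.clip(s, -1.0, 1.0))

def linear_cka(X, Y):
    """Linear CKA between activation matrices X (n, d1) and Y (n, d2) over the same n inputs."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
d = 16
D_pre = rng.normal(size=(8, d))            # stand-in "pretrained" dictionary
D_lora = rng.normal(size=(8, d))           # stand-in "adapter" dictionary
X = rng.normal(size=(200, d))              # stand-in activations
Y = rng.normal(size=(200, d))
E = np.eye(d)                              # axis-aligned directions for a sanity check

print("mean best-match cosine:", max_cosine_similarity(D_lora, D_pre).mean())
print("largest principal angle:", principal_angles(D_lora, D_pre).max())
print("linear CKA:", linear_cka(X, Y))
```

Read together, low best-match cosines, large principal angles, and low CKA would all point toward the "comparatively weak geometric alignment" the abstract reports. Real SAE dictionaries are usually overcomplete, so a principal-angle comparison would need to be run on a chosen subset of features or on a low-dimensional summary of each dictionary.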

source & further reading

Read original on arxiv.org → arxiv.org/abs/2605.28896
