Brain-LLM Alignment Tracks Training Data, Not Typology

A study of fMRI data from 112 English, Chinese, and French speakers found that brain-LLM alignment is driven by training-language dominance, not by an inherent property of English. A Chinese-dominant model, Baichuan2-7B, reversed the alignment gradient entirely, aligning best with Chinese brains and worst with English ones. Independently of training dominance, greater formal typological distance covaried with weaker alignment, most steeply in syntax-associated brain regions. The findings indicate that the apparent "English advantage" is an artifact of training data composition, with the remaining variation reflecting genuine typological structure in syntactic processing.

Published May 25, 2026

arXiv:2605.23032v1. Abstract: Brain-LLM alignment is well established in English, yet the brain's language network is neuroanatomically universal across languages. Does alignment also generalize cross-linguistically, and what governs the variation? We test this using fMRI data from 112 participants across English, Chinese, and French (the Le Petit Prince corpus) and seven LLMs spanning English-dominant, Chinese-dominant, and multilingual architectures. Our central finding is that training-language dominance, not an inherent property of English, drives the alignment pattern: a Chinese-dominant model (Baichuan2-7B), architecture-matched to LLaMA-2-7B, reverses the gradient entirely, aligning best with Chinese brains and worst with English. Beyond training dominance, formal typological distance independently covaries with alignment degradation, syntax-associated brain regions (IFG) show 2.3× steeper typological gradients than lexico-semantic regions (PTL), and tokenization fertility accounts for ~60% of a cross-linguistic shift in optimal encoding layer. These results reveal that the apparent "English advantage" in brain-LLM alignment is an artifact of training data composition, while the remaining variation reflects genuine typological structure concentrated in syntactic processing.
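
The abstract names tokenization fertility without defining it. As the term is commonly used, fertility is the average number of subword tokens a tokenizer produces per word. The sketch below illustrates that common definition only; it is not the paper's pipeline. The function names and the fixed-size chunk tokenizer are hypothetical stand-ins, and the input is assumed to be already split into words, which for Chinese would require a separate segmentation step.

```python
from typing import Callable, Iterable, List


def fertility(words: Iterable[str], tokenize: Callable[[str], List[str]]) -> float:
    """Mean number of subword tokens per word (common definition of fertility)."""
    words = list(words)
    if not words:
        raise ValueError("need at least one word")
    total_tokens = sum(len(tokenize(word)) for word in words)
    return total_tokens / len(words)


def toy_chunk_tokenizer(word: str, size: int = 3) -> List[str]:
    """Hypothetical stand-in tokenizer: cut a word into fixed-size character chunks."""
    return [word[i:i + size] for i in range(0, len(word), size)]


# "the" -> 1 chunk, "little" -> 2 chunks, "prince" -> 2 chunks: 5 tokens / 3 words
sample_words = ["the", "little", "prince"]
print(fertility(sample_words, toy_chunk_tokenizer))
```

In practice the `tokenize` argument would be a real model tokenizer, and comparing the resulting value across languages shows how much more finely one language's text is split than another's.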

source & further reading

Read the original on arxiv.org: arxiv.org/abs/2605.23032
