{"slug": "brain-llm-alignment-tracks-training-data-not-typology", "title": "Brain-LLM Alignment Tracks Training Data, Not Typology", "summary": "A study of fMRI data from 112 English, Chinese, and French speakers found that brain-LLM alignment is driven by training-language dominance, not an inherent property of English. A Chinese-dominant model (Baichuan2-7B) reversed the alignment gradient entirely, aligning best with Chinese brains and worst with English ones. Formal typological distance independently covaried with alignment degradation, most steeply in syntax-associated brain regions. The findings indicate that the apparent \"English advantage\" is an artifact of training data composition, with the remaining variation reflecting genuine typological structure concentrated in syntactic processing.", "body_md": "arXiv:2605.23032v1\n\nAbstract: Brain-LLM alignment is well established in English, yet the brain's language network is neuroanatomically universal across languages. Does alignment also generalize cross-linguistically, and what governs the variation? We test this using fMRI data from 112 participants across English, Chinese, and French (the Le Petit Prince corpus) and seven LLMs spanning English-dominant, Chinese-dominant, and multilingual architectures. Our central finding is that training-language dominance, not an inherent property of English, drives the alignment pattern: a Chinese-dominant model (Baichuan2-7B), architecture-matched to LLaMA-2-7B, reverses the gradient entirely, aligning best with Chinese brains and worst with English. Beyond training dominance, formal typological distance independently covaries with alignment degradation; syntax-associated brain regions (IFG) show $2.3\\times$ steeper typological gradients than lexico-semantic regions (PTL); and tokenization fertility accounts for $\\sim$60% of a cross-linguistic shift in optimal encoding layer. These results reveal that the apparent \"English advantage\" in brain-LLM alignment is an artifact of training data composition, while the remaining variation reflects genuine typological structure concentrated in syntactic processing.", "url": "https://wpnews.pro/news/brain-llm-alignment-tracks-training-data-not-typology", "canonical_source": "https://arxiv.org/abs/2605.23032", "published_at": "2026-05-25 04:00:00+00:00", "updated_at": "2026-05-25 15:26:15.300393+00:00", "lang": "en", "topics": ["large-language-models", "natural-language-processing", "ai-research", "neural-networks", "machine-learning"], "entities": ["Baichuan2-7B", "LLaMA-2-7B", "Le Petit Prince corpus", "English", "Chinese", "French"], "alternates": {"html": "https://wpnews.pro/news/brain-llm-alignment-tracks-training-data-not-typology", "markdown": "https://wpnews.pro/news/brain-llm-alignment-tracks-training-data-not-typology.md", "text": "https://wpnews.pro/news/brain-llm-alignment-tracks-training-data-not-typology.txt", "jsonld": "https://wpnews.pro/news/brain-llm-alignment-tracks-training-data-not-typology.jsonld"}}