{"slug": "modular-cognitive-architecture-emerges-in-large-language-models", "title": "Modular Cognitive Architecture Emerges in Large Language Models", "summary": "Researchers at MIT found that large language models develop a modular cognitive architecture mirroring the human brain, with distinct neuron sets supporting language, formal reasoning, social reasoning, and physical reasoning. Across six models, within-domain neuron overlap exceeded cross-domain overlap by more than fourfold, and causal ablations showed domain-specific accuracy drops. The findings suggest modularity may be a fundamental property of intelligent systems.", "body_md": "Massachusetts Institute of Technology\n\n† Co-senior authors\n\nPreprint · 2026\n\nThe human brain exhibits a striking degree of functional specialization, with distinct\nnetworks supporting language, formal reasoning, reasoning about other minds, and reasoning\nabout the physical world. Is this modular organization a fundamental principle of how\nintelligent systems must be built, or an evolutionary accident specific to biological brains?\nHere, we test whether a similar organization emerges in Large Language Models, another class\nof intelligent systems created through a very different optimization process. Using circuit\nanalyses across *N* = 46 tasks spanning four cognitive domains (language, formal reasoning,\nsocial reasoning, physical reasoning), we find that LLMs develop a modular architecture that\nmirrors the human brain: tasks drawing on the same network in humans recruit overlapping\nneurons in LLMs, whereas tasks drawing on different networks recruit distinct neurons. The\nconvergent emergence of modularity in brains and neural networks suggests that it may be a\nfundamental property of intelligent systems.\n\nWe localize task-supporting units with **attribution patching**. For each of\n**46 tasks** across four cognitive domains we build minimal original/alternative input pairs\nwhose correct continuation flips. 
A unit's importance is its original-vs-alternative activation\ndifference times the gradient of the original−alternative logit difference, summed over examples.\nWe then quantify modular organization from the pairwise **overlap** of each task's top-0.1%\nunits, and validate it **causally** by ablating those units and measuring cross-task transfer.\nSix instruction-tuned LLMs (24B–123B, four families) are analyzed.\n\nEach task is defined by minimal **original** / **alternative** input pairs whose\ncorrect continuation flips. The full datasets are in the\n[code repository](https://github.com/Pengrui-Han/LLM_Modularity_Final/tree/main/data).\n\nTasks supported by the same brain network in humans are solved by overlapping sets of\nneurons in the model, whereas tasks that draw on different networks recruit largely separate\nsets. Averaged across six models, this within-domain overlap exceeds cross-domain overlap by\nmore than fourfold (**12.9% vs 3.0%**, permutation test *p* < 0.0001). Unsupervised\nhierarchical clustering of the 46×46 task matrix recovers the four cognitive domains defined\nin neuroscience (**Adjusted Rand Index = 0.78**, *p* < 0.0001), and the structure\nis highly consistent across models (mean pairwise Kendall's τ = 0.70 ± 0.06). The same modular\norganization emerged in six different LLMs, from 24 to 123 billion parameters.\n\nTo test causal specificity, we ablate the top-0.1% units identified for a *source* task\nand evaluate the model on a different *target* task. 
Within-domain ablations cause a\n**25.9%** accuracy drop versus **2.5%** for cross-domain ablations\n(**ratio 10.3×**, *p* < 0.0001), consistent across models (Kendall's τ = 0.59 ± 0.05).\nThe asymmetry holds for every domain individually and in both directions of cross-domain\nablation.\n\n**A qualitative dissociation.** Inspecting the models' outputs\nafter targeted ablations reveals a separation between linguistic form and reasoning content.\nLesioning the neurons selectively required for the **language** tasks largely preserved the\nmodels' reasoning abilities but introduced syntactic and morphological errors. Conversely,\nlesioning **physical-reasoning** neurons led the models to incorrect reasoning and conclusions\nwhile preserving the linguistic well-formedness of the output.\n\nThe modular organization is not an artifact of the datasets, the contrastive design, or the attribution pipeline. Running the identical pipeline on GPT-2 (124M), which does not reach above-chance performance on the reasoning tasks, recovers only the broad division between Language and the rest of cognition, not the finer separation among the three reasoning domains. Modularity emerges only where the model can actually solve the tasks.\n\nA class of intelligent systems shaped by an entirely different process, gradient descent on next-token prediction, develops the same modular organization that characterizes the human brain: language, formal reasoning, physical reasoning, and social reasoning are each supported by largely distinct sets of neurons, while tasks within a domain share them. One influential account of cortical modularity appeals to metabolic cost, the idea that activating fewer neurons per task saves energy. That pressure does not exist in a transformer, whose forward pass carries no metabolic cost and whose loss never penalizes how many neurons are active. 
Modularity emerges anyway, which suggests this biological constraint is not necessary for functional specialization to arise.\n\nWhat might drive it instead? When several forms of reasoning must operate on the same input, the system faces pressure to keep those computations from interfering, both so that simultaneous representations stay separable and so that learning one domain does not overwrite another. Allocating distinct neurons to distinct computations protects against both. More broadly, the result shows the value of LLMs as a second kind of intelligent system against which to test claims about the structure of cognition: when a feature of the human mind reappears in a system built so differently, it is more likely to reflect a general principle of intelligence than an accident of biology.\n\n```\n@article{han2026modular,\n  title   = {Modular Cognitive Architecture Emerges in Large Language Models},\n  author  = {Han, Pengrui and Andreas, Jacob and Fedorenko, Evelina\n             and de Varda, Andrea Gregor},\n  journal = {Preprint},\n  year    = {2026},\n  note    = {Code and data: github.com/Pengrui-Han/LLM_Modularity_Final}\n}\n```\n\n", "url": "https://wpnews.pro/news/modular-cognitive-architecture-emerges-in-large-language-models", "canonical_source": "https://pengrui-han.github.io/LLM_Modularity_Page/", "published_at": "2026-07-01 04:17:40+00:00", "updated_at": "2026-07-01 04:50:21.724014+00:00", "lang": "en", "topics": ["large-language-models", "neural-networks", "ai-research", "ai-safety"], "entities": ["Massachusetts Institute of Technology", "GPT-2"], "alternates": {"html": "https://wpnews.pro/news/modular-cognitive-architecture-emerges-in-large-language-models", "markdown": "https://wpnews.pro/news/modular-cognitive-architecture-emerges-in-large-language-models.md", "text": "https://wpnews.pro/news/modular-cognitive-architecture-emerges-in-large-language-models.txt", "jsonld": 
"https://wpnews.pro/news/modular-cognitive-architecture-emerges-in-large-language-models.jsonld"}}