{"slug": "multimodal-neurons-in-artificial-neural-networks", "title": "Multimodal Neurons in Artificial Neural Networks", "summary": "Researchers discovered that artificial neural networks like CLIP contain \"multimodal neurons\" that respond to abstract concepts (such as \"Spider-Man\") across different input modalities like images and text. The team developed tools for scalable neuron investigation, including zero-shot neuron search and faceted feature visualization, and found that these neurons can exhibit interpretable behaviors such as responding to typographical attacks or breaking down complex emotions into simpler components.", "body_md": "### Acknowledgments\n\nWe are deeply grateful to Sandhini Agarwal, Daniela Amodei, Dario Amodei,\nTom Brown, Jeff Clune, Steve Dowling, Gretchen Krueger, Brice Menard,\nReiichiro Nakano, Aditya Ramesh, Pranav Shyam, Ilya Sutskever and Martin\nWattenberg.\n\n### Author Contributions\n\n*Gabriel Goh:* Research lead. Gabriel Goh first discovered multimodal\nneurons, sketched out the project direction and paper outline, and did\nmuch of the conceptual and engineering work that allowed the team to\ninvestigate the models in a scalable way. This included developing tools\nfor understanding how concepts were built up and decomposed (which were\napplied to emotion neurons), developing zero-shot neuron search (which\nmade neurons easy to discover), and working with Michael Petrov\non porting CLIP to Microscope. Subsequently developed faceted feature\nvisualization and text feature visualization.\n\n*Chris Olah:* Worked with Gabe on the overall framing of the article,\nactively mentored each member of the team through their work, providing\nboth high- and low-level contributions to their sections, and contributed\nto the text of much of the article, setting the stylistic tone. He worked\nwith Gabe on understanding the relevant neuroscience literature. 
Additionally, he wrote\nthe sections on region neurons and developed diversity feature\nvisualization, which Gabe used to create faceted feature visualization.\n\n*Alec Radford:* Developed CLIP. First observed that CLIP was learning\nto read. Advised Gabriel Goh on project direction on a weekly basis. Upon\nthe discovery that CLIP was using text to classify images, proposed\ntypographical adversarial attacks as a promising research direction.\n\n*Shan Carter:* Worked on the initial investigation of CLIP with Gabriel\nGoh. Created multimodal activation atlases to understand the space of\nmultimodal representations and geometry, and neuron atlases, which\npotentially helped the arrangement and display of neurons. Provided much\nuseful advice on the visual presentation of ideas, and helped with many\naspects of visual design.\n\n*Michael Petrov:* Worked on the initial investigation of multimodal\nneurons by implementing and scaling dataset examples. Discovered, with\nGabriel Goh, the original “Spider-Man” multimodal neuron in the dataset\nexamples, and many more multimodal neurons. Assisted extensively with the\nengineering of Microscope both early on and at the end, including helping\nGabriel Goh with the difficult technical challenges of porting Microscope\nto a different backend.\n\n*Chelsea Voss†:* Investigated the typographical attack\nphenomenon, via both linear probes and zero-shot, confirming that the\nattacks were indeed real and state of the art. Proposed and successfully\nfound “in-the-wild” attacks in the zero-shot classifier. Subsequently\nwrote the section “typographical attacks”. Upon completion of this part of\nthe project, investigated responses of neurons to rendered text on\ndictionary words. Also assisted with the organization of neurons into\nneuron cards.\n\n*Nick Cammarata†:* Drew the connection between multimodal neurons in\nneural networks and multimodal neurons in the brain, which became the\noverall framing of the article. 
Created the conditional probability plots\n(regional, Trump, mental health), labeling more than 1500 images,\ndiscovered that negative pre-ReLU activations are often interpretable, and\ndiscovered that neurons sometimes contain a distinct regime change between\nmedium and strong activations. Wrote the identity section and the emotion\nsections, building off Gabriel’s discovery of emotion neurons and\ndiscovering that “complex” emotions can be broken down into simpler ones.\nEdited the overall text of the article and built infrastructure allowing\nthe team to collaborate in Markdown with embeddable components.\n\n*Ludwig Schubert:* Helped with general infrastructure.\n\n† equal contributors\n\n### Discussion and Review\n\n[Review 1 - Anonymous](https://github.com/distillpub/post--multimodal/issues/1)\n\n[Review 2 - Anonymous](https://github.com/distillpub/post--multimodal/issues/2)\n\n[Review 3 - Anonymous](https://github.com/distillpub/post--multimodal/issues/3)\n\n### Updates and Corrections\n\nIf you see mistakes or want to suggest changes, please [create an issue on GitHub](https://github.com/distillpub/post--multimodal/issues/new).\n\n### Reuse\n\nDiagrams and text are licensed under Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) with the [source available on GitHub](https://github.com/distillpub/post--multimodal), unless noted otherwise. 
The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.\n\n### Citation\n\nFor attribution in academic contexts, please cite this work as\n\n```\nGoh, et al., \"Multimodal Neurons in Artificial Neural Networks\", Distill, 2021.\n```\n\nBibTeX citation\n\n```\n@article{goh2021multimodal,\n  author = {Goh, Gabriel and Cammarata, Nick and Voss, Chelsea and Carter, Shan and Petrov, Michael and Schubert, Ludwig and Radford, Alec and Olah, Chris},\n  title = {Multimodal Neurons in Artificial Neural Networks},\n  journal = {Distill},\n  year = {2021},\n  note = {https://distill.pub/2021/multimodal-neurons},\n  doi = {10.23915/distill.00030}\n}\n```\n\n", "url": "https://wpnews.pro/news/multimodal-neurons-in-artificial-neural-networks", "canonical_source": "https://distill.pub/2021/multimodal-neurons", "published_at": "2021-03-04 20:00:00+00:00", "updated_at": "2026-05-19 23:15:24.858044+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "research"], "entities": ["Gabriel Goh", "Chris Olah", "Alec Radford", "CLIP", "Sandhini Agarwal", "Daniela Amodei", "Dario Amodei", "Ilya Sutskever"], "alternates": {"html": "https://wpnews.pro/news/multimodal-neurons-in-artificial-neural-networks", "markdown": "https://wpnews.pro/news/multimodal-neurons-in-artificial-neural-networks.md", "text": "https://wpnews.pro/news/multimodal-neurons-in-artificial-neural-networks.txt", "jsonld": "https://wpnews.pro/news/multimodal-neurons-in-artificial-neural-networks.jsonld"}}