{"slug": "visualizing-weights", "title": "Visualizing Weights", "summary": "Challenge of understanding neural networks by visualizing their weights, comparing it to reverse engineering compiled code or studying biological neural networks. It notes that despite the importance of weights, researchers often overlook direct inspection due to barriers like high dimensionality, and introduces one-sided Non-negative Matrix Factorization (NMF) as a simple method to collapse input channels for easier visualization. The technique helps reveal spatial structures and phenomena like \"weight banding,\" but the article emphasizes that weights must be contextualized within the broader network to gain meaningful insights.", "body_md": "The problem of understanding a neural network is a little bit like reverse engineering a large compiled binary of a computer program. In this analogy, the weights of the neural network are the compiled assembly instructions. At the end of the day, the weights are the fundamental thing you want to understand: how does this sequence of convolutions and matrix multiplications give rise to model behavior?\n\nTrying to understand artificial neural networks also has a lot in common with neuroscience, which tries to understand biological neural networks. As you may know, one major endeavor in modern neuroscience is mapping the [connectomes](https://en.wikipedia.org/wiki/Connectome) of biological neural networks: which neurons connect to which. These connections, however, will only tell neuroscientists which weights are non-zero. Getting the weights – knowing whether a connection excites or inhibits, and by how much – would be a significant further step. One imagines neuroscientists might give a great deal to have the access to weights that those of us studying artificial neural networks get for free.\n\nAnd so, it’s rather surprising how little attention we actually give to looking at the weights of neural networks. 
There are a few exceptions to this, of course. It’s quite common for researchers to show pictures of the first-layer weights in vision models.\n\nIn this article, we’re focusing on visualizing weights. But people often visualize activations, attributions, gradients, and much more. How should we think about the meaning of visualizing these different objects?\n\nIt seems to us that there are three main barriers to making sense of the weights in neural networks, which may have contributed to researchers tending not to inspect them directly:\n\n1. *Lack of contextualization.* First-layer weights connect to RGB pixels, which we already understand. Weights between hidden layers say little until we know what their input and output neurons represent.\n2. *Indirect interaction.* Sometimes the meaningful weight interactions are between neurons that aren’t literally adjacent in the network, or the weights aren’t directly represented in a single weight tensor.\n3. *Dimensionality and scale.* Neural networks contain many neurons, and each one connects to many others, so there are far too many weights to inspect one by one.\n\nMany of the methods we’ll use to address these problems were previously explored in [Building Blocks](https://distill.pub/2018/building-blocks/) in the context of understanding activation vectors.\n\nInterpretability methods often fail to take off because they’re hard to use. So before diving into sophisticated approaches, we wanted to offer a simple, easy-to-apply method.\n\nIn a convolutional network, the input weights for a given neuron have shape `[width, height, input_channels]`. Unless this is the first convolutional layer, this probably can’t be easily visualized because `input_channels` is large. (If this is the first convolutional layer, visualize it as is!) However, one can use dimensionality reduction to collapse `input_channels` down to 3 dimensions. We find one-sided non-negative matrix factorization (NMF) especially effective for this.\n\nThis visualization doesn’t tell you very much about what your weights are doing in the context of the larger model, but it does show you that they are learning nice spatial structures. This can be an easy sanity check that your neurons are learning, and a first step towards understanding your neuron’s behavior. We’ll also see later that this general approach of factoring weights can be extended into a powerful tool for studying neurons.\n\nDespite this lack of contextualization, one-sided NMF can be a great technique for investigating multiple channels at a glance. 
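Here is a minimal NumPy sketch of collapsing `input_channels` down to 3 factors. It assumes that one-sided means only the spatial factor is kept non-negative, so it can be shown as RGB, while the per-channel factor may take either sign. It gets there by running ordinary NMF (Lee-Seung multiplicative updates) on the positive and negative parts of the weights placed side by side. That split is an assumption made for illustration, not necessarily the authors’ implementation. The helpers `nmf`, `one_sided_nmf` and `to_rgb` are hypothetical names, not part of Lucid or Captum.

```python
import numpy as np


def nmf(x, k, iters=1000, seed=0, eps=1e-9):
    # Plain NMF via Lee-Seung multiplicative updates: x ~= a @ b with a, b >= 0.
    rng = np.random.default_rng(seed)
    a = rng.random((x.shape[0], k)) + eps
    b = rng.random((k, x.shape[1])) + eps
    for _ in range(iters):
        b *= (a.T @ x) / (a.T @ a @ b + eps)
        a *= (x @ b.T) / (a @ b @ b.T + eps)
    return a, b


def one_sided_nmf(weights, k=3, **kwargs):
    # weights: [width, height, input_channels] for a single output neuron.
    width, height, channels = weights.shape
    flat = weights.reshape(width * height, channels)
    # NMF needs non-negative input, so split every input channel into its
    # positive and negative parts (relu(w) - relu(-w) == w).
    x = np.concatenate([np.maximum(flat, 0), np.maximum(-flat, 0)], axis=1)
    spatial, per_channel = nmf(x, k, **kwargs)
    # Recombine the two halves: the spatial factor stays non-negative (so it can
    # be shown as colors), while the per-channel factor may take either sign.
    per_channel = per_channel[:, :channels] - per_channel[:, channels:]
    return spatial.reshape(width, height, k), per_channel


def to_rgb(spatial):
    # Scale a [width, height, 3] spatial factor into [0, 1] for display.
    return spatial / (spatial.max() + 1e-9)


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(5, 5, 64))  # hypothetical 5x5 kernel, 64 inputs
    spatial, per_channel = one_sided_nmf(weights, k=3)
    image = to_rgb(spatial)
    print(image.shape, per_channel.shape)
```

Displaying `to_rgb(spatial)` as an image gives one small colored grid per neuron, with each of the three factors mapped to the red, green, or blue channel.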
One thing you may quickly discover using this method is that, in models with global average pooling at the end of their convolutional layers, the weights of the last few layers will all form horizontal bands.\n\nFigure 2: Horizontally banded weights in InceptionV1 `mixed5b_5x5`, for a selection of eight neurons. As in Figure 1, the red, green, and blue channels on each grid indicate the weights for each of the 3 NMF factors.\n\nWe call this phenomenon [weight banding](https://distill.pub/2020/circuits/weight-banding/). One-sided NMF allows for quickly testing and validating hypotheses about phenomena such as weight banding.\n\nOf course, looking at weights in a vacuum isn’t very interesting. In order to really understand what’s going on, we need to *contextualize* weights within the broader network.\n\nThe weights between two convolutional layers form a four-dimensional array of shape:\n\n```\n[relative x position, relative y position,\ninput channels, output channels]\n```\n\nIf we fix the input channel and the output channel, we get a 2D array we can present with traditional data visualization. Let’s assume we know which neuron we’re interested in understanding, so we have the output channel. We can pick the input channels with high-magnitude weights to our output channel.\n\nBut what does the input represent? What about the output?\n\nThe key trick is that techniques like feature visualization can show us what the input and output neurons represent, which gives the weights between them meaning.\n\nThis approach is the weight analogue of using feature visualizations to contextualize activation vectors in [Building Blocks](https://distill.pub/2018/building-blocks/) (see the section titled “Making Sense of Hidden Layers”).\n\nWe can liken this to how, when reverse-engineering a normal compiled computer program, one would need to start assigning variable names to the values stored in registers to keep track of them. 
Feature visualizations are essentially automatic variable names for neurons, which are roughly analogous to those registers or variables.\n\nOf course, neurons have multiple inputs, and it can be helpful to show the weights from several inputs at a time as a [small multiple](https://en.wikipedia.org/wiki/Small_multiple):\n\nFigure: Small multiple of weights from several input neurons to `mixed3b` 342.\n\nAnd if we have two families of related neurons interacting, it can sometimes even be helpful to show the weights between all of them as a grid of small multiples:\n\nFigure 5: Small multiple weights for a variety of [curve detectors](https://distill.pub/2020/circuits/curve-detectors/).\n\nAlthough we most often use feature visualization to visualize neurons, we can visualize any direction (linear combination of neurons). This opens up a very wide space of possibilities for visualizing weights, of which we’ll explore a couple of particularly useful ones.\n\nRecall that the weights for a single neuron have shape `[width, height, input_channels]`. In the previous section we split up `input_channels` and visualized each `[width, height]` matrix. An alternative approach is to treat the weights at each spatial position as a vector over input neurons, and to apply feature visualization to each of those vectors. You can think of this as telling us what the weights in that position are collectively looking for.\n\nThis visualization is the weight analogue of the [“Activation Grid” visualization](https://distill.pub/2018/building-blocks/#ActivationVecVis) from Building Blocks. It can be a nice, high-density way to get an overview of what the weights for one neuron are doing. However, it cannot capture cases where one position responds to multiple very different things, as in a multi-faceted or polysemantic neuron.\n\nFeature visualization can also be applied to factorizations of the weights, which we briefly discussed earlier. 
This is the weight analogue to the “Neuron Groups” visualization from Building Blocks.\n\nThis can be especially helpful when you have a group of neurons, like [high-low frequency detectors](https://distill.pub/2020/circuits/frequency-edges/) or black and white vs color detectors, that are mostly looking for a small number of factors. For example, a large number of high-low frequency detectors can be largely understood as combining just two factors – a high-frequency factor and a low-frequency factor – in different patterns.\n\nFigure: Weight factors for a group of high-low frequency detectors, computed over their connections to `conv2d2`. These factors can then be decomposed into individual neurons for more detailed understanding.\n\nAs we mentioned earlier, sometimes the meaningful weight interactions are between neurons which aren’t literally adjacent in a neural network, or where the weights aren’t directly represented in a single weight tensor. One example, which we return to below, is a narrow bottleneck layer sitting between two wider layers.\n\nAs a result, we often work with “expanded weights” – that is, the result of multiplying adjacent weight matrices, potentially ignoring non-linearities. We generally implement expanded weights by taking gradients through our model, ignoring or replacing all non-linear operations with the closest linear one.\n\nWhen two neurons interact linearly, the expanded weights between them are exactly the linear map connecting them, even if the model never stores that map in a single weight matrix. When non-linearities intervene, they are only an approximation.\n\nThey also have one additional benefit, which is more of an implementation detail: because they’re implemented in terms of gradients, you don’t need to know how the weights are represented. For example, in TensorFlow, you don’t need to know which variable object represents the weights. This can be a significant convenience when you’re working with unfamiliar models!\n\nMultiplying out the weights like this can sometimes help us see a simpler underlying structure. For example, [mixed3b 208](https://microscope.openai.com/models/inceptionv1/mixed3b_0/208) is a black and white center detector. It’s built by combining a bunch of black and white vs color detectors together.\n\nFigure 9: 
`mixed3b` 208 along with five neurons from `mixed3a` that contribute the [strongest weights](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/mixed3b_208.html) to it.\n\nExpanding out the weights allows us to see an important aggregate effect of these connections: together, they look for the absence of color in the center one layer further back.\n\nFigure 10: Top eighteen [expanded weights](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/mixed3b_208.html) from `conv2d2` to [mixed3b 208](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/mixed3b_208.html), organized in two rows according to weight factorization.\n\nA particularly important use of this method – which we’ve been implicitly using in earlier examples – is to jump over “bottleneck layers.” Bottleneck layers are layers of the network which squeeze the number of channels down to a much smaller number, typically in a branch, making large spatial convolutions cheaper. The [bottleneck layers](https://microscope.openai.com/models/inceptionv1/mixed3a_5x5_bottleneck_0?models.op.feature_vis.type=channel&models.op.technique=feature_vis) of InceptionV1 are one example. Since so much information is compressed, these layers are often polysemantic, and it can often be more helpful to jump over them and understand the connection to the wider layer before them.\n\nExpanded weights can, of course, be misleading when non-linear structure is important. A good example of this is [boundary detectors](https://distill.pub/2020/circuits/early-vision/#group_mixed3b_boundary). Recall that boundary detectors usually detect both low-to-high and high-to-low frequency transitions:\n\nFigure 11: 
Boundary detectors such as [mixed3b 345](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/mixed3b_345.html) detect both low-to-high and high-to-low frequency transitions.\n\nSince high-low frequency detectors are [usually](https://distill.pub/2020/circuits/frequency-edges/) excited by high-frequency patterns on one side and inhibited on the other (and vice versa for low frequency), detecting both directions means that the expanded weights cancel out! As a result, expanded weights appear to show that boundary detectors are neither excited nor inhibited by high-frequency detectors two layers back. In fact, they are *both* excited and inhibited by high frequency, depending on the context, and the two cases simply cancel out.\n\nFigure 12: Neurons two layers back (such as [conv2d2 89](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/conv2d2_89.html)) may have a strong influence on the high-low frequency detectors that contribute to `mixed3b` 345 (top), but that influence washes out when we look at the expanded weights (bottom) directly between `conv2d2` 89 and `mixed3b` 345.\n\nMore sophisticated techniques for describing multi-layer interactions can help us understand cases like this. For example, one can determine the “best case” excitatory interaction between two neurons (that is, the maximum achievable gradient between them). One can also look at the gradient for a particular example, or factor the gradient over many examples to determine the major possible cases. These are all useful techniques, but we’ll leave them for a future article to discuss.\n\nOne qualitative property of expanding weights across many layers deserves mention before we end our discussion of them. Expanded weights often develop smooth, “electron orbital”-like spatial structures:\n\nFigure 13: 
Smooth spatial structure of some expanded weights from [mixed3b 268](https://storage.googleapis.com/distill-circuits/inceptionv1-weight-explorer/mixed3b_268.html) to `conv2d1`. Although the exact structures present may vary from neuron to neuron, this example is not cherry-picked: this smoothness is typical of most multiple-layer expanded weights.\n\nIt’s not clear how to interpret this, but it’s suggestive of rich spatial structure on the scale of multiple layers.\n\nSo far, we’ve addressed the challenges of contextualization and indirect interactions. But we’ve given only a little attention to our third challenge, dimensionality and scale. Neural networks contain many neurons and each one connects to many others, creating a huge number of weights. How do we pick which connections between neurons to look at?\n\nFor the purposes of this article, we’ll put the question of which neurons we want to study outside of our scope, and only discuss the problem of picking which connections to study. (We may be trying to comprehensively study a model, in which case we want to study all neurons. But we might also, for example, be trying to study neurons we’ve determined to be related to some narrower aspect of model behavior.)\n\nGenerally, we choose to look at the largest weights, as we did at the beginning of the section on contextualization. Unfortunately, there tends to be a long tail of small weights, and at some point it becomes impractical to look at them. How much of the story is really hiding in these small weights? We don’t know, but polysemantic neurons suggest there could be a very important and subtle story hiding here! 
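A few lines of NumPy can pick the largest weights to one neuron and check how much is left in the tail. This sketch assumes the `[relative x, relative y, input channels, output channels]` layout described earlier. It measures each input channel’s connection strength as its summed squared weight. Both that metric and the helper names `rank_inputs` and `top_k_share` are illustrative choices, not the authors’ tooling.

```python
import numpy as np


def rank_inputs(weights, out_channel):
    # weights: [relative x, relative y, input channels, output channels].
    # Fixing out_channel leaves one [x, y] slice per input channel.
    w = weights[:, :, :, out_channel]
    # Connection strength of an input channel = sum of its squared weights over
    # all spatial positions (an assumption; max |w| would also be reasonable).
    strength = (w ** 2).sum(axis=(0, 1))
    order = np.argsort(-strength)  # strongest input channels first
    return order, strength[order]


def top_k_share(weights, out_channel, ks=(1, 5, 20)):
    # Fraction of the neuron's total squared input weight carried by its k
    # strongest input channels: a crude proxy for how heavy the tail is.
    _, sorted_strength = rank_inputs(weights, out_channel)
    total = sorted_strength.sum()
    return {k: float(sorted_strength[:k].sum() / total) for k in ks}


if __name__ == '__main__':
    rng = np.random.default_rng(0)
    weights = rng.normal(size=(5, 5, 64, 32))  # hypothetical layer shape
    order, _ = rank_inputs(weights, out_channel=7)
    # 2D [x, y] slices for the five strongest inputs, ready to plot as small multiples.
    slices = [weights[:, :, i, 7] for i in order[:5]]
    print([int(i) for i in order[:5]], top_k_share(weights, out_channel=7))
```

Suppose the first few channels carry most of the squared weight. Then, by this measure, looking only at those connections loses little, while a flat profile signals a long tail. This is only a crude proxy: as noted above, small weights may still hide an important part of the story.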
There’s some hope that sparse neural networks might make this much better, by getting rid of small weights, but whether such conclusions can be drawn about non-sparse networks is presently speculative.\n\nAn alternative strategy that we’ve touched on a few times is to factor your weights into a few components and then study those factors (for example, with NMF). Often, a very small number of components can explain much of the variance. In fact, sometimes a small number of factors can explain the weights of an entire set of neurons! Prominent examples of this are high-low frequency detectors (as we saw earlier) and black and white vs color detectors.\n\nHowever, this approach also has downsides. Firstly, these components can be harder to understand and even polysemantic. For example, if you apply the basic version of this method to a boundary detector, one component will contain both high-to-low and low-to-high frequency detectors, which makes it hard to analyze. Secondly, your factors no longer align with activation functions, which makes analysis much messier. Finally, because you will be reasoning about every neuron in a different basis, it is difficult to build a bigger-picture view of the model unless you convert your components back to neurons.\n\nThis article grew out of a document that Chris Olah wrote to explain our techniques for visualizing weights.\n\n**Research.** We frequently need to visualize weights, and our techniques have been refined across many investigations of features and circuits, so it is difficult to fully separate out all contributions towards improving those techniques.\n\nMany people “test drove” these visualization methods, and a lot of our practical knowledge of using them to study circuits came from that. For example, the curve detector examples used in *Small Multiples* are due to Nick Cammarata’s work investigating curve detectors. 
Gabe Goh performed experiments that moved *Visualizing Spatial Position Weights* forward. The high-low frequency detector example and NMF factors used in *Visualizing Weight Factors* are due to experiments performed by Ludwig Schubert, and the weight banding examples in *Aside: One Simple Trick* are due to experiments run by Michael Petrov.\n\n**Writing and Diagrams.** Chris wrote the article and developed the designs for its original figures. Chelsea Voss ported the article to Distill, upgraded the diagrams for the new format, edited some text, and developed Figure 12; Chris provided feedback and guidance throughout.\n\n**Code.** Chris authored the TensorFlow (Lucid) notebook, and Ben Egan and Swee Kiat Lim authored the PyTorch (Captum) notebook.\n\nWe are grateful to participants of #circuits in the Distill Slack for their early comments and engagement with these concepts, including Kenneth Co, Humza Iqbal, and Vincent Tjeng. We are also grateful to Daniel Filan, Humza Iqbal, Stefan Sietzen, and Vincent Tjeng for remarks on a draft.\n\nIf you see mistakes or want to suggest changes, please [create an issue on GitHub](https://github.com/distillpub/post--circuits-visualizing-weights/issues/new).\n\nDiagrams and text are licensed under Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) with the [source available on GitHub](https://github.com/distillpub/post--circuits-visualizing-weights), unless noted otherwise. 
The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.\n\nFor attribution in academic contexts, please cite this work as\n\n```\nVoss, et al., \"Visualizing Weights\", Distill, 2021.\n```\n\nBibTeX citation\n\n```\n@article{voss2021visualizing,\n  author = {Voss, Chelsea and Cammarata, Nick and Goh, Gabriel and Petrov, Michael and Schubert, Ludwig and Egan, Ben and Lim, Swee Kiat and Olah, Chris},\n  title = {Visualizing Weights},\n  journal = {Distill},\n  year = {2021},\n  note = {https://distill.pub/2020/circuits/visualizing-weights},\n  doi = {10.23915/distill.00024.007}\n}\n```\n\n", "url": "https://wpnews.pro/news/visualizing-weights", "canonical_source": "https://distill.pub/2020/circuits/visualizing-weights", "published_at": "2021-02-04 20:00:00+00:00", "updated_at": "2026-05-19 23:16:06.071618+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "research"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/visualizing-weights", "markdown": "https://wpnews.pro/news/visualizing-weights.md", "text": "https://wpnews.pro/news/visualizing-weights.txt", "jsonld": "https://wpnews.pro/news/visualizing-weights.jsonld"}}