{"slug": "understanding-convolutions-on-graphs", "title": "Understanding Convolutions on Graphs", "summary": "This article from Distill explains graph neural networks (GNNs), a family of neural networks designed to operate on graph-structured data like social networks and molecules. It highlights the challenges of computing over graphs, such as their lack of consistent structure and node ordering, and describes how GNNs learn node representations through an iterative process to make predictions. The piece also notes that most naturally occurring graphs are sparse, which allows for efficient computation.", "body_md": "Understanding the building blocks and design choices of graph neural networks.\nThis article is one of two Distill publications about graph neural networks.\nTake a look at\nA Gentle Introduction to Graph Neural Networks\nMany systems and interactions - social networks, molecules, organizations, citations, physical models, transactions - can be represented quite naturally as graphs. How can we reason about and make predictions within these systems?\nOne idea is to look at tools that have worked well in other domains: neural networks have shown immense predictive power in a variety of learning tasks. However, neural networks have been traditionally used to operate on fixed-size and/or regular-structured inputs (such as sentences, images and video). This makes them unable to elegantly process graph-structured data.\nGraph neural networks (GNNs) are a family of neural networks that can operate naturally on graph-structured data. 
By extracting and utilizing features from the underlying graph, GNNs can make more informed predictions about entities in these interactions, as compared to models that consider individual entities in isolation.\nGNNs are not the only tools available to model graph-structured data:\ngraph kernels\nIn this article, we will illustrate the challenges of computing over graphs, describe the origin and design of graph neural networks, and explore the most popular GNN variants in recent times. Particularly, we will see that many of these variants are composed of similar building blocks.\nFirst, let’s discuss some of the complications that graphs come with.\nGraphs are extremely flexible mathematical models; but this means they lack consistent structure across instances.\nConsider the task of predicting whether a given chemical molecule is toxic\nLooking at a few examples, the following issues quickly become apparent:\nRepresenting graphs in a format that can be computed over is non-trivial, and the final representation chosen often depends significantly on the actual problem.\nExtending the point above: graphs often have no inherent ordering present amongst the nodes. Compare this to images, where every pixel is uniquely determined by its absolute position within the image!\nAs a result, we would like our algorithms to be node-order equivariant: they should not depend on the ordering of the nodes of the graph. If we permute the nodes in some way, the resulting representations of the nodes as computed by our algorithms should also be permuted in the same way.\nGraphs can be really large! Think about social networks like Facebook and Twitter, which have over a billion users. Operating on data this large is not easy.\nLuckily, most naturally occurring graphs are ‘sparse’: they tend to have their number of edges linear in their number of vertices. We will see that this allows the use of clever methods to efficiently compute representations of nodes within the graph. 
Further, the methods that we look at here will have significantly fewer parameters in comparison to the size of the graphs they operate on.\nThere are many useful problems that can be formulated over graphs:\nA common precursor in solving many of these problems is node representation learning: learning to map individual nodes to fixed-size real-valued vectors (called ‘representations’ or ‘embeddings’).\nIn Learning GNN Parameters, we will see how the learnt embeddings can be used for these tasks.\nDifferent GNN variants are distinguished by the way these representations are computed. Generally, however, GNNs compute node representations in an iterative process. We will use the notation $h_v^{(k)}$ to indicate the representation of node $v$ after the $k$-th iteration. Each iteration can be thought of as the equivalent of a ‘layer’ in standard neural networks.\nWe will define a graph $G$ as a set of nodes, $V$, with a set of edges $E$ connecting them. Nodes can have individual features as part of the input: we will denote by $x_v$ the individual feature for node $v \\in V$. For example, the ‘node features’ for a pixel in a color image would be the red, green and blue channel (RGB) values at that pixel.\nFor ease of exposition, we will assume $G$ is undirected, and all nodes are of the same type.\nSometimes we will need to denote a graph property by a matrix $M$, where each row $M_v$ represents a property corresponding to a particular vertex $v$.\nConvolutional Neural Networks have been seen to be quite powerful in extracting features from images. However, images themselves can be seen as graphs with a very regular grid-like structure, where the individual pixels are nodes, and the RGB channel values at each pixel are the node features.\nA natural idea, then, is to consider generalizing convolutions to arbitrary graphs. 
Recall, however, the challenges\nlisted out in the previous section: in particular, ordinary convolutions are not node-order invariant, because\nthey depend on the absolute positions of pixels.\nIt is initially unclear as to how to generalize convolutions over grids to convolutions over general graphs,\nwhere the neighbourhood structure differs from node to node.\nWe begin by introducing the idea of constructing polynomial filters over node neighbourhoods, much like how CNNs compute localized filters over neighbouring pixels. Then, we will see how more recent approaches extend this idea with more powerful mechanisms. Finally, we will discuss alternative methods that can use ‘global’ graph-level information for computing node representations.\nGiven a graph $G$, let us fix an arbitrary ordering of the $n$ nodes of $G$. Denoting the adjacency matrix of $G$ by $A$, we can construct the diagonal degree matrix $D$ of $G$ as:\n$D_v = \\sum_u A_{vu}$\nwhere $A_{vu}$ denotes the entry in the row corresponding to $v$ and the column corresponding to $u$ in the matrix $A$. We will use this notation throughout this section.\nThen, the graph Laplacian $L$ is the square $n \\times n$ matrix defined as:\n$L = D - A$\nThe graph Laplacian gets its name from being the discrete analog of the Laplacian operator from calculus.\nAlthough it encodes precisely the same information as the adjacency matrix\nNow that we have understood what the graph Laplacian is,\nwe can build polynomials of the form:\n$p_w(L) = w_0 I_n + w_1 L + w_2 L^2 + \\ldots + w_d L^d = \\sum_{i=0}^{d} w_i L^i$\nThese polynomials can be thought of as the equivalent of ‘filters’ in CNNs, and the coefficients $w$ as the weights of the ‘filters’.\nFor ease of exposition, we will focus on the case where nodes have one-dimensional features: each of the $x_v$ for $v \\in V$ is just a real number. 
The same ideas hold when each of the $x_v$ are higher-dimensional vectors, as well.\nUsing the previously chosen ordering of the nodes, we can stack all of the node features $x_v$ to get a vector $x \\in \\mathbb{R}^n$.\nOnce we have constructed the feature vector $x$,\nwe can define its convolution with a polynomial filter $p_w$ as:\n$x' = p_w(L) x$\nTo understand how the coefficients $w$ affect the convolution,\nlet us begin by considering the ‘simplest’ polynomial:\nwhen $w_0 = 1$ and all of the other coefficients are $0$.\nIn this case, $x'$ is just $x$:\n$x' = p_w(L) x = w_0 I_n x = x$\nNow, if we increase the degree, and consider the case where\ninstead $w_1 = 1$ and all of the other coefficients are $0$.\nThen, $x'$ is just $Lx$, and so:\n$x'_v = (Lx)_v = \\sum_u L_{vu} x_u = D_v x_v - \\sum_{u \\in N(v)} x_u$\nWe see that the features at each node $v$ are combined\nwith the features of its immediate neighbours $u \\in N(v)$.\nAt this point, a natural question to ask is:\nHow does the degree $d$ of the polynomial influence the behaviour of the convolution?\nIndeed, it is not too hard to show that:\n$\\text{dist}_G(v, u) > i \\implies L^i_{vu} = 0$\nEffectively, the convolution at node $v$ occurs only with nodes $u$ which are not more than $d$ hops away. Thus, these polynomial filters are localized. The degree of the localization is governed completely by $d$.\nTo build intuition for these ‘polynomial-based’ convolutions, take the input $x$ to be a grid of pixels. The equivalent convolutional kernel applied at a given pixel in $x$ to get the resulting pixel in $x'$ corresponds to the row of $p_w(L)$ for that pixel. Note that even after adjusting for position, this kernel is different for different pixels, depending on their position within the grid.\n
What is the motivation behind these choices?\nThe polynomial filters we considered here are actually independent of the ordering of the nodes. This is particularly easy to see when the degree of the polynomial is $1$: each node’s feature is aggregated with the sum of its neighbours’ features. Clearly, this sum does not depend on the order of the neighbours. A similar proof follows for higher degree polynomials: the entries in the powers of $L$ are equivariant to the ordering of the nodes.\nAs above, let’s assume an arbitrary node-order over the $n$ nodes of our graph. Any other node-order can be thought of as a permutation of this original node-order. We can represent any permutation by a permutation matrix $P$. $P$ will always be an orthogonal matrix: $PP^T = P^TP = I_n$. Then, we call a function $f$ node-order equivariant iff for all permutations $P$: $f(Px) = P f(x)$. When switching to the new node-order using the permutation $P$, the quantities below transform in the following way: $x \\to Px$, $L \\to PLP^T$, $L^i \\to PL^iP^T$, and so, for the case of polynomial filters where $f(x) = p_w(L) x$, we can see that: $f(Px) = \\sum_{i=0}^{d} w_i (PL^iP^T)(Px) = P \\sum_{i=0}^{d} w_i L^i x = P f(x)$, as claimed.\nWe now describe how we can build a graph neural network by stacking ChebNet (or any polynomial filter) layers one after the other with non-linearities, much like a standard CNN. 
In particular, if we have $K$ different polynomial filter layers, the $k$-th of which has its own learnable weights $w^{(k)}$, we would perform the following computation:\n$h^{(0)} = x$, and for $k = 1, \\ldots, K$: $h^{(k)} = \\sigma(p_{w^{(k)}}(L) h^{(k-1)})$, where $\\sigma$ is a non-linearity.\nNote that these networks reuse the same filter weights across different nodes, exactly mimicking weight-sharing in Convolutional Neural Networks (CNNs) which reuse weights for convolutional filters across a grid.\nChebNet was a breakthrough in learning localized filters over graphs, and it motivated many to think of graph convolutions from a different perspective.\nWe return to the result of convolving $x$ by the polynomial kernel $p_w(L) = L$, focussing on a particular vertex $v$:\n$(Lx)_v = D_v x_v - \\sum_{u \\in N(v)} x_u$\nAs we noted before, this is a $1$-hop localized convolution. But more importantly, we can think of this convolution as arising from two steps: aggregating over the features $x_u$ of the immediate neighbours, and combining the result with the node’s own feature $x_v$.\nKey Idea: What if we consider different kinds of ‘aggregation’ and ‘combination’ steps, beyond what is possible using polynomial filters?\nBy ensuring that the aggregation is node-order equivariant, the overall convolution becomes node-order equivariant.\nThese convolutions can be thought of as ‘message-passing’ between adjacent nodes: after each step, every node receives some ‘information’ from its neighbours.\nBy iteratively repeating the $1$-hop localized convolutions $K$ times (i.e., repeatedly ‘passing messages’), the receptive field of the convolution effectively includes all nodes up to $K$ hops away.\nMessage-passing forms the backbone of many GNN architectures today. We describe the most popular ones in depth below:\nAn interesting point is to assess different aggregation functions: are some better and others worse?\nHere, we’ve talked about GNNs where the computation only occurs at the nodes.\nMore recent GNN models\nsuch as Message-Passing Neural Networks\nBelow is an interactive visualization of these GNN models on small graphs. 
For clarity, the node features are just real numbers here, shown inside the squares next to each node, but the same equations hold when the node features are vectors.\nIn practice, each iteration above is generally thought of as a single ‘neural network layer’.\nThis convention is followed by many popular Graph Neural Network libraries,\nThe methods we’ve seen so far perform ‘local’ convolutions: every node’s feature is updated using a function of its local neighbo", "url": "https://wpnews.pro/news/understanding-convolutions-on-graphs", "canonical_source": "https://distill.pub/2021/understanding-gnns", "published_at": "2021-09-02 20:00:00+00:00", "updated_at": "2026-05-19 23:13:22.515727+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "research", "science"], "entities": ["Distill"], "alternates": {"html": "https://wpnews.pro/news/understanding-convolutions-on-graphs", "markdown": "https://wpnews.pro/news/understanding-convolutions-on-graphs.md", "text": "https://wpnews.pro/news/understanding-convolutions-on-graphs.txt", "jsonld": "https://wpnews.pro/news/understanding-convolutions-on-graphs.jsonld"}}