{"slug": "self-classifying-mnist-digits", "title": "Self-classifying MNIST Digits", "summary": "\"self-classifying MNIST task,\" where a grid of locally-communicating agents, operating under identical rules, must determine which digit their collective shape forms without any agent knowing its global position. The researchers use a neural cellular automata model, adapted from prior work on growing patterns, to enable these agents to output a digit label based solely on their local interactions and the overall morphology of the living cells. The work draws a direct biological analogy to how groups of cells in organisms, such as a salamander's transplanted tail remodeling into a limb, can collectively sense and repair toward a specific target morphology.", "body_md": "Achieving Distributed Coordination with Neural Cellular Automata\nGrowing Neural Cellular Automata\nOur question is closely related to another unsolved problem in developmental and regenerative biology: how cell groups decide whether an organ or tissue pattern is correct, or whether current anatomy needs to be remodeled (anatomical surveillance and repair toward a specific target morphology). For example, when scientists surgically transplanted a salamander tail to its flank, it slowly remodeled into a limb - the organ that belongs at this location\nSuppose a population of agents is arranged on a grid. They do not know where they are in the grid and they can only communicate with their immediate neighbors. They can also observe whether a neighbor is missing. Now suppose these agents are arranged to form the shape of a digit. Given that all the agents operate under the same rules, can they form a communication protocol such that, after a number of iterations of communication, all of the agents know which digit they are forming? Furthermore, if some agents were to be removed and added to form a new digit from a preexisting one, would they be able to know which the new digit is?\nBecause digits are not rotationally invariant (i.e. 6 is a rotation of 9), we presume the agents must be made aware of their orientation with respect to the grid. Therefore, while they do not know where they are, they do know where up, down, left and right are. The biological analogy here is a situation where the remodeling structures exist in the context of a larger body and a set of morphogen gradients or tissue polarity that indicate directional information with respect to the three major body axes. Given these preliminaries, we introduce the self-classifying MNIST task.\nEach sample of the MNIST dataset\nOur goal is for all cells that make up the digit to correctly output the label of the digit. To convey this structural information to the cells, we make a distinction between alive and dead cells by rescaling the values of the image to [0, 1]. Then we treat a cell as alive if its value in the MNIST sample is larger than 0.1. The intuition here is that we are placing living cells in a cookie cutter and asking them to identify the global shape of the cookie cutter. We visualize the label output by assigning a color to each cell, as you can see above. We use the same mapping between colors and labels throughout the article. Please note that there is a slider in the interactive demo controls which you can use to adjust the color palette if you have trouble differentiating between the default colors.\nIn this article, we use a variant of the neural cellular automata model described in Growing Cellular Automata\nThe work in Growing CA used RGB images as targets, and optimized the first three state channels to approximate those images. For our experiments, we treat the last ten channels of our cells as a pseudo-distribution over each possible label (digit). During inference, we simply pick the label corresponding to the channel with the highest output value.\nIn Growing CA we assigned a cell’s state to be “dead” or “alive” based on the strength of its alpha channel and the activity of its neighbors. This is similar to the rules of Conway’s Game of Life\nA note on digit topology. Keen readers may notice that our model requires each digit to be a single connected component in order for classification to be possible, since any disconnected components will be unable to propagate information between themselves. We made this design decision in order to stay true to our core biological analogy, which involves a group of cells that is trying to identify its global shape. Even though the vast majority of samples from MNIST are fully connected, some aren’t. We do not expect our models to classify non-connected minor components correctly, but we do not remove them\nThe Growing CA article made use of fixed 3x3 convolutions with Sobel filters to estimate the state gradients in and . We found that fully trainable 3x3 kernels outperformed their fixed counterparts and so used them in this work.\nA note on model size. Like the Growing CA model, our MNIST CA is small by the standards of deep learning - it has less than 25k parameters. Since this work aims to demonstrate a novel approach to classification, we do not attempt to maximise the validation accuracy of the model by increasing the number of parameters or any other tuning. We suspect that, as with other deep neural network models, one would observe a positive correlation between accuracy and model size.\nIn our first experiment, we use the same training paradigm as was discussed in Growing CA. We train with a pool of initial samples to allow the model to learn to persist and then perturb the converged states. However, our perturbation is different. Previously, we destroyed the states of cells at random in order to make the CAs resistant to destructive perturbations (analogous to traumatic tissue loss). In this context, perturbation has a slightly different role to play. Here we aim to build a CA model that not only has regenerative properties, but also has the ability to correct itself when the shape of the overall digit changes.\nBiologically, this corresponds to a teratogenic influence during development, or alternatively, a case of an incorrect or incomplete remodeling event such as metamorphosis or rescaling. The distinction between training our model from scratch and training it to accommodate perturbations is subtle but important. An important feature of life is the ability to react adaptively to external perturbations that are not accounted for in the normal developmental sequence of events. If our virtual cells simply learned to recognize a digit and then entered some dormant state and did not react to any further changes, we would be missing this key property of living organisms. One could imagine a trivial solution in the absence of perturbations, where a single wave of information is passed from the boundaries of the digit inwards and then back out, in such a way that all cells could agree on a correct classification. By introducing perturbations to new digits, the cells have to be in constant communication and achieve a “dynamic homeostasis” - continually “kept on their toes” in anticipation of new or further communication from their neighbours.\nIn our model, we achieve this dynamic homeostasis by randomly mutating the underlying digit at training time. Starting from a certain digit and after some time evolution, we sample a new digit, erase all cell states that are not present in both digits and bring alive the cells that were not present in the original digit but are present in the new digit. This kind of mutation teaches CAs to learn to process new information and adapt to changing conditions. It also exposes the cells to training states where all of the cells that remain after a perturbation are misclassifying the new digit and must recover from this catastrophic mutation. This in turn forces our CAs to learn to change their own classifications to adapt to changing global structure.\nWe use a pixel-wise (cell-wise) cross entropy loss on the last ten channels of each pixel, applying it after letting the CA evolve for 20 steps.\nThe video above shows the CA classifying a batch of digits for 200 steps. We then mutate the digits and let the system evolve and classify for a further 200 steps.\nThe results look promising overall and we can see how our CAs are able to recover from mutations. However, astute observers may notice that often not all cells agree with each other. Often, the majority of the digit is classified correctly, but some outlier cells are still convinced they are part of a different digit, often switching back and forth in an oscillating pattern, causing a flickering effect in the visualization. This is not ideal, since we would like the population of cells to reach stable, total, agreement. The next experiment troubleshoots this undesired behaviour.\nQuantifying a qualitative issue is the first step to solving it. We propose a metric to track average cell accuracy, which we define as the mean percentage of cells that have a correct output. We track this metric both before and after mutation.\nIn the figure above, we show the mean percentage of correctly classified pixels in the test set over the course of 400 steps. At step 200, we randomly mutate the digit. Accordingly, we see a brief drop in accuracy as the cells re-organise and eventually come to agreement on what the new digit is.\nWe immediately notice an interesting phenomenon: the cell accuracy appears to decrease over time after the cells have come to an agreement. However, the graph does not necessarily reflect the qualitative issue of unstable labels that we set out to solve. The slow decay in accuracy may be a reflection of the lack of total agreement, but doesn’t capture the stark instability issue.\nInstead of looking at the mean agreement perhaps we should measure total agreement. We define total agreement as the percentage of samples from a given batch wherein all the cells output the same label.\nThis metric does a better job of capturing the issues we are seeing. The total agreement starts at zero and then spikes up to roughly 78%, only to lose more than 10% agreement over the next 100 steps. Again, behaviour after mutation does not appear to be significantly different. Our model is not only unstable in the short term, exhibiting flickering, but is also unstable over longer timescales. As time goes on, cells are becoming less sure of themselves. Let’s inspect the inner states of the CA to see why this is happening.\nThe figure above shows the time evolution of the average magnitude of the state values of active cells (solid line), and the average magnitude of the residual updates (dotted line). Two important things are happening here: 1) the average magnitude of each cell’s internal states is growing monotonically on this timescale; 2) the average magnitude of the residual updates is staying roughly constant. We theorize that, unlike 1), a successful CA model should stabilize the magnitude of its internal states once cells have reached an agreement. In order for this to happen, its residual updates should approach zero over time, unlike what we observed in 2).\nUsing an loss. One problem with cross entropy loss is that it tends to push raw logit values indefinitely higher. Another problem is that two sets of logits can have vastly different values but essentially the same prediction over classes. As such, training the CA with cross-entropy loss neither requires nor encourages a shared reference range for logit values, making it difficult for the cells to effectively communicate and stabilize. Finally, we theorize that large magnitudes in the classification channels may in turn lead the remaining (non-classification) state channels to transition to a high magnitude regime. More specifically, we believe that cross-entropy loss causes unbounded growth in classification logits, which prevents residual updates from approaching zero, which means that neighboring cells continue passing messages to each other even after they reach an agreement. Ultimately, this causes the magnitude of the message vectors to grow unboundedly. With these problems in mind, we instead try training our model with a pixel-wise loss and use one-hot vectors as targets. Intuitively, this solution should be more stable since the raw state channels for classification are never pushed out of the range and a properly classified digit in a cell will have exactly one classification channel set to 1 and the rest to 0. In summary, an loss should decrease the magnitude of all the internal state channels while keeping the classification targets in a reasonable range.\nAdding noise to the residual updates. A number of popular regularization schemes involve injecting noise into a model in order to make it more robust\nThe video above s", "url": "https://wpnews.pro/news/self-classifying-mnist-digits", "canonical_source": "https://distill.pub/2020/selforg/mnist", "published_at": "2020-08-27 20:00:00+00:00", "updated_at": "2026-05-19 23:18:16.006522+00:00", "lang": "en", "topics": ["artificial-intelligence", "machine-learning", "robotics", "research", "science"], "entities": [], "alternates": {"html": "https://wpnews.pro/news/self-classifying-mnist-digits", "markdown": "https://wpnews.pro/news/self-classifying-mnist-digits.md", "text": "https://wpnews.pro/news/self-classifying-mnist-digits.txt", "jsonld": "https://wpnews.pro/news/self-classifying-mnist-digits.jsonld"}}