{"slug": "growing-neural-cellular-automata", "title": "Growing Neural Cellular Automata", "summary": "The article explores morphogenesis, the self-organizing process by which a single cell develops into a complex multicellular organism, highlighting its robustness and regenerative capabilities. It discusses the challenge of understanding the \"algorithm\" that guides cell collectives to build and repair specific anatomical structures. To investigate this, the authors propose using differentiable Neural Cellular Automata as an in silico model to replicate biological self-assembly and regeneration.", "body_md": "[Distill](/)\n\nDifferentiable Model of Morphogenesis\n\nMost multicellular organisms begin their life as a single egg cell - a\nsingle cell whose progeny reliably self-assemble into highly complex\nanatomies with many organs and tissues in precisely the same arrangement\neach time. The ability to build their own bodies is probably the most\nfundamental skill every living creature possesses. Morphogenesis (the\nprocess of an organism’s shape development) is one of the most striking\nexamples of a phenomenon called *self-organisation*. Cells, the tiny\nbuilding blocks of bodies, communicate with their neighbors to decide the\nshape of organs and body plans, where to grow each organ, how to\ninterconnect them, and when to eventually stop. Understanding the interplay\nof the emergence of complex outcomes from simple rules and\nhomeostatic\n\nThis process is extremely robust to perturbations. Even when the organism is fully developed, some species still have the capability to repair damage - a process known as regeneration. Some creatures, such as salamanders, can fully regenerate vital organs, limbs, eyes, or even parts of the brain! Morphogenesis is a surprisingly adaptive process. 
Sometimes even a very atypical development process can result in a viable organism - for example, when an early mammalian embryo is cut in two, each half will form a complete individual - monozygotic twins!\n\nThe biggest puzzle in this field is the question of how the cell collective\nknows what to build and when to stop. The sciences of genomics and stem cell\nbiology are only part of the puzzle, as they explain the distribution of\nspecific components in each cell, and the establishment of different types\nof cells. While we know of many genes that are *required* for the\nprocess of regeneration, we still do not know the algorithm that is\n*sufficient* for cells to know how to build or remodel complex organs\nto a very specific anatomical end-goal. Thus, one major lynch-pin of future\nwork in biomedicine is the discovery of the process by which large-scale\nanatomy is specified within cell collectives, and how we can rewrite this\ninformation to have rational control of growth and form. It is also becoming\nclear that the software of life possesses numerous modules or subroutines,\nsuch as “build an eye here”, which can be activated with simple signal\ntriggers\n\nImagine if we could design systems of the same plasticity and robustness as\nbiological life: structures and machines that could grow and repair\nthemselves. Such technology would transform the current efforts in\nregenerative medicine, where scientists and clinicians seek to discover the\ninputs or stimuli that could cause cells in the body to build structures on\ndemand as needed. 
To help crack the puzzle of the morphogenetic code, and\nalso exploit the insights of biology to create self-repairing systems in\nreal life, we try to replicate some of the desired properties in an\n*in silico* experiment.\n\nEngineers and researchers often use many kinds of simulations incorporating local interaction, including systems of partial differential equations (PDEs), particle systems, and various kinds of Cellular Automata (CA). We will focus on Cellular Automata models as a roadmap for the effort of identifying cell-level rules which give rise to complex, regenerative behavior of the collective. CAs typically consist of a grid of cells being iteratively updated, with the same set of rules being applied to each cell at every step. The new state of a cell depends only on the states of the few cells in its immediate neighborhood. Despite their apparent simplicity, CAs often demonstrate rich, interesting behaviours, and have a long history of being applied to modeling biological phenomena.\n\nLet’s try to develop a cellular automata update rule that, starting from a single cell, will produce a predefined multicellular pattern on a 2D grid. This is our analogous toy model of organism development. To design the CA, we must specify the possible cell states, and their update function. Typical CA models represent cell states with a set of discrete values, although variants using vectors of continuous values exist. The use of continuous values has the virtue of allowing the update rule to be a differentiable function of the states in the cell’s neighbourhood. The rules that guide individual cell behavior based on the local environment are analogous to the low-level hardware specification encoded by the genome of an organism. Running our model for a set number of steps from a starting configuration will reveal the patterning behavior that is enabled by such hardware.\n\nSo - what is so special about differentiable update rules? 
They will allow us to use the powerful language of loss functions to express our wishes, and the extensive existing machinery around gradient-based numerical optimization to fulfill them. The art of stacking together differentiable functions, and optimizing their parameters to perform various tasks has a long history. In recent years it has flourished under various names, such as (Deep) Neural Networks, Deep Learning or Differentiable Programming.\n\nWe will represent each cell state as a vector of 16 real values (see the figure above). The first three channels represent the cell color visible to us (RGB). The target pattern has color channel values in range [0.0, 1.0] and an alpha equal to 1.0 for foreground pixels, and 0.0 for background.\n\nThe alpha channel has a special meaning: it demarcates living cells, those belonging to the pattern being grown. In particular, cells having alpha > 0.1 and their neighbors are considered “living”. Other cells are “dead” or empty and have their state vector values explicitly set to 0.0 at each time step. Thus cells with alpha > 0.1 can be thought of as “mature”, while their neighbors with alpha <= 0.1 are “growing”, and can become mature if their alpha passes the 0.1 threshold.\n\nHidden channels don’t have a predefined meaning, and it’s up to the update rule to decide what to use them for. They can be interpreted as concentrations of some chemicals, electric potentials or some other signaling mechanism that are used by cells to orchestrate the growth. In terms of our biological analogy - all our cells share the same genome (update rule) and are only differentiated by the information encoded by the chemical signalling they receive, emit, and store internally (their state vectors).\n\nNow it’s time to define the update rule. Our CA runs on a regular 2D grid of 16-dimensional vectors, essentially a 3D array of shape [height, width, 16]. We want to apply the same operation to each cell, and the result of this operation can only depend on the small (3x3) neighborhood of the cell. 
This is heavily reminiscent of the convolution operation, one of the cornerstones of signal processing and differentiable programming. Convolution is a linear operation, but it can be combined with other per-cell operations to produce a complex update rule, capable of learning the desired behaviour. Our cell update rule can be split into the following phases, applied in order:\n\n**Perception.** This step defines what each cell perceives of\nthe environment surrounding it. We implement this via a 3x3 convolution with\na fixed kernel. One may argue that defining this kernel is superfluous -\nafter all we could simply have the cell learn the requisite perception\nkernel coefficients. Our choice of fixed operations is motivated by the\nfact that real life cells often rely only on chemical gradients to guide the\norganism development. Thus, we are using classical Sobel filters to estimate\nthe partial derivatives of cell state channels in the x and\ny directions, forming a 2D gradient vector in each direction, for\neach state channel. We concatenate those gradients with the cell’s own\nstates, forming a 48-dimensional (16 state channels plus 16 gradients each in x and y) *perception vector*, or\nrather *percepted vector*, for each cell.\n\n    def perceive(state_grid):\n        sobel_x = [[-1, 0, +1],\n                   [-2, 0, +2],\n                   [-1, 0, +1]]\n        sobel_y = transpose(sobel_x)\n        # Convolve sobel filters with states\n        # in x, y and channel dimension.\n        grad_x = conv2d(sobel_x, state_grid)\n        grad_y = conv2d(sobel_y, state_grid)\n        # Concatenate the cell’s state channels,\n        # the gradients of channels in x and\n        # the gradient of channels in y.\n        perception_grid = concat(\n            state_grid, grad_x, grad_y, axis=2)\n        return perception_grid\n\n**Update rule.** Each cell now applies a series of operations\nto the perception vector, consisting of typical differentiable programming\nbuilding blocks, such as 1x1-convolutions and ReLU nonlinearities, which we\ncall the cell’s “update rule”. 
Recall that the update rule is learned, but\nevery cell runs the same update rule. The network parametrizing this update\nrule consists of approximately 8,000 parameters. Inspired by residual neural\nnetworks, the update rule outputs an incremental update to the cell’s state,\nwhich is applied to the cell before the next time step. The update rule is\ndesigned to exhibit “do-nothing” initial behaviour - implemented by\ninitializing the weights of the final convolutional layer in the update rule\nwith zero. We also forego applying a ReLU to the output of the last layer of\nthe update rule as the incremental updates to the cell state must\nnecessarily be able to both add to and subtract from the state.\n\n    def update(perception_vector):\n        # The following pseudocode operates on\n        # a single cell’s perception vector.\n        # Our reference implementation uses 1D\n        # convolutions for performance reasons.\n        x = dense(perception_vector, output_len=128)\n        x = relu(x)\n        ds = dense(x, output_len=16, weights_init=0.0)\n        return ds\n\n**Stochastic cell update.** Typical cellular automata update\nall cells simultaneously. This implies the existence of a global clock,\nsynchronizing all cells. Relying on global synchronisation is not something\none expects from a self-organising system. We relax this requirement by\nassuming that each cell performs an update independently, waiting for a\nrandom time interval between updates. To model this behaviour we apply a\nrandom per-cell mask to update vectors, setting all update values to zero\nwith some predefined probability (we use 0.5 during training). 
This\noperation can also be seen as an application of per-cell dropout to update\nvectors.\n\n    def stochastic_update(state_grid, ds_grid):\n        # Zero out a random fraction of the updates.\n        rand_mask = cast(random(64, 64) < 0.5, float32)\n        ds_grid = ds_grid * rand_mask\n        return state_grid + ds_grid\n\n**Living cell masking.** We want to model the growth process\nthat starts with a single cell, and don’t want empty cells to participate in\ncomputations or carry any hidden state. We enforce this by explicitly\nsetting all channels of empty cells to zeros. A cell is considered empty if\nthere is no “mature” (alpha>0.1) cell in its 3x3 neighborhood.\n\n    def alive_masking(state_grid):\n        # Take the alpha channel as the measure of “life”.\n        alive = max_pool(state_grid[:, :, 3], (3,3)) > 0.1\n        state_grid = state_grid * cast(alive, float32)\n        return state_grid\n\nIn our first experiment, we simply train the CA to achieve a target image after a random number of updates. This approach is quite naive and will run into issues. But the challenges it surfaces will help us refine future attempts.\n\nWe initialize the grid with zeros, except a single seed cell in the center,\nwhich will have all channels except RGB set to 1.0.\n\nOnce the optimisation converges, we can run simulations to see how our learned CAs grow patterns starting from the seed cell. Let’s see what happens when we run it for longer than the number of steps used during training. The animation below shows the behaviour of a few different models, trained to generate different emoji patterns.\n\nWe can see that different training runs can lead to models with drastically different long term behaviours. Some tend to die out, some don’t seem to know how to stop growing, but some happen to be almost stable! How can we steer the training towards producing persistent patterns all the time?\n\nOne way of understanding why the previous experiment was unstable is to draw a parallel to dynamical systems. 
We can consider every cell to be a dynamical system, with each cell sharing the same dynamics, and all cells being locally coupled amongst themselves. When we train our cell update model we are adjusting these dynamics. Our goal is to find dynamics that satisfy a number of properties. Initially, we wanted the system to evolve from the seed pattern to the target pattern - a trajectory which we achieved in Experiment 1. Now, we want to avoid the instability we observed - which in our dynamical system metaphor consists of making the target pattern an attractor.\n\nOne strategy to achieve this is letting the CA iterate for a much longer time and periodically applying the loss against the target, training the system by backpropagation through these longer time intervals. Intuitively we claim that with longer time intervals and several applications of loss, the model is more likely to create an attractor for the target shape, as we iteratively mold the dynamics to return to the target pattern from wherever the system has decided to venture. However, longer time periods substantially increase the training time and more importantly, the memory requirements, given that the entire episode’s intermediate activations must be stored in memory for a backwards-pass to occur.\n\nInstead, we propose a “sample pool” based strategy to a similar effect. We define a pool of seed states to start the iterations from, initially filled with the single black pixel seed state. We then sample a batch from this pool which we use in our training step. To prevent the equivalent of “catastrophic forgetting” we replace one sample in this batch with the original, single-pixel seed state. After concluding the training step, we replace samples in the pool that were sampled for the batch with the output states from the training step over this batch. 
The animation below shows a random sample of the entries in the pool every 20 training steps.\n\n    def pool_training():\n        # Set alpha and hidden channels to (1.0).\n        seed = zeros(64, 64, 16)\n        seed[64//2, 64//2, 3:] = 1.0\n        target = targets['lizard']\n        pool = [seed] * 1024\n        for i in range(training_iterations):\n            idxs, batch = pool.sample(32)\n            # Sort by loss, descending.\n            batch = sort_desc(batch, loss(batch))\n            # Replace the highest-loss sample with the seed.\n            batch[0] = seed\n            # Perform training.\n            outputs, loss = train(batch, target)\n            # Place outputs back in the pool.\n            pool[idxs] = outputs\n\nEarly on in the training process, the random dynamics in the system allow the model to end up in various incomplete and incorrect states. As these states are sampled from the pool, we refine the dynamics to be able to recover from such states. Finally, as the model becomes more robust at going from a seed state to the target state, the samples in the pool reflect this and are more likely to be very close to the target pattern, allowing the training to refine these almost completed patterns further.\n\nEssentially, we use the previous final states as new starting points to force our CA to learn how to persist or even improve an already formed pattern, in addition to being able to grow it from a seed. This makes it possible to add a periodic loss for significantly longer time intervals than otherwise possible, encouraging the generation of an attractor as the target shape in our coupled system. We also noticed that reseeding the highest loss sample in the batch, instead of a random one, makes training more stable at the initial stages, as it helps to clean up the low quality states from the pool.\n\nHere is what the typical training progress of a CA rule looks like. The cell rule learns to stabilize the pattern in parallel to refining its features.\n\nIn addition to being able to grow their own bodies, living creatures are great at maintaining them. 
Not only does worn out skin get replaced with new skin, but very heavy damage to complex vital organs can be regenerated in some species. Is there a chance that some of the models we trained above have regenerative capabilities?\n\nThe animation above shows three different models trained using the same settings. We let each of the models develop a pattern over 100 steps, then damage the final state in five different ways: by removing different halves of the formed pattern, and by cutting out a square from the center. Once again, we see that these models show quite different out-of-training mode behaviour. For example “the lizard” develops quite strong regenerative capabilities, without being explicitly trained for it!\n\nSince we trained our coupled system of cells to generate an attractor towards a target shape from a single cell, it was likely that these systems, once damaged, would generalize towards non-self-destructive reactions. That’s because the systems were trained to grow, stabilize, and never entirely self-destruct. Some of these systems might naturally gravitate towards regenerative capabilities, but nothing stops them from developing different behaviors such as explosive mitoses (uncontrolled growth), unresponsiveness to damage (overstabilization), or even self destruction, especially for the more severe types of damage.\n\nIf we want our model to show more consistent and accurate regenerative capabilities, we can try to increase the basin of attraction for our target pattern - increase the space of cell configurations that naturally gravitate towards our target shape. We will do this by damaging a few pool-sampled states before each training step. The system now has to be capable of regenerating from states damaged by randomly placed erasing circles. Our hope is that this will generalize to regenerational capabilities from various types of damage.\n\nThe animation above shows training progress, which includes sample damage. 
We sample 8 states from the pool. Then we replace the highest-loss sample (top-left-most in the above) with the seed state, and damage the three lowest-loss (top-right-most) states by setting a random circular region within the pattern to zeros. The bottom row shows states after iteration from the respective top-most starting state. As in Experiment 2, the resulting states get injected back into the pool.\n\nAs we can see from the animation above, models that were exposed to damage during training are much more robust, including to types of damage not experienced in the training process (for instance rectangular damage as above).\n\nAs previously described, we model the cell’s perception of its neighbouring cells by estimating the gradients of state channels in x and y using Sobel filters. A convenient analogy is that each agent has two sensors (chemosensory receptors, for instance) pointing in orthogonal directions that can sense the gradients in the concentration of certain chemicals along the axis of the sensor. What happens if we rotate those sensors? We can do this by rotating the Sobel kernels.\n\nThis simple modification of the perceptive field produces rotated versions of the pattern for an angle of our choosing without retraining as seen below.\n\nIn a perfect world, not quantized by individual cells in a pixel-lattice, this would not be too surprising, as, after all, one would expect the perceived gradients in x and y to be invariant to the chosen angle - a simple change of frame of reference. However, it is important to note that things are not as simple in a pixel based model. Rotating pixel based graphics involves computing a mapping that’s not necessarily bijective and classically involves interpolating between pixels to achieve the desired result. This is because a single pixel, when rotated, will now likely overlap several pixels. 
The successful growth of patterns as above suggests a certain robustness to the underlying conditions outside of those experienced during training.\n\nThere exists an extensive body of literature that describes the various\nflavours of cellular automata and PDE systems, and their applications to\nmodelling physical, biological or even social systems. Although it would be\nimpossible to present a just overview of this field in a few lines, we will\ndescribe some prominent examples that inspired this work. Alan Turing\nintroduced his famous Turing patterns back in 1952\n\nEver since von Neumann introduced CAs\n*A New Kind of Science”*\n\nMore recently, several researchers generalized Conway’s Game of life to work\non more continuous domains. We were particularly inspired by Rafler’s\nSmoothLife\n\nA number of researchers have used evolutionary algorithms to find CA rules\nthat reproduce predefined simple patterns\n\nThe close relation between Convolutional Neural Networks and Cellular\nAutomata has already been observed by a number of researchers\n\nThe Neural GPU\n\nLooking more broadly, we think that the concept of self-organisation is\nfinding its way into mainstream machine learning with popularisation of\nGraph Neural Network\n\nSelf-organization also appeared in fascinating contemporary work using more\ntraditional dynamic graph networks, where the authors evolved\nSelf-Assembling Agents to solve a variety of virtual tasks\n\nOne of the most remarkable demonstrations of the power of self-organisation\nis when it is applied to swarm modeling. Back in 1987, Reynolds’ Boids\n\nThis article describes a toy embryogenesis and regeneration model. This is a\nmajor direction for future work, with many applications in biology and\nbeyond. In addition to the implications for understanding the evolution and\ncontrol of regeneration, and harnessing this understanding for biomedical\nrepair, there is the field of bioengineering. 
As the field transitions from\nsynthetic biology of single cell collectives to a true synthetic morphology\nof novel living machines\n\nThe models described in this article run on the powerful GPU of a modern computer or a smartphone. Yet, let’s speculate about what a “more physical” implementation of such a system could look like. We can imagine it as a grid of tiny independent computers, simulating individual cells. Each of those computers would require approximately 10Kb of ROM to store the “cell genome”: neural network weights and the control code, and about 256 bytes of RAM for the cell state and intermediate activations. The cells must be able to communicate their 16-value state vectors to neighbors. Each cell would also require an RGB-diode to display the color of the pixel it represents. A single cell update would require about 10k multiply-add operations and does not have to be synchronised across the grid. We propose that cells might wait for random time intervals between updates. The system described above is uniform and decentralised. Yet, our method provides a way to program it to reach the predefined global state, and recover this state in case of multi-element failures and restarts. We therefore conjecture this kind of modeling may be used for designing reliable, self-organising agents. On the more theoretical machine learning front, we show an instance of a decentralized model able to accomplish remarkably complex tasks. 
We believe this direction to be opposite to the more traditional global modeling used in the majority of contemporary work in the deep learning field, and we hope this work to be an inspiration to explore more decentralized learning modeling.\n\nWe would like to thank Blaise Aguera y Arcas for his support, as well as for\nteasing our work in his excellent 2019 talk at NeurIPS\n\nOn the Distill side, we are especially grateful to Chris Olah for reviewing the article draft, insightful comments on text and diagrams, and general support of the publication.\n\n**Research:** Alexander came up with the Self-Organising\nAsynchronous Neural Cellular Automata model and Ettore contributed to its\ndesign. Ettore designed and performed most of the experiments for this work.\nAlexander supervised the entire process and contributed extensively to the\nlater stages of development by performing experiments and refining the\nmodel.\n\nThe idea of applying neural networks to understanding regeneration, and to\ndesigning self-organising systems, was proposed by Michael Levin in his\nemail to Alexander, that was sent following the DeepDream\n\n**Demos:** Alexander created both the WebGL and the tf.js demo.\nEttore contributed to the tf.js demo.\n\n**Writing & Diagrams:** Alexander outlined the structure of the\narticle, and contributed to the content throughout. Ettore contributed to\nthe content throughout. Eyvind drew all the diagrams, contributed to the\ncontent throughout, and wrote all of the pseudocode. Michael made extensive\ncontributions to the article text, providing the biological context and\nmotivation for this work.\n\n**WebGL playground.** Starting from our first experiments on\nNeural CA growth and regeneration, we wanted to challenge our models with\nnew situations not seen during training, like removing large portions of the\npattern, or seeding multiple instances side-by-side. 
To facilitate\nexploration and sharing of our models, we created a TensorFlow.js playground\nthat allowed us to interact with trained models right in a browser. The code\nfor exporting and loading CA models in TF.js format is available in the\naccompanying Colab notebook.\n\nWhile writing this article, we decided to see how far one can push the\nperformance and portability of this interactive playground. We reimplemented\nall necessary operations from scratch using the WebGL API and GLSL shader\nlanguage. This implementation powers the demo that can be found on the top\nof this page. We decided to quantize all model parameters and\nactivations\n\nThe quantization was largely an afterthought, and was not accounted for during training. That’s why there are slight differences in models’ behaviours between the online demo and the Python version. However, most of the CAs that we’ve trained managed to survive the somewhat draconic quantization without severe artifacts, although in a few cases we had to resort to selecting the best model checkpoint between a few training runs.\n\n**Colaboratory Notebook.** All of the experiments, images and\nvideos in this article can be recreated using the single notebook referenced\nat the beginning of the article. Images have a “Recreate in Colab” button\nwhich brings you to the corresponding cell that generated the image. Our\nreference implementation of the Neural CA was written while striving to be\nas concise and simple as possible and thus foregoes many performance\noptimizations and tricks one could implement. For the core of the CA - the\nneural network parametrizing the update rule - the full code is contained in\nthe tf.keras.Model NeuralCA class. Note that this network consists of just\n8.3K parameters - minute by most standards and we suspect it could be\nminimized further employing pruning or other forms of compression. 
The\nupdate loop consists of a native python loop iteratively applying the\naforementioned update function, and making use of various techniques we’ve\ndescribed in the article, such as having a sample pool and applying damage\nto the starting seeds. The rest of the notebook consists of code to generate\nand visualize the various images and videos employed in this article,\nutilizing models pre-trained by us using this very same colab. These\npre-trained models can be easily recreated in a matter of minutes with a\ncurrent generation GPU or one provided for free in Colab.\n\nIf you see mistakes or want to suggest changes, please [create an issue on GitHub](https://github.com/distillpub/post--growing-ca/issues/new).\n\nDiagrams and text are licensed under Creative Commons Attribution [CC-BY 4.0](https://creativecommons.org/licenses/by/4.0/) with the [source available on GitHub](https://github.com/distillpub/post--growing-ca), unless noted otherwise. The figures that have been reused from other sources don’t fall under this license and can be recognized by a note in their caption: “Figure from …”.\n\nFor attribution in academic contexts, please cite this work as\n\n```\nMordvintsev, et al., \"Growing Neural Cellular Automata\", Distill, 2020.\n```\n\nBibTeX citation\n\n```\n@article{mordvintsev2020growing,\n  author = {Mordvintsev, Alexander and Randazzo, Ettore and Niklasson, Eyvind and Levin, Michael},\n  title = {Growing Neural Cellular Automata},\n  journal = {Distill},\n  year = {2020},\n  note = {https://distill.pub/2020/growing-ca},\n  doi = {10.23915/distill.00023}\n}\n```\n\n", "url": "https://wpnews.pro/news/growing-neural-cellular-automata", "canonical_source": "https://distill.pub/2020/growing-ca/", "published_at": "2026-05-17 19:21:09+00:00", "updated_at": "2026-05-19 22:04:56.029857+00:00", "lang": "en", "topics": ["research", "science", "artificial-intelligence", "machine-learning", "robotics"], "entities": [], "alternates": {"html": 
"https://wpnews.pro/news/growing-neural-cellular-automata", "markdown": "https://wpnews.pro/news/growing-neural-cellular-automata.md", "text": "https://wpnews.pro/news/growing-neural-cellular-automata.txt", "jsonld": "https://wpnews.pro/news/growing-neural-cellular-automata.jsonld"}}