Coalitional Darwinism and the Instrumental Utility of Individuality


This post was written as part of MATS 9.1 under the mentorship of Richard Ngo.

This post is the first of several I will be writing on using natural selection to understand artificial intelligence and agency. This post will show how noisy selection on genome structure can make evolution effectively non-myopic. From this, we give a Darwinian account of the emergence of 'individuals' constituted by coalitions of lower-level replicators.

Later posts will develop the connection between genome-structure selection and neural-network feature-learning, with applications to interpretability and alignment.

Note: I will use 'Darwinism', 'evolution' and '(natural) selection' interchangeably — more technical discussions of selection often distinguish between them.

Natural selection is limited by noise in its ability to resolve small differences in fitness. Because selection is a limited resource, and because selection makes lineages more fit on average, organisms will tend to evolve to make more efficient use of what selective power they have.

The noise floor can make evolution effectively hyperopic (the opposite of 'myopic') by buffering the effects of mutations which may be beneficial in the short-term but harmful in the long-term. The evolution of 'bet-hedging' is given as an example. Hyperopia permits the evolution of pre-commitments detrimental to some individuals.

The 'bowtie' network motif is an example where selectability is an essential property of the architecture. The network develops low-rank structure, limiting its expressivity, in order to be selectable. This may be understood as a 'coalition of the invisible' — network links which otherwise have too little effect to be tunable by selection instead commit themselves to this low rank structure, trading optimality for selectability. The coalition is then entrenched as selection optimizes it.

In prisoners' dilemmas, groups of organisms can evolve obligate co-operation (an inability to 'defect'). In some circumstances, obligate co-operators may evolve policing structures and joint heritability, resulting in the emergence of a new effective unit of selection. The suborganisms constitute the effective genome of the superorganism.

The noise floor limits how effectively subagents can be aligned to the superagent; this imperfect alignment provides the reservoir of variability which permits selection to act on the superagent. Thus subagent misalignment (which is typically only minimally deleterious) is synonymous with 'exploration' in the superagent's genome space. Incoherence (i.e., subagent conflict to the superagent's detriment) is the fitness subsidy paid by the individual to its lineage in the form of typically-slightly-harmful exploration.

Graphical summary (GPT generated)

Summary: Due to randomness, natural selection can only reliably fix or purge mutations with large enough effects on fitness. This resolution is proportional to the (effective) population on which selection acts.

For clarity, we'll consider here species that reproduce in fixed generations, with constant fitnesses; the qualitative picture remains similar regardless. Spherical cow stuff. Consider $K$ types of organism; at each generational turnover, type $i$ replicates according to a fitness $f_i$, so that $n_i(t+1) = f_i \, n_i(t)$, with $n_i(t)$ the population of type $i$ at time $t$. The fraction of the total population represented by type $i$, $x_i(t) = n_i(t) / \sum_j n_j(t)$, obeys the replicator equation:

$$x_i(t+1) - x_i(t) = x_i(t)\,\frac{f_i - \bar{f}(t)}{\bar{f}(t)}$$

Where $\bar{f}(t)$ is the average fitness of the population at time $t$, $\bar{f}(t) = \sum_j f_j \, x_j(t)$, which changes as the proportions change, even as we have assumed the fitnesses constant. This says that the proportion of the population made up by type $i$ grows in proportion to how much its fitness exceeds the population mean fitness.
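As a minimal sketch of the deterministic picture, assuming discrete generations and constant fitnesses as in the text, the update can be iterated directly. The function names and the two fitness values below are illustrative choices, not taken from the post.

```python
def replicator_step(fractions, fitnesses):
    """One generation of the discrete replicator update: x_i -> x_i * f_i / mean fitness."""
    mean_fitness = sum(x * f for x, f in zip(fractions, fitnesses))
    return [x * f / mean_fitness for x, f in zip(fractions, fitnesses)]


def run_replicator(fractions, fitnesses, generations):
    """Iterate the update for a fixed number of generations."""
    for _ in range(generations):
        fractions = replicator_step(fractions, fitnesses)
    return fractions


# Hypothetical example: two types starting at equal share, the second 10% fitter.
start = [0.5, 0.5]
fitnesses = [1.0, 1.1]
after = run_replicator(start, fitnesses, 50)
print(after)
```

With no noise in the model, any fitness edge, however small, eventually takes over; the finite-population sampling discussed next is what breaks this.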

Image and Caption from Wikipedia

In fact this is not exactly what happens: a type-$i$ organism might perchance have slightly more or fewer than its allotted average number of descendants $f_i$ (its fitness). One way to model this effect is to imagine the fractions predicted by the replicator equation to be idealized probabilities, and, for a population assumed to have size $N_e$ [1], to independently draw $N_e$ samples according to the categorical distribution on the types with those probabilities. I'll spare you some math (see, e.g.,

In effect, this introduces noise in each fraction $x_i$ with variance $\sim x_i(1 - x_i)/N_e$. Functionally, if two types have comparable populations and a fitness difference much smaller than the noise scale, the noise makes it impossible to resolve. Concretely, if all types have fitness $1$, except for one type with fitness $1 + s$, selection will only 'see' this difference if

$$|s| \gtrsim \frac{1}{N_e}$$

That is, selection has a resolution power proportional to the population size: larger populations mean the noise is averaged out. This is referred to as the drift-diffusion threshold; 'drift' means 'mostly governed by fitness differences', 'diffusion' means 'mostly governed by noise'. [2]

And of course, all of the above is a toy model: the effective population size $N_e$ isn't exactly the real population, which should change over time anyway, etc. The point is that noise is important and governs the resolution at which selection can operate.
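The noise floor can be checked numerically with a small finite-population simulation. This is a sketch under stated assumptions: a haploid population of fixed size, one mutant introduced into an otherwise uniform population, and resampling each generation from the idealized frequencies. The population size, advantages and trial count are hypothetical values picked to keep the run short.

```python
import random


def fixation_fraction(pop_size, advantage, trials, rng):
    """Fraction of trials in which a single mutant of fitness 1 + advantage takes over."""
    fixed = 0
    for _ in range(trials):
        mutants = 1
        while 0 < mutants < pop_size:
            # Idealized (noise-free) mutant frequency in the next generation.
            weight = mutants * (1.0 + advantage)
            p = weight / (weight + (pop_size - mutants))
            # Finite-population sampling: each of pop_size offspring is a mutant w.p. p.
            mutants = sum(1 for _ in range(pop_size) if rng.random() < p)
        if mutants == pop_size:
            fixed += 1
    return fixed / trials


rng = random.Random(0)
POP_SIZE = 50   # hypothetical; the noise floor is then roughly 1 / POP_SIZE = 0.02
TRIALS = 1000

neutral = fixation_fraction(POP_SIZE, 0.0, TRIALS, rng)
sub_threshold = fixation_fraction(POP_SIZE, 0.002, TRIALS, rng)  # advantage << 1 / POP_SIZE
resolved = fixation_fraction(POP_SIZE, 0.2, TRIALS, rng)         # advantage >> 1 / POP_SIZE
print(neutral, sub_threshold, resolved)
```

The sub-threshold mutant should fix about as often as a neutral one, while the mutant well above the floor fixes far more often. That is the sense in which selection cannot 'see' the small advantage.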

Evolution is solving a high-dimensional optimization problem, just like neural network training via backprop. In both cases, and I think this is probably a fairly general property of useful high-dimensional optimization algorithms, most directions in genome/parameter space are 'neutral': moving along them changes the fitness/loss by effectively zero.

Two important upshots of this for now: first, if we look at an organism and see lots of complexity, it's not necessarily true that this complexity solves a problem as opposed to being a kludge. Second, this 'non-adaptive complexity' can still serve as the substrate for later adaptive mutations.

For example, in a regime where having long genomes is very cheap, it's mostly fine to have 4 or 5 copies of gene A. If, down the line, one has need of a variant of protein A, it's quite handy to have those extra copies around so you can finetune the variant without messing up the function of the old protein. [[3]](https://www.lesswrong.com/feed.xml#fn-ra9RSLjyczfzBvqYK-3)

Neutrality is extremely important for studying the behaviour of evolution — it is the basis for a lot of the surprising phenomena we talk about below. Having lots of neutral directions is, by definition, inconsequential to the organism locally — it matters because it changes the nature and difficulty of the search problem in genome space.

See also:

A good lecture on the topic of bet-hedging

Imagine the same setup as the coin-toss game. A tribe must choose where to live: the Usually-Edenic Gardens, which give them 10 fitness with probability 99% and 0 fitness with probability 1%, or the Plague-Blasted Heath, which gives them 1.1 fitness with probability 100%. The group maximizes expected fitness by choosing to live in the Gardens. The log growth rate is maximized by choosing the Heath with probability 1. A tribe which chooses the Gardens with any nonzero probability will eventually run out of luck and be extinguished in one fell swoop. (For simplicity this assumes the group chooses as a whole, etc etc.)
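The two objectives can be computed side by side from the numbers in the text. This is a small sketch; the function names are illustrative, and it assumes, as the text does, that the tribe chooses as a whole each generation.

```python
import math

# (fitness, probability) pairs, taken from the text.
GARDENS = [(10.0, 0.99), (0.0, 0.01)]
HEATH = [(1.1, 1.0)]


def expected_fitness(outcomes):
    """Arithmetic mean fitness per generation."""
    return sum(prob * fitness for fitness, prob in outcomes)


def log_growth_rate(outcomes):
    """Expected log fitness per generation; -inf if any possible outcome is extinction."""
    total = 0.0
    for fitness, prob in outcomes:
        if prob == 0:
            continue
        if fitness == 0:
            return float("-inf")
        total += prob * math.log(fitness)
    return total


def tribe_log_growth(p_gardens):
    """Log growth rate when the whole tribe picks the Gardens with probability p_gardens."""
    if p_gardens == 0:
        return log_growth_rate(HEATH)
    # Any positive chance of the Gardens inherits their chance of total extinction.
    return p_gardens * log_growth_rate(GARDENS) + (1 - p_gardens) * log_growth_rate(HEATH)


print(expected_fitness(GARDENS), expected_fitness(HEATH))
print(tribe_log_growth(0.0), tribe_log_growth(0.01))
```

The Gardens win on the arithmetic mean, but any nonzero probability of choosing them sends the long-run log growth rate to minus infinity.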

How sluggishly alight the wings of retribution on those who durst profane the garden? The Gardens have an extinction event every ~100 generations, and so on shorter timescales, populations preferring the Gardens will predominate; selection will only favour the Heath-dwellers on timescales $\gtrsim 100$ generations. The interesting consequence is this: if a mutation should cause a Heath-dweller to become a Garden-enjoyer, this mutation is, in the long run, as good as an instant fatality (assuming the reverse mutation doesn't occur). In the long run, the Heath-dwellers will evolve to prevent such mutations as if they were instantly fatal; evolution should be willing to pay a cost to prevent this mutation.
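To put rough numbers on the timescale, here is a sketch using only the 99% per-generation survival probability and the two fitness values from the text. The helper names are hypothetical.

```python
import math

P_SURVIVE = 0.99  # per-generation chance the Gardens avoid catastrophe (from the text)


def garden_survival(generations):
    """Probability that a Garden-dwelling lineage has not yet been wiped out."""
    return P_SURVIVE ** generations


def median_extinction_time():
    """Generation by which half of all Garden lineages have gone extinct."""
    return math.log(0.5) / math.log(P_SURVIVE)


def log_advantage_while_alive(generations, garden_fitness=10.0, heath_fitness=1.1):
    """Log of the Garden/Heath population ratio, conditional on no catastrophe so far."""
    return generations * math.log(garden_fitness / heath_fitness)


for t in (10, 100, 1000):
    print(t, garden_survival(t), log_advantage_while_alive(t))
print(median_extinction_time())
```

On horizons much shorter than ~100 generations a Garden lineage is very likely still alive and vastly outnumbers the Heath-dwellers; on horizons much longer than that it is almost surely gone.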

Cool contrived model. Why care? Roughly, evolution is, in some parameter regime, effectively non-myopic. More spectacular cases of hyperopia in direct conflict with individual interests seem uncommon in biology outside of cases with artificially high mutation rates (like in these contrived models). Bet-hedging is the cleanest, definitely-real, common example. I'm working on a separate post about analogies and disanalogies between evolution and gradient descent, where I'll try to articulate precisely the regimes in which this is relevant based on scale-separation. [7]

"In the long run, we are all [Kelly bettors or] dead." J.M. Keynes

Here I'll be more explicit about the nebulous 'weird parameter regime' in which selection can be effectively 'hyperopic'; we'll build a toy model that will grow into the real multi-scale expansion that I intend to use to apply these ideas to analyze neural networks.

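Consider an environment which varies somewhat slower than the timescale of selection; a species which is slow to adapt will suffer a fitness penalty $\Delta$. Evolution should be willing to 'pay' to decrease it; the question is whether it can. Suppose a mutation which reduces this penalty appears at a cost $c$ to the individual carrying it. This mutation can survive if $c \lesssim 1/N_e$ (the resolution condition from §1.A), i.e. if selection cannot resolve the individual cost. The mutation is favoured if, in addition, selection can resolve the penalty, $\Delta \gtrsim 1/N_e$:

$$c \lesssim \frac{1}{N_e} \lesssim \Delta$$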

Thus, such amortizations are only possible when the population is small enough as to be unable to resolve them, but large enough to be able to resolve the group effect. Note that this argument applies in reverse to entrenchment effects.
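One way to read the condition above is as a window of population sizes: the individual cost must sit below the noise floor (about one over the effective population size) while the group-level penalty sits above it. The sketch below encodes that reading. The reading itself, the parameter names and the example values are assumptions for illustration, and the inequalities are heuristic order-of-magnitude statements, not sharp thresholds.

```python
def amortization_window(individual_cost, group_penalty):
    """Range (min, max) of effective population sizes in which the cost is invisible
    to selection but the group-level penalty is visible; None if the range is empty."""
    if individual_cost <= 0 or group_penalty <= 0:
        raise ValueError("costs and penalties must be positive")
    smallest = 1.0 / group_penalty    # below this size, even the penalty is lost in noise
    largest = 1.0 / individual_cost   # above this size, the individual cost gets purged
    return (smallest, largest) if smallest <= largest else None


def can_amortize(pop_size, individual_cost, group_penalty):
    """True if the noise floor 1 / pop_size lies between the cost and the penalty."""
    noise_floor = 1.0 / pop_size
    return individual_cost <= noise_floor <= group_penalty


# Hypothetical numbers: a cost of 1e-4 per individual against a penalty of 1e-2.
print(amortization_window(1e-4, 1e-2))
print([can_amortize(n, 1e-4, 1e-2) for n in (10, 1000, 10**6)])
```

The window closes entirely once the individual cost exceeds the penalty being avoided, which matches the intuition that such a pre-commitment would not be worth paying for.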

Please note that this is a very heuristic derivation meant to give an idea — I expect that in reality, most such changes are essentially neutral on the short timescale, and are only resolved on the long timescale by second-order effects from changing noise statistics. For changes which aren't neutral, it's likewise a question of comparing timescales and fluctuations. The dynamic version is a good bit more complicated mathematically, and I'll save it for the more technical neural networks post.

Summary: Many biological networks are 'bowtie' shaped — a highly optimized 'core' knot, with many feed-in and feed-out signals, forming a 'regulatory periphery'. The low-rank bottleneck of the 'knot' amplifies the selective signal passed to the peripheral pathways; thus the bowtie can be understood as a 'coalition of the invisible' — many sub-noise-threshold pathways sacrifice tunability by binding themselves to the knot. Selection entrenches the coalition by optimizing the newly legible periphery.

Consider a highly optimized metabolic cycle like the Krebs cycle: the core cycle is essentially optimal, and thus highly conserved over evolution. Evolution has to figure out a way to manage its golden boy without walking off a fitness cliff. The solution we often see is the regulatory periphery: the cell's various chemical networks route through the central knot, and all the regulatory knobs get attached to these various input/output channels.

This article shows that bowties tend to evolve when a network processes a low-rank signal under a cost-of-regulation, which seems reasonable — low-rank things (e.g. "how much metabolism should I be doing") are much easier to control and understand — but here I want to propose that one origin of this kind of regulation-cost is the need to preserve selective signal.

Because these regulated networks now pass through the highly-influential central knot, each individual 'arm' of the network has a one-to-many channel to downstream processes, instead of the one-to-one you'd get in a haphazardly evolved chemical network. Thus, the feed-in / feed-out pathways get a bonus to selective visibility by passing through the knot, even at the expense of not having fine-grained control over the output processes. The bow-tie motif is a coalition of chemical pathways: its members trade fine-grained control for visibility, and the stability of that coalition is determined by the noise floor of selection (see §1)! AND! The longer this coalition holds, the more highly optimized it will become, so selection will tend to entrench it over time. I call this kind of selectability-driven coalition a 'coalition of the invisible'. [8]

I have been toying with the idea that the fragility of the bowtie core was not an accidental property of an optimum, but was perhaps actively evolved in order to divert selection resolution from potential mutants: rather than waste selective resolution re-purging every time there's a mutant lineage, selection might prefer mutants to 'fail-fast' by building fitness cliffs, an evolved pre-commitment to self-destruction. I'd take ~30% that its fragility is evolved hyperopically rather than incidental. This paper finds evolved fragility in silico, but this doesn't seem strong evidence for evolved fragility; my guess is that recombination evolves to cope with the mutational load before evolved fragility kicks in.
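The one-to-many visibility argument for the bowtie can be caricatured in a few lines. This is a deliberately crude sketch. It assumes every downstream output contributes the same small amount to fitness, that a link in a haphazard network touches one output while an arm routed through the knot touches all of them, and that an effect is selectable when it exceeds one over the population size. All names and numbers are hypothetical.

```python
def is_visible(fitness_effect, pop_size):
    """Heuristic resolution condition: the effect must exceed the noise floor 1 / pop_size."""
    return abs(fitness_effect) > 1.0 / pop_size


def dense_link_effect(per_output_effect):
    """A one-to-one link only moves the single output it feeds."""
    return per_output_effect


def bowtie_arm_effect(per_output_effect, n_outputs):
    """An arm routed through the knot moves every output downstream of the knot."""
    return per_output_effect * n_outputs


def tunable_parameters(n_inputs, n_outputs):
    """Independent knobs: full input-output matrix vs. a rank-one bowtie."""
    return {"dense": n_inputs * n_outputs, "bowtie": n_inputs + n_outputs}


POP_SIZE = 1000           # hypothetical effective population size
PER_OUTPUT_EFFECT = 1e-4  # hypothetical fitness effect of nudging one output
N_INPUTS, N_OUTPUTS = 20, 50

print(is_visible(dense_link_effect(PER_OUTPUT_EFFECT), POP_SIZE))
print(is_visible(bowtie_arm_effect(PER_OUTPUT_EFFECT, N_OUTPUTS), POP_SIZE))
print(tunable_parameters(N_INPUTS, N_OUTPUTS))
```

In this caricature the bowtie has far fewer knobs than the dense network, but each knob clears the noise floor. That is the trade of expressivity for selectability described above.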

This diagram shows the dynamics of evolution in asexual reproduction. The red 'aB' strand represents a deleterious mutation. The 'evolved fragility' discussed above would occur if 'aB' repeatedly evolved and went extinct; you can think of the 'lost selective power' as the area of 'aB''s red blob. The value of that area depends on how valuable the marginal unit of selection power is in a given situation.

Summary: Under prisoners' dilemma dynamics, organisms may evolve obligate co-operation — a form of pre-commitment. Policing and excluding defectors incentivises tighter co-operation. Under the right circumstances, the co-operating replicators may become bound together so tightly as to constitute an effective individual — a new unit of selection.

Here I will discuss dynamics of how evolution can push groups of possibly unrelated 'individuals' to bind themselves into a single, jointly reproducing unit — a 'higher level' organism. A quick clarificatory note: here it is useful to think in terms of agents and sub-agents, but 'actions' are not taken in the lifetime of, say, some particular gene; instead the 'actions' are successive adaptations occurring over evolutionary timescales. We may say synonymously 'bacterium A evolved X capability', 'the lineage of bacteria A evolve X capability', or 'evolution caused those bacteria of lineage A which evolved X capability to outcompete those which did not'. I prefer the former as being concise, but the squeamish reader is invited to mentally substitute readings with their preferred degree of teleology.

Consider some organisms playing prisoners' dilemma, perhaps scattered over enough isolated 'patches' that we get lots of independent trials. Defectors prosper within patches, while co-operating patches thrive overall. We saw above (§2) that evolution can 'pre-commit' lineages to certain paths of development; it follows that it can, given enough incentive, evolve obligate co-operation — biologically cauterizing away the possibility of defection.
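The claim that defectors prosper within patches while co-operating patches thrive overall can be made concrete with one generation of a public-goods toy model. This is a sketch only. The payoff rule (every member of a patch receives a benefit proportional to the patch's co-operator fraction, and only co-operators pay a cost), the two patches and all numbers are assumptions chosen for illustration.

```python
def next_generation(cooperators, defectors, benefit, cost):
    """Offspring counts after one generation inside a single patch."""
    share = benefit * cooperators / (cooperators + defectors)  # public good per member
    return cooperators * (1.0 + share - cost), defectors * (1.0 + share)


def cooperator_fraction(cooperators, defectors):
    return cooperators / (cooperators + defectors)


BENEFIT, COST = 5.0, 1.0                 # hypothetical payoffs
patches = [(90.0, 10.0), (10.0, 90.0)]   # (co-operators, defectors) in each patch

offspring = [next_generation(c, d, BENEFIT, COST) for c, d in patches]

within_before = [cooperator_fraction(c, d) for c, d in patches]
within_after = [cooperator_fraction(c, d) for c, d in offspring]

global_before = cooperator_fraction(sum(c for c, _ in patches), sum(d for _, d in patches))
global_after = cooperator_fraction(sum(c for c, _ in offspring), sum(d for _, d in offspring))

print(within_before, within_after)
print(global_before, global_after)
```

Co-operators lose ground inside every patch, yet gain ground in the population as a whole, because the co-operator-rich patch grows so much faster. Without something to stop within-patch erosion this advantage is temporary, which is where obligate co-operation and policing come in.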

In order for this to be feasible, a patch needs to evolve the ability to:

These synergize with each other: exclusion usually means compartmentalization, which increases an individual replicator's incentive to police. [9] 'Policing against mutant defectors from our posited prisoners' dilemma' looks a lot like 'policing defectors in general', which, turned inward, looks a lot like 'pre-commit to joint heritability of all replicators within the patch'. This is not the necessary course of events, but a proof of plausibility.

Once these three conditions apply, the compartment becomes a unit of selection, and its constituent replicators something like its effective genome. Natural selection will now operate on the compartments as a unit, and an 'individual' is born. I will refer to the units of this higher level of selection as 'agents' and their constituents as 'subagents'.

Two fun/surprising questions as exercises: when dioecy evolved, why would females entirely lose the ability to reproduce parthenogenetically, rather than selectively? If your answer is 'policing', in line with this section, why are sex ratios not so policed? (c.f. Fisher's Principle)

Group Selection Wikipedia Page

Coalitional Darwinism is subtly but importantly different from traditional 'group selection'. Group selection is often used to mean the selection of groups in direct opposition to the interests of individuals. Notably, its Wikipedia page has a "Rejection" section, though the dispute seems to largely be a question of when and where group selection occurs rather than if; I happen to share a name with a very handsome and smart and cool group of cells on which selection is acting as a unit.

Two mechanical differences:

This has an interesting interpretation: sub-agents' directions of neutral variation are the variance on which the superagent draws to optimize; the converse is that superagents must become more 'Draconian' in proportion to the paucity of accessible variability remaining. [12]

So 'coalitional Darwinism' as articulated here is group selection, but it's not 'Group Selection' in the 'is this a dominant mechanism in animals' sense; it's ultimately an attempt to cast the biological theory of the emergence of individuals-qua-superagents as a naturalized theory of agency and incoherence. Bluntly, I want it to apply to circuits in Claude.

See also:

['Selection Processes for Subagents'](https://www.lesswrong.com/posts/d4hw4FBX9YXHGFBWQ/selection-processes-for-subagents)

['Understanding Systematization' (Sequence)](https://www.alignmentforum.org/s/MGMwqENAgdi85fiwF)

Summary: The selective pressure which creates policed individuals does not run to completion by design. Constituents can compete in orthogonal or slightly-deleterious ways; this competition generates the variety on which selection acts. Incoherence may be understood as a pre-commitment to exploration of genome-space, preserved against naive optimization by the noise floor.

As in the previous section, I will note here that we speak of agents and subagents acting volitionally — as before this is an abstraction for thinking about the tendencies induced by evolution. Subagents' 'actions' are mutations. This section introduces the additional complication that the 'agent' and 'subagents' may evolve, and thus 'act' on different timescales; for simplicity, I will speak assuming that the two levels' generations 'line up' — e.g., the genes which make up a cell reproduce when the cell does. This could fail to be true, e.g., for endosymbionts like the mitochondria.

Our theme hitherto has been the need to allocate limited selection power. From the above, note that an individual (the emergent agent of §4) will have a lower effective population, and will typically reproduce on a significantly longer timescale — thus, only a limited amount of selection power can be brought to bear on the subagents should they start to get uppity. Competition among the regulators may persist to the extent that the selection acting on the agent is too weak to resolve it. I call this room for suboptimality 'selective slack'.

As a toy model, consider a subagent able to gain fitness at the cost of inflicting fitness damage to the host — then there is a regime where this defection can evolve and persist:

This tells us directly how much the subagents must be hobbled-by-design: A 'parliament', dividing power equally, might have , a 'veto' system would have , while complete redundancy of the replicators might make (at least in the short term, before the others wise up). Tall-poppy syndrome in action: the more say an individual subagent has in constituting the host, the more intense must be the policing.

The upshot is that 'incoherence', in the narrow sense of persistent maladaptation of an agent created specifically by conflict among its subagents, by default, persists to a greater or lesser degree in proportion to the selection pressure brought to bear on a coalitional agent. It is not the case that coherence increases monotonically with applied optimization pressure; coherence is selected for only insofar as it is on the Pareto frontier. [13]

"[Incoherence] is the worst form of [exploration] except for all the others we have tried" (W. S. Churchill)

Much of the above has been about treating 'selectability' as a resource. To operate, selection requires variation — BUT! selection is an optimizer, so it obviously should be very upset about needing to maintain a reservoir of harmful variants for a rainy day. How does it leave itself enough slack?

Instead, we observe a highly non-trivial thing about high-dimensional optimization: the predominance of 'neutral' or 'cryptic' variation (cf. §1.B). I'll talk more about this in a follow-up post, as it relates to neural networks as well. Suffice for now to say that this pressures the genome to be 'robust' in the sense that most mutations have small or no effect locally while changing the effects of future mutations. This amounts to a population spreading itself out along a flat minimum in the fitness landscape — that is, evolution, like SGD, prefers flat minima ;)

In the context of emergent, higher-level 'individuals', whose 'genomes' are their constituent subagents, neutral variation means policing only those interactions which directly affect the function of the higher level organism, actively disincentivizing overly Draconian policing amongst the subagents.

Now we can see why the noise floor argument from before (see §1) was so important: the noise floor is the mechanism by which the subagents maintain (lightly harmful) variability. This is what is known in biology as 'non-adaptive complexity' [14]; the most delightful example is introns, parasitic sequences which evolved the machinery needed to splice themselves out of RNA transcripts. These are the likely ancestors of much of the more advanced splicing machinery the cell now uses.

A quick cool tangent I couldn't bear to cut: This paper gives two mechanisms by which parasitic agents drive higher-level evolution. First, parasitism pushes co-operator communities to co-ordinate more tightly in order to gain a complexity-asymmetry, and this tighter co-ordination translates more readily to obligate co-replication.

Second, the adversarial co-evolution of parasitic elements, combined with a countervailing pressure to become 'domesticated', results in something like a technology-transfer, as in the intron example above. Thus, in the long run, even parasites can be adaptive. For a fun example see this paper on group II introns.

In this post I have tried to articulate five core ideas:

The skeptical reader has hopefully been gnashing their teeth that it seems that selection-for-selectability has multiplied the traditional criticism of selectionist explanations — now we can explain apparently maladaptive behaviours as selectionist too! "Parasites are good actually" — the nerve!

I claim only that these selection-for-selectability effects in principle can exist; I would not be so surprised if any of the examples I've adduced are better attributed to other causes, though I tried to pick clean examples. But another post will link a bunch of papers on genome and neural network structure, and we can tease out which parts of the story are bio-specific once we have the math under control.

To me, this is all an elaborate metaphor for neural networks [15] , for which I can just work out the math — I'll derive the correspondences and the parameter regimes (basically it falls out of dominant balance and multiscale perturbation theory, if that tickles your imagination any). Until then, salve atque vale.

My gratitude to Ashe Vasquez Nunez, Richard Ngo, Marcel Mroczek, and Maria Kostylew for their feedback.

"No Man is wise at all Times, or is without his blind Side" — Erasmus

[1] The 'e' subscript stands for 'it's more complicated than that'. ↩︎

[2] Yes, this is the exact opposite of what physicists mean when they say 'drift', and yes it is completely standard convention in biology. ↩︎

[3] This particular pattern is called ['subfunctionalization'](https://en.wikipedia.org/wiki/Subfunctionalization). [↩︎](https://www.lesswrong.com/feed.xml#fnref-ra9RSLjyczfzBvqYK-3)

[4] The noise floor for selection provides a kind of 'slack' (cf. [this post](https://www.lesswrong.com/posts/yLLkWMDbC9ZNKbjDG/slack)). [↩︎](https://www.lesswrong.com/feed.xml#fnref-ra9RSLjyczfzBvqYK-4)

[5] See the post ["The Correct Response to Uncertainty is Not Half-Speed"](https://www.lesswrong.com/posts/FMkQtPvzsriQAow5q/the-correct-response-to-uncertainty-is-not-half-speed) for a related idea.

[6] Garrabrant's Geometric Rationality has a lot of cool adjacent ideas, including the especially salient idea of the 'arithmetic-geometric boundary'. The nexus is that population-sizes under natural selection are the bankrolls of Kelly bettors. Kelly betting definitionally maximizes the geometric growth rate, i.e., the expected logarithm of the per-round growth factor. ↩︎

[7] One may be reminded of work on optimization daemons and gradient hacking (see also here and here); my take is that hyperopia is a property of the optimizer, so this analysis doesn't relate directly to mesa-optimizers. If you're interested in the care and feeding of benign mesa-optimizers, do consider studying the immune system (see my quick-take here). ↩︎

[8] Note that we are speaking of these regulatory arms as agents which act volitionally; this is loose, both in the sense that they only 'act' as driven over evolutionary timescales, not intra-lifetime, and in the sense that one could only with difficulty identify persistent 'lineages' of arms. This is mainly a propaedeutic bridge to the next section, where it will be useful to think of organisms as agents consisting of a genome of sub-agents. ↩︎

[9] Policing is important in proportion to the degree of unrelatedness of the constituent replicators, cf. this paper; the reader is encouraged to ponder, i.a., royal marriages and the practice of cross-shareholding common among Japan's zaibatsu conglomerates, keiretsu (系列). ↩︎

[10] A book I haven't read but seems good for this: Kirschner & Gerhart 2005, The Plausibility of Life: Resolving Darwin's Dilemma.

[11] If all replicators were identical, they could do a crude Löbian-style co-operation; multicellular organisms and insect colonies seem to do roughly this. As noted earlier, relatedness decreases the need for policing mechanisms. ↩︎

[12] A caricature of the cold war: the US started under less pressure, so could afford free markets, which are less efficient at producing strategic goods in the short term, but produce more growth in the long term. The USSR started at a disadvantage, resorted to state control of the economy, and was able to make remarkable gains in strategic sectors in the short term, but grew less in the long term. Slack at the higher and lower levels reinforce one another. ↩︎

[13] 'On the Pareto frontier' follows if you assume selection is greedy over accessible adaptations. Cf. Ngo's Understanding systematization sequence.

[14] Lynch, The Origins of Genome Architecture is the flagbearer of this idea; the parts I skimmed of this seemed great, but I have only skimmed.

[15] As a teaser for flavour, I've been thinking what 'policing' might look like if we imagine 'competition between circuits' in LLMs. Perhaps 'policing' looks like the development of anomaly detection and other 'executive function' circuits, possibly even leading to introspective ability? ↩︎

source & further reading

lesswrong.com (original article)