And other implications of adaptivity
Every high school biology class enters a certain phase where the students learn that genomic complexity is almost completely uncorrelated with (at least superficial) phenotypic complexity. This phase is usually heralded by a somewhat arbitrary list of species’ genome lengths.
Students are often shocked that the lowly bullfrog has almost exactly twice as much information in its genome, while the fern has over fifty times as much. I was definitely surprised at first when I read this factoid. But if you think about it for longer than about 20 seconds, it makes perfect sense. This blog post is my attempt to convey this intuition: the TL;DR is that plants can’t run away from their problems, so they benefit more from a larger genetic arsenal of solutions to throw at adverse conditions.
Let’s start with an example.
Consider two organisms. The first, who we will call Moe, is a simple motile water-dwelling organism with a whip-like tail and some basic means of sensing the world around it (such as photosensitive compounds). It is able to react to its environment to move away from dangers to its continued existence, meaning its behaviour can be very roughly approximated as the following.
while(alive):
for sense, sense_data in senses:
if sense_data not in safe_region(sense):
= gradient(sense)
direction move(direction)
As you can see, the above circuit is very simple but has the potential to be highly effective, provided there exists a gradient that can move the organism out of danger. The same basic behaviour (move in the direction of the gradient) can be applied to a huge variety of situations, allowing for a relatively short program to produce effective behaviour in many sensory domains. indeed, it’s easy to see how, should the organism acquire some new sense, it could adapt its existing orient-along-gradient-and-whip-tail-quickly behaviour to the new data.
Now let’s consider a second organism, which still has to adapt to the same environmental challenges as the first but has the unfortunate constraint of being a non-motile life form. For simplicity, as I will refer to it as “Herb”.
while(alive):
for sense, sense_data in senses:
if sense_data not in safe_region(sense):
if sense == 'temperature':
match sense_data:
case sense_data < 0:
release_antifreeze()case sense_data < 5:
extract_water_from_leaves()
extract_chlorophyl_from_leaves()
drop_leaves()case sense_data > 35:
activate_heat_shock_proteins()if sense == 'humidity':
match sense_data:
case sense_data < 10:
produce_abscisic_acid()case sense_data > 90:
produce_phytoalexins()
reinforce_cell_wall()
generate_ROSs()# and so on
Herb is playing life on hard mode. When the going gets tough, Mo gets going (ideally very far away), but Herb has to stay put and deal with it. In a dry spell, Herb has to reduce its water-losing activities. In a cold snap, Herb has to rapidly remove water from sensitive parts of its organism and produce chemical compounds that reduce the damage caused by water crystallization. In these situations, it’s not sufficient for Herb’s genome to encode a single behaviour (run away). Herb’s genome has to encode a custom solution for each of the many threats to existence and reproduction that might be encountered in this environment. Whereas Moe only needs to encode movement in a direction of a detected gradient, Herb has to synthesize several complex molecules (each of which requires a different set of proteins and thus genes to achieve), develop complex cellular signalling pathways, and modify its metabolism in response to adversity. If genetic drift removes any of these components in Herb’s children, natural selection will be quick to punish the overly parsimonious descendants; Moe’s descendants, by contrast, will happily shed their genetic baggage that allowed them to survive freezing temperatures and extended droughts.
This is a cute theory, but it’s wrong. Or at least, incomplete.
It isn’t the case that all plant genomes are bigger than all animal genomes, so clearly there’s more to genome length than just one’s ability to move around in the environment. For one thing, behaviours that allow an organism to escape danger might be very expensive to encode in the genome. Further, plenty of plants, like desert cacti, live in relatively placid environments, and many animals endure several dramatic environmental changes in their lifetimes\(^1\).
Mobility also has its limitations: you can’t run away from the cold in the arctic, or drought in the desert. So even the most mobile animals require some innate adaptations to the local environments. The mapping between number of base pairs required to encode a particular phenotype and the external complexity of that phenotype is also Add on top of that the fact that genome length doesn’t seem to be particularly strongly regularized by evolution, and what you end up with is a huge variety of genome sizes within and across kingdoms.
There’s also the misalignment between outward complexity of a phenotype and the number of base pairs required to encode it. The longest gene in the human genome, clocking in at 2.4 million base pairs, encodes a protein called dystrophin, which as you might have guessed from the name can cause muscular dystrophy if it undergoes certain particularly deleterious mutations. FOXP2, one of the most critical genes responsible for language, by contrast, consists of a measly 607,446 base pairs. So language is apparently 1/4 as complicated as muscle contraction in DNA-base-pair language, which I wouldn’t have necessarily guessed a priori.
Once you’ve accepted Moravec’s paradox into your heart, this isn’t so shocking. At least biochemistry is a sufficiently alien field (I at least certainly don’t have an intuitive sense as to whether gluconeogenesis should be more biochemically difficult than whatever signalling pathway tells your sweat glands to start producing sweat) that human intuitions on “task difficulty” are relatively weak. So the paradox is more of a lack of correlation in this case than a negative correlation.
While gene length might not be particularly correlated with how “advanced” the behaviour it encodes is, there is at least an interesting correlation in humans where something like one-third of encoding genes are expressed in the brain, which is higher than any other organ. So it does seem like human brains have acquired a lot of different functions that require different genes to modulate.
The difference between animal-like modulation of one’s external environment (by running away from danger or towards a necessary resource) and plant-like modulation of one’s internal biochemistry reflects a bit the shift from GOFAI, where many carefully-constructed experts would be ensembled with a relatively simple switching circuit, to neural networks, where good initial conditions and learning dynamics are found, and then the system is set free to adapt to data as it comes in. Constructing an expert system that does as well as a large language model on many domains would take many many more lines of code than it takes to describe the initial weights and optimizer of a neural network.
Genome lengths don’t correlate much with human-perceived “advancedness” of an organism, but there’s some correlation with how much the organism needs to adapt on a biochemical level to its environment. So your typical plant benefits more from a long genome with lots of duplication than your typical animal, because the plant can’t run away from problems like drought or hot temperatures and needs a larger genetic arsenal to survive adversity.
Neurological complexity doesn’t correspond much to genomic complexity. Humans express a lot of genes in the brain, but genes that control brain development aren’t wildly larger or more complex than other human genes, like those that encode muscle proteins such as dystrophin.
One helpful analogy for the AI-inclined is the distinction between a hard-coded dialogue which has to explicitly script billions of possible behaviours, vs just initializing and running a language model, which requires many many fewer lines of code.
1. A prime example is the lungfish, which as its name suggests deals with droughts that evaporate the local water supply by growing a lung (during adolescence, not on-demand) and breathing air when there’s no water.