Thursday, 19 May 2016

Has Evolution Been Proven?

What does it mean to 'prove' something? What is proof? What would it mean to 'prove' evolution?

Often, one comes across the claim that evolution has never been proven. Let's first be clear about what this means. 

Proof is a formal procedure applicable to axiomatically grounded systems of deductive logic. Here's an example:

Axiom: The addition of two integers gives the sum.
Proof: 1+1=2

Seems pretty straightforward, no? Would that it were that simple.

The most ready application of this process is in mathematics, of course, which is built from the simplest axiomatic foundations in which all the core axioms are definitionally true, because of the way the requisite entities are defined. 

Is evolution a deductive system of logic? Is science?

Science, being primarily an inductive discipline, doesn't generally use proof except in very strict, very specific circumstances, one of which is at the heart of scientific reasoning. I've previously posted explicitly on this topic, so I won't explore it further here.

What has been done in the case of evolution, while not commensurate with this procedure, is every bit as solid and unassailable: It has been observed occurring!

Yes, you read that correctly. We've actually watched it happening. Not merely adaptation - which is still evolution, incidentally - but full-on evolution, and macroevolution at that.

This post will deal with some specific instances documented in the literature, including one beautiful and, to those unaware of it, quite surprising instance, but I won't spoil it yet.

Some unpacking of terms is warranted here, because they get thrown around an awful lot, mostly by those who don't understand how they're used in the primary literature. The resulting confusion extends even to the scientifically literate, to the point that the stock responses don't really deal with the underlying issues.

Let's start with a common term that comes up in discussion with apologists for creationism, and that doesn't appear in the literature at all: Kind.

This is a quite beautifully distracting elision, erected specifically to bring the whole argument back to the bible. It's rooted in Genesis 1:25.

And God made the beast of the earth after his kind, and cattle after their kind, and every thing that creepeth upon the earth after his kind: and God saw that it was good.

Of course, the one thing they can never do is to define just what is meant by a 'kind'. It should be a simple matter, amounting to no more than pointing at where it comes in the phylogenetic hierarchy employed in the relevant fields, an example of which is on the left. 

Some things worth noting about this tree as I've drawn it. First, I've only included one branching per generation of the hierarchy, while in fact there may be many. 

Also, and quite importantly, in the 'domain' section of the hierarchy, it looks a lot like Eukaryota and Archaea are the parents of the illustrated kingdoms. This is not correct (as near as we can currently tell). In fact, the kingdoms shown descend only from the Eukaryota, which branched earlier from the Archaea.

It's again apposite to note that each of those boxes constitutes a population of organisms, whatever the level, because, and I can't stress this enough, evolution is a population phenomenon.

Now, it's worth pointing out that this particular system is being used less and less, and is being replaced with the more robust cladistic system. A clade is any species and all its descendant species. The reason for this should be reasonably clear, and it's all to do with how those branchings (known in the jargon as 'divergence') actually work. For example, each of those levels in the hierarchy was once a species, which means that the entire phylogenetic system is a bit of a moveable feast. Further, if you look at the top of the tree, where Homo sapiens resides, there's an empty space next to it, illustrating a possible future divergence. If and when that happens, there might be a bit of a problem. When dealing with other species we discover around the animal kingdom, we don't really have issues classifying them but, if our species were to diverge, which branch of the divergence would be entitled to the appellation Homo sapiens?

Anyhoo, the main point here is that 'kind' has no place in this system. What about cladistics? Does this offer any solace for 'kinds'? 

Not remotely. Cladistics actually makes the problem worse for 'kinds', because at its heart is a macroevolutionary process, and that's entirely the problem for creationists, because they deny that this even occurs.

One of the problems I've encountered an awful lot among the literate, in dealing with some of the claims propounded by deniers, is a misunderstanding of just what macroevolution is. I've come across responses such as 'macroevolution is just lots of microevolution', and 'macroevolution isn't even a valid term, there's just evolution'. Neither of these is correct. Before we can reasonably deal with why that is, there's another problematic term that we need to deal with: Species.

What exactly is a species? This is a question that's troubled evolutionary biologists for a long time, and only in recent years has some resolution been found. It isn't that defining a species is problematic in and of itself, it's rather that most ways of defining a species raise problems if applied too rigidly. The most commonly-accepted definition currently in use is known as the 'biological species concept' (BSC). This defines a species as a population of organisms throughout which gene flow occurs at a given moment in time.

Now, even this, if applied too rigidly, can be problematic. For example, viral vectors constitute gene flow, and there's no good reason to infer that, therefore, the Ebola virus is human. That would be absurd. Properly applied, though, the BSC is extremely robust, and tends to be problem-free. 

Gene flow is fairly straightforward, though it does have some implications, which we'll come to shortly. You'll note, however, the last part of that definition: 'at a given moment in time'. This is one of the most important features of the BSC. Indeed, any conception of species that doesn't contain a temporal component of this nature is rapidly going to run into trouble. This is because, absent such a feature, any conception is going to come up hard against what I like to call 'the discreteness problem'. Richard Dawkins termed it 'the tyranny of the discontinuous mind'. It deals with our need to classify things and put them in little boxes, overlooking the fact that nature rarely works in such a digital manner. Having this component allows us to classify away to our heart's content without running into this issue. What this allows us to do is to properly treat evolution as what it is, namely a population phenomenon. Evolution only applies to individual members of species in terms of their contributions to the gene pool. Any species must be defined at a given moment.

To clarify that, think about a group of humans. For simplicity, we'll work with the premise that the population size remains constant, so that each pairing gives rise to a pair, and the death rate keeps pace with the birth rate.

Now, each pair of humans will give birth to progeny that are the same species as their parents. This must be the case. There are differences in their genetic make-up, not least because each of the offspring is a different blend of the genes from both their parents, but they are still the same species, because gene flow is occurring between them. Now, here's one of those places where the BSC can lead to absurdity if applied too rigidly because, once past reproductive age, there is no gene flow, so anybody over a certain age will not be classified as the same species. That's an obvious problem, though, and can reasonably be discounted.

Anyway, our population is cycling, giving birth and dying, and we can let this go on for as many generations as necessary. Now let's imagine that, after lots of generations, one of our population invents a time machine and goes right back to where we started, say 30,000 generations before. Is our traveller the same species as the original population? We've been calling them humans the entire time the experiment has been running, and indeed we could even use the proper binomial for them, Homo sapiens (binomial means 'two names', and is the technical term for any species name, which should always be italicised, with the genus capitalised and the specific name lowercase). So, they're both H. sapiens, but are they the same species? In all likelihood, they aren't. 30,000 generations is certainly sufficient time for enough genetic differences to have built up to make them biologically incompatible, so they're different species, but they're described by exactly the same binomial. That's why we need the temporal component in our species concept.

So, now we've laid the groundwork, what about those two terms, micro- and macroevolution?

The macro/micro distinction is a valid distinction in evolutionary biology, but it doesn't mean what the creationists think it means.

Evolution is defined as variation in the frequencies of alleles, where an allele is a specific iteration of a given gene. An easy way to understand what an allele is is to think about insulin.

Insulin is a critical molecule for almost every vertebrate. Deficiency in insulin production is very common, the disorder being known as diabetes mellitus.

The gene coding for insulin has been extensively studied for a fairly wide range of organisms. We can look at the sequences of two closely-related organisms to show what an allele is. Here's the gene coding for insulin in humans, which can be found on chromosome 11:

atg gcc ctg tgg atg cgc ctc ctg ccc ctg ctg gcg ctg ctg gcc ctc tgg gga cct gac
cca gcc gca gcc ttt gtg aac caa cac ctg tgc ggc tca cac ctg gtg gaa gct ctc tac
cta gtg tgc ggg gaa cga ggc ttc ttc tac aca ccc aag acc cgc cgg gag gca gag gac
ctg cag gtg ggg cag gtg gag ctg ggc ggg ggc cct ggt gca ggc agc ctg cag ccc ttg
gcc ctg gag ggg tcc ctg cag aag cgt ggc att gtg gaa caa tgc tgt acc agc atc tgc
tcc ctc tac cag ctg gag aac tac tgc aac tag

Which codes for the following insulin precursor:


Here's the same gene coding for insulin in lowland gorillas, with the differences highlighted in red (four codons differ: gca/gcg and tca/tcc in the second row, att/atc and caa/cag in the fifth):

atg gcc ctg tgg atg cgc ctc ctg ccc ctg ctg gcg ctg ctg gcc ctc tgg gga cct gac
cca gcc gcg gcc ttt gtg aac caa cac ctg tgc ggc tcc cac ctg gtg gaa gct ctc tac
cta gtg tgc ggg gaa cga ggc ttc ttc tac aca ccc aag acc cgc cgg gag gca gag gac
ctg cag gtg ggg cag gtg gag ctg ggc ggg ggc cct ggt gca ggc agc ctg cag ccc ttg
gcc ctg gag ggg tcc ctg cag aag cgt ggc atc gtg gaa cag tgc tgt acc agc atc tgc
tcc ctc tac cag ctg gag aac tac tgc aac tag

And here's the insulin precursor:


As you can readily see, even though the gene has differences, the precursor is identical. There's an important point there about how genes work, which I'll come back to shortly. Meanwhile, if you want to see more on this, please see the excellent post on the fallacy of one true sequence by Calilasseia, which deals with a great deal more than just these differences.
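For anybody who'd like to check the two sequences above rather than take my word for it, here's a minimal sketch in Python. It assumes the standard genetic code (packed into a lookup table in the conventional T, C, A, G order) and simply copies the two sequences from this post; the names CODON_TABLE, HUMAN, GORILLA and translate are just labels chosen for the illustration.

```python
from itertools import product

# Standard genetic code, one letter per codon, in T, C, A, G order.
# '*' marks a stop codon.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sequences copied from the post, split into codons.
HUMAN = """
atg gcc ctg tgg atg cgc ctc ctg ccc ctg ctg gcg ctg ctg gcc ctc tgg gga cct gac
cca gcc gca gcc ttt gtg aac caa cac ctg tgc ggc tca cac ctg gtg gaa gct ctc tac
cta gtg tgc ggg gaa cga ggc ttc ttc tac aca ccc aag acc cgc cgg gag gca gag gac
ctg cag gtg ggg cag gtg gag ctg ggc ggg ggc cct ggt gca ggc agc ctg cag ccc ttg
gcc ctg gag ggg tcc ctg cag aag cgt ggc att gtg gaa caa tgc tgt acc agc atc tgc
tcc ctc tac cag ctg gag aac tac tgc aac tag
""".split()

GORILLA = """
atg gcc ctg tgg atg cgc ctc ctg ccc ctg ctg gcg ctg ctg gcc ctc tgg gga cct gac
cca gcc gcg gcc ttt gtg aac caa cac ctg tgc ggc tcc cac ctg gtg gaa gct ctc tac
cta gtg tgc ggg gaa cga ggc ttc ttc tac aca ccc aag acc cgc cgg gag gca gag gac
ctg cag gtg ggg cag gtg gag ctg ggc ggg ggc cct ggt gca ggc agc ctg cag ccc ttg
gcc ctg gag ggg tcc ctg cag aag cgt ggc atc gtg gaa cag tgc tgt acc agc atc tgc
tcc ctc tac cag ctg gag aac tac tgc aac tag
""".split()


def translate(codons):
    """Turn a list of codons into a string of one-letter amino acid codes."""
    return "".join(CODON_TABLE[codon] for codon in codons)


# Every position at which the two alleles carry a different codon.
differences = [
    (position, human, gorilla)
    for position, (human, gorilla) in enumerate(zip(HUMAN, GORILLA), start=1)
    if human != gorilla
]

for position, human, gorilla in differences:
    print(f"codon {position}: {human} ({CODON_TABLE[human]}) "
          f"vs {gorilla} ({CODON_TABLE[gorilla]})")

print("precursors identical:", translate(HUMAN) == translate(GORILLA))
```

Each differing pair of codons maps to the same amino acid, which is why two different alleles can give exactly the same precursor.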

The main point here is that these different genes are known as alleles, because they are different versions of the very same gene. That said, there are many, many instances in which different organisms carry exactly the same allele. For example, if you compare the gene coding for histone in humans and chimpanzees, you'll find that they're identical.

Microevolution is defined as variations in the frequencies of alleles below species level, in a population of organisms. Macroevolution is defined as variations in the frequencies of alleles at or above species level, or in populations of species. 

In short, macro isn't merely lots of micro, because macroevolution goes on with every variation of alleles that are shared. When a chimpanzee gives birth and its offspring is carrying an allele that is shared with humans, that's macroevolution.

There are other examples of macroevolutionary processes at work that are not necessarily well-understood, such as extinction, in which the frequencies of all alleles in a species go from 'some' to 'none'. Because this variation is happening at species level, it also constitutes macroevolution. Another is fixation, a process in which, through genetic drift, a given allele gets distributed in such a way that every extant member of a species is carrying it. This is again variation in frequencies at species level.
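To make drift and fixation a little more concrete, here's a rough sketch using what textbooks call the Wright-Fisher model, which is my choice of idealisation here and not something drawn from the sources for this post. It assumes a fixed number of gene copies, no selection at all, and that each copy in the next generation is drawn at random from the current one. The function name and the parameter values are invented for the illustration.

```python
import random


def drift_until_fixed_or_lost(n_copies=20, start_count=10, seed=0,
                              max_generations=10_000):
    """Follow one allele under pure drift until every copy carries it or none does.

    n_copies is the total number of gene copies in the population and
    start_count is how many of them carry the allele we're watching.
    Returns (generations elapsed, final count of the allele).
    """
    rng = random.Random(seed)
    count = start_count
    generation = 0
    while 0 < count < n_copies and generation < max_generations:
        frequency = count / n_copies
        # Each copy in the next generation is an independent random draw
        # from the current generation; there is no selection involved.
        count = sum(rng.random() < frequency for _ in range(n_copies))
        generation += 1
    return generation, count


generations, final_count = drift_until_fixed_or_lost()
outcome = "fixed" if final_count == 20 else "lost"
print(f"allele {outcome} after {generations} generations")
```

Change the seed and the allele will sometimes be fixed and sometimes lost, but in a population this small it rarely hangs about at an intermediate frequency for long.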

So, we talked above about how a species is defined, and we talked about divergence of populations. This is known as 'speciation'. In a nutshell, this is when a single population gets fractured reproductively, generally due to some geographic barrier, so that they are accumulating different sets of alleles. While they're a single population, the constant mixing of these allele sets keeps them reproductively compatible. Once separated, though, they move closer and closer to having sufficiently different sets of alleles that their DNA is no longer inter-compatible, and they diverge as separate species. This will generally take lots of generations because of the rate at which new mutations accumulate which, for humans, is usually estimated at somewhere between several dozen and a couple of hundred per new birth, a tiny number in terms of the genome.

Looked at from just the right perspective, all of the above can be said to be underpinned by one thing: Extinction. Using an example of divergence as described above, this is how it works. Look at the cladogram on the right.

If you think of the boxed area as 'a moment in time', we can think of all the organisms in the box as being extant simultaneously. The parent (below the divergence) is still capable of reproducing with the offspring on both sides of the divide (although the resulting offspring may have a suspicious talent for playing the banjo). Thus, even though the offspring themselves may not be capable of producing fertile offspring with each other, gene flow can still occur. Of course, this is a vast oversimplification for the purpose of illustration. Such an occurrence is unlikely in reality, as such incompatibility doesn't generally arise in so few generations, but it isn't impossible.

Now let's look at a later snapshot.

Here, we can see the same cladogram but highlighting a later moment in time. The parent organism is now 'extinct', and no gene flow can occur between the remaining organisms, which means that they are different species. While there were still members of the parent species carrying a set of alleles that are compatible reproductively with members of both daughter species, they were a single species. Once the last member of that population dies or passes reproductive age, the divergence is complete. Thus speciation ALWAYS occurs with extinction of the parent alleles at a given moment in time, because it occurs at the moment at which the last member of the population through which gene flow can occur is removed from the population.

It's also really important to remember that the dots in these diagrams don't represent individual organisms, but populations. 

Another good way to visualise how extinction and speciation go hand-in-hand is to think about an extinction event in a ring species.

A ring species is a population of organisms that has spread out in a ring. A notable ring population is the ensatina salamander population of California (there is some controversy over whether the ensatinas are actually a ring species, but that's not massively important for our purposes). This is a population of several subspecies of salamander distributed around California. The diagram shows how they're distributed. 

The population starts at 1, and works its way around the ring clockwise. Subspecies 1 isn't reproductively compatible with subspecies 7, at top left, but it is with 2, and 2 with 3, etc. There's probably also some overlap, so one could be compatible with 3, for example. Again, for the purpose of this illustration, whether that's actually true of the ensatinas isn't really relevant, as we're just talking about the principles.

Now let's look at what happens if there is a disaster, such as a bolide impact somewhere around the ring. 

We can see now that, because subspecies 3 and 4 have gone extinct, there is no gene flow between 2 and 5. Because the species at the ends of the ring are not reproductively compatible, gene flow between the Eastern and Western populations has now ceased, meaning that they are now separate species. This is extinction directly driving speciation.
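The ring example boils down to a question of connectivity, so it can be sketched in a few lines of Python. This is only an illustration of the principle: it assumes seven subspecies in which gene flow occurs solely between neighbours (1 with 2, 2 with 3 and so on, ignoring the possible overlap mentioned above), and it treats any group joined by an unbroken chain of gene flow as a single species. The function name gene_flow_groups is made up for the purpose.

```python
from collections import deque


def gene_flow_groups(populations, links):
    """Group populations that are joined, directly or indirectly, by gene flow."""
    alive = set(populations)
    neighbours = {population: set() for population in alive}
    for a, b in links:
        # A link only carries gene flow if both populations still exist.
        if a in alive and b in alive:
            neighbours[a].add(b)
            neighbours[b].add(a)

    groups, seen = [], set()
    for start in sorted(alive):
        if start in seen:
            continue
        # Breadth-first walk along the chain of compatible neighbours.
        group, queue = [], deque([start])
        seen.add(start)
        while queue:
            current = queue.popleft()
            group.append(current)
            for nxt in sorted(neighbours[current] - seen):
                seen.add(nxt)
                queue.append(nxt)
        groups.append(sorted(group))
    return groups


subspecies = [1, 2, 3, 4, 5, 6, 7]
links = [(n, n + 1) for n in range(1, 7)]  # 1-2, 2-3, ..., 6-7; no 7-1 link

before = gene_flow_groups(subspecies, links)
survivors = [s for s in subspecies if s not in {3, 4}]  # the bolide takes out 3 and 4
after = gene_flow_groups(survivors, links)

print(before)  # [[1, 2, 3, 4, 5, 6, 7]]
print(after)   # [[1, 2], [5, 6, 7]]
```

Before the impact there's one connected group; afterwards there are two, with nothing left to carry genes between them.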

All of the above are what those in the relevant fields call 'macroevolution', and they're all observed.

Of course, what the creationists are looking for when they use the 'kinds' nonsense is something that not only isn't predicted by evolutionary theory, but would actually falsify evolutionary theory at a stroke. What they're looking for is something along the lines of a fish giving birth to a vole, or some such. I hope I've already given enough information here to show what an absurdity that is.

Before we move on, we also need to take a look at one more term, a term that has caused much of the objection to evolutionary theory from the get-go. That term is 'random'.

It's a little-known fact that the earliest objections to Darwin's work didn't come from the religious, they came from a different quarter entirely, and a surprising one at that: Physicists.

The big problem was that, since Newton, it was thought that the universe was just like a big clockwork machine. Pierre-Simon Laplace famously stated that, with Newton's work, all we needed to know was the position and velocity of every particle in the universe and we could predict with perfect accuracy any past or future state.

This is now known as Laplacian determinism. All the physicists of Darwin's day were Laplacian determinists. They couldn't countenance the idea that there were random elements at play in the universe. But what do we actually mean by random here? 

Random, in the way that I employ it, and in the way that it arises in evolutionary theory, means 'statistically independent'. It does not mean, as some suppose, 'uncaused'. It simply means that, of a range of possible outcomes, any one outcome is statistically as probable as any other outcome.

To give a concrete example of something random, we can look at the decay of a single atom. The moment of decay of, say, an atom of caesium, is entirely random. It can happen any time from the moment the atom first arises to the heat-death of the universe. There's absolutely no way to predict when it will decay. Each of those moments, and thus the time of decay, is statistically independent.

So, Darwin had introduced the random and the probabilistic, and the physicists weren't too happy about it. Ludwig Boltzmann, a founding father of statistical thermodynamics, cites Darwin as one of his major influences in the formulation of statistical mechanics, and describes him, having laid the groundwork that would ultimately result in quantum mechanics, as the greatest physicist of the 19th century.

Anyway, a bit of a digression but, I think, an interesting one.

So, there are random elements in evolution, both in the mutations that drive diversity, and the selection that attenuates it. But that can't be right, can it? Richard Dawkins says that natural selection is the opposite of random! And he's right, of course, but there's a danger of equivocation, so let's look at what he means, and what I mean, and see if there really is a contradiction.

Natural selection has to be looked at in two ways to be fully appreciated. The first is from the perspective of the population, at which level the effects of selection are seen. At this level, NS is most definitely not random, because it can be probabilistically quantified. At this level, we see that, on average, advantageous alleles are selected for, in the form of being passed on to future generations with a statistical weighting. We also see that, on average, deleterious alleles are selected against, in the form of not being passed on to future generations, again with a statistical weighting. This is what Professor Dawkins is talking about. It can't be said enough times that evolution is a population phenomenon, and statistical in nature.

The second way to look at NS is from the perspective of the individual organism, at which level selection actually operates. From this perspective, NS is random. The particular selection pressure that an individual organism will succumb to or indeed evade, is statistically independent, thus random. The organism with an allele that allows it to evade a particular selection pressure has statistical significance, but the means of checking out without issue are many and diverse, and which particular pressure said individual will fall prey to (pardon the pun) can only be treated in the broadest of terms. An organism can be the strongest, fastest, best-equipped predator, an alpha in every sense of the word but, like the ensatinas above, if he gets hit by a big flaming rock from outer space, he's fucked.

Properly, evolution is neither random nor non-random. It's stochastic, which means simply that future states of the system are contingent upon initial conditions plus one or more random variables.

A simple example of a stochastic system is ten coins. Put them down on a table. Some will be heads, some tails. These are your initial conditions. Pick one of the coins, execute a coin toss and put it back in place on the table with the winning side showing. These are your new initial conditions. Repeat this exercise as many times as you like. At each stage, the future evolution of the system is neither entirely random nor entirely non-random. It depends on the initial conditions, which are reset after each toss, and one or more random variables, in this case, the particular coin you choose and the outcome of the toss.

Now let's add a little complication. Coins can land on their edges. There's also a robot that doesn't like edges, so it knocks them flat. It's not a very good robot though so, sometimes, it will miss. Moreover, sometimes when it misses it will actually knock a flat coin up on its edge. 

We now have a system in which advantage plays a part, but where the efficacy of that advantage is itself statistical. The system will keep evolving, and the robot will tend to keep the numbers of the 'unfit' coins down, so that there is a statistical bias toward coins that are lying down, but the occasional edge case will still make it through the filtering process.
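The coins-and-robot system is easy enough to sketch in Python. Everything numerical here is an assumption made up for the illustration: a one-in-ten chance that a tossed coin lands on its edge, a robot that flattens an edge coin eight times out of ten, and a one-in-ten chance that a miss knocks some flat coin up on its edge instead.

```python
import random

HEADS, TAILS, EDGE = "H", "T", "E"


def step(coins, rng, p_edge=0.1, p_hit=0.8, p_fumble=0.1):
    """One round: toss a randomly chosen coin, then let the robot do its pass."""
    # Random variable 1: which coin gets picked up and tossed.
    chosen = rng.randrange(len(coins))
    # Random variable 2: how it lands.
    coins[chosen] = EDGE if rng.random() < p_edge else rng.choice((HEADS, TAILS))

    # The robot attends to every coin that is on its edge when its pass begins.
    for index in [i for i, coin in enumerate(coins) if coin == EDGE]:
        if rng.random() < p_hit:
            coins[index] = rng.choice((HEADS, TAILS))  # knocked flat
        elif rng.random() < p_fumble:
            # A miss that happens to knock some flat coin up on its edge.
            flat = [i for i, coin in enumerate(coins) if coin != EDGE]
            if flat:
                coins[rng.choice(flat)] = EDGE


def flat_fraction(steps=5000, seed=42):
    """Average share of coins lying flat after each round of toss plus robot."""
    rng = random.Random(seed)
    coins = [rng.choice((HEADS, TAILS)) for _ in range(10)]  # initial conditions
    flat_seen = 0
    for _ in range(steps):
        step(coins, rng)
        flat_seen += sum(coin != EDGE for coin in coins)
    return flat_seen / (steps * len(coins))


print(f"share of coins lying flat: {flat_fraction():.3f}")
```

With these made-up numbers the great majority of coins are flat at any given moment, but the share never quite reaches one, which is the statistical bias with occasional escapees described above.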

And of course that's what selection is: a filter. Like most filters, it isn't perfect. Some organisms carrying advantageous alleles for the environment will be filtered out by being subjected to a bolide impact, or a random snake-bite, or spider-bite, or lose footing at the top of a cliff, or some such. Thus, at the level of the individual, selection is random. At the level of the population, there's a heavily-weighted bias toward advantageous alleles.
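Here's a toy version of that imperfect filter, again in Python and again with invented numbers. It assumes a population of fixed size in which each individual either carries an advantageous allele or doesn't, that carriers survive to breed with probability 0.55 against 0.45 for everybody else, and that the next generation is drawn at random from the survivors. None of this is meant to model any real organism.

```python
import random


def run_selection(pop_size=1000, start_freq=0.1, p_survive_adv=0.55,
                  p_survive_other=0.45, generations=50, seed=1):
    """Return (allele frequency per generation, carriers that died anyway)."""
    rng = random.Random(seed)
    n_adv = int(pop_size * start_freq)
    # True marks an individual carrying the advantageous allele.
    population = [True] * n_adv + [False] * (pop_size - n_adv)
    history = [n_adv / pop_size]
    unlucky_carriers = 0

    for _ in range(generations):
        survivors = []
        for has_advantage in population:
            p_survive = p_survive_adv if has_advantage else p_survive_other
            if rng.random() < p_survive:
                survivors.append(has_advantage)
            elif has_advantage:
                # Carrying the better allele was no guarantee for this one.
                unlucky_carriers += 1
        if not survivors:
            break
        # Offspring inherit the allele state of a randomly chosen survivor.
        population = [rng.choice(survivors) for _ in range(pop_size)]
        history.append(sum(population) / pop_size)

    return history, unlucky_carriers


history, unlucky_carriers = run_selection()
print(f"frequency went from {history[0]:.2f} to {history[-1]:.2f}")
print(f"carriers filtered out regardless: {unlucky_carriers}")
```

Plenty of individual carriers get filtered out regardless, yet the frequency of the allele across the population climbs all the same.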

Even with entirely random events, we can extract statistical information. The science of radiometric dating is based on this. The decay of an individual atom is entirely random, but get a large enough collection of them and we can put numbers on how quickly a sample will decay. We call it the half-life, and it describes the amount of time it takes for half of the atoms in a given sample to decay. I won't dwell further on isotopic decay here, because I have plans for that topic on the table.
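A quick sketch shows how a dependable half-life falls out of individually unpredictable decays. It assumes each atom's lifetime is an independent draw from an exponential distribution, which is the standard description of radioactive decay; the decay constant of 0.1 per unit of time is an arbitrary value picked for the illustration and doesn't correspond to any real isotope.

```python
import math
import random
import statistics


def simulate_lifetimes(decay_constant=0.1, n_atoms=20_000, seed=7):
    """Draw an independent random lifetime for every atom in the sample."""
    rng = random.Random(seed)
    return [rng.expovariate(decay_constant) for _ in range(n_atoms)]


lifetimes = simulate_lifetimes()

# Individual atoms are all over the place.
print(f"shortest-lived atom: {min(lifetimes):.4f}")
print(f"longest-lived atom:  {max(lifetimes):.1f}")

# The moment by which half the sample has decayed is the median lifetime.
simulated_half_life = statistics.median(lifetimes)
predicted_half_life = math.log(2) / 0.1  # ln(2) divided by the decay constant

print(f"simulated half-life: {simulated_half_life:.2f}")
print(f"predicted half-life: {predicted_half_life:.2f}")
```

No single lifetime can be predicted, but the point at which half the sample has gone lands very close to ln(2) divided by the decay constant every time you run it with a decent number of atoms.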

Before finally moving on, a quick word about fitness.

In the popular view of evolution via natural selection, the common catch-phrase is 'survival of the fittest'. It's important to know just what's meant by fitness here, not least because a straight vernacular treatment of it would tend to suggest that the strongest survive, when that's simply not the case. That phrase only has meaning when the full technical definition of 'fitness' is employed, in which it's a measure of performance against an expected average with regard to number of offspring. In short, what defines fitness is reproductive success. As alluded to above, even the strongest, fastest organism can fail to be represented in future generations. Indeed, what actually defines a trait as advantageous or deleterious is not a function of the trait itself, but a function of the environment the trait finds itself in. 

A clear, if less than wonderful, example of this is the sickle gene. One would expect, on a narrow, cursory assessment of this gene, that it should be selected against, because if you carry two copies, the likelihood is high that you'll suffer from sickle-cell anaemia, probably from about your early to mid twenties. In regions where malaria is rife, however, carrying a single copy increases resistance to malaria, so it isn't selected against, it's selected for. In fact, since sickle-cell anaemia usually doesn't manifest until well into reproductive years, it isn't strongly selected against even in places where malaria is not prevalent.

So, now we've dealt with what evolution is, and what it isn't, more importantly, let's look at a couple of cases. The literature is replete with observations of speciation, allele variation, fixation and extinction. I'd wanted to aim for brevity in this post, and I'm keenly aware of how spectacularly I've failed (and that's even having left out a fair bit for now), so I'm going to restrict myself to two examples. Since I want to expend some real estate on the latter, I'm going to be inversely verbose on the former. I'll provide links at the bottom to some useful resources on observations of evolution in action.

The first is a long-running experiment using bacteria, Escherichia coli, at Michigan State University, overseen by Richard Lenski. Starting with just 12 populations of the same strain of E. coli, the experiment has been running since 1988, and passed the 60,000 generation mark some two years ago. This experiment has seen several speciation events, along with the evolution of new traits, such as the ability to transport citrates in an aerobic environment. While these bacteria could already process citrate, they were unable to use it as an energy source in aerobic environments. The results of this research have been pretty spectacular, and the experiment is ongoing.

The details can be found in their inaugural paper, Historical contingency and the evolution of a key innovation in an experimental population of Escherichia coli. I'll include a link to the paper at the bottom.

The second example is that nice surprise I was hanging onto from earlier, and it involves a speciation event in butterflies.

In a study funded by the Smithsonian Tropical Research Institute and others, Jesús Mavárez et al spent some time studying several species in the genus Heliconius. They noted that Heliconius heurippa was intermediate in wing pattern between two other species, H. melpomene and H. cydno, and hypothesised that this was a result of a hybridisation. Genetic analysis lent weight to this when it was discovered that certain alleles carried by H. heurippa were a mixture of alleles from H. melpomene and H. cydno.

But it gets even more interesting!

The experimenters decided to do some cross-breeding in the lab between H. melpomene and H. cydno to produce 'F1' hybrids, and then back-crossed the fertile males to females of both species. When breeding with H. melpomene, the melpomene wing pattern returned. However, when they back-crossed with H. cydno to produce an 'F2' generation, and then selectively bred the F2 generation, the intermediate wing pattern emerged, almost identical to the wild heurippa.

Were they finished? Not likely.

Finally, they bred the lab-bred heurippa with the wild specimens, and the pairings produced fertile offspring with the heurippa wing pattern.

Similar speciation models are now being pressed into service to explain, among other things, diversity in the cichlid fish populations of lakes Victoria, Malawi and Tanganyika.

The above represents an absolutely minuscule amount of the evidence in support of evolution. The astute reader will have noticed that evolutionary biology isn't really my area of interest. My penchant is for physics, and I can say without fear of reprisal that most of the physicists I know would sell their grandmothers for the amount of evidence there is for evolution in their own fields. Evolutionary theory is about the best-supported theory in all of science.