# Mutation, selection, and drift (with Shiny)

May 14, 2017
By

(This article was first published on R – On unicorns and genes, and kindly contributed to R-bloggers)

Imagine a gene that comes in two variants, where one of them is deleterious to the carrier. This is not so hard to imagine, and it is often the case. Most mutations don’t matter at all. Of those that matter, most are damaging.

Next, imagine that the mutation happens over and over again with some mutation rate. This is also not so hard. After all, given enough time, every possible DNA sequence should occur, as if by monkeys and typewriters. In this case, since we’re talking about the deleterious mutation rate, we don’t even need exactly the same DNA sequence to occur; rather, what is important is how often a class of mutations with the same consequences happen.

Let’s illustrate this with a Shiny app! I made this little thing that draws graphs like this:

This is supposed to show the trajectory of a deleterious genetic variant, with sliders to decide the population size, mutation rate, selection, dominance, and starting frequency. The lines are ten replicate populations, followed for 200 generations. The red line is the estimated equilibrium frequency — where the population would end up if it was infinitely large and not subject to random chance.

The app runs here: https://mrtnj.shinyapps.io/mutation/
And the code is here: https://github.com/mrtnj/shiny_mutation

(Note: I don’t know how well this will work if every blog reader clicks on that link. Maybe it all crashes or the bandwidth runs out or whatnot. If so, you can always download the code and run in RStudio.)

We assume diploid genetics, random mating, and mutation only in one direction (broken genes never restore themselves). As in typical population genetics texts, we call the working variant ”A” and the working variant ”a”, and their frequencies p and q. The genotypes AA, Aa and aa will have frequencies $p^2$, $2 p q$ and $q^2$ before selection.

Damaging variants tend to be recessive, that is, they hurt only when you have two of them. Imagine an enzyme that makes some crucial biochemical product, that you need some but not a lot of. If you have one working copy of the enzyme, you may be perfectly fine, but if you are left without any working copy, you will have a deficit. We can describe this by a dominance coefficient called h. If the dominance coefficient is one, the variant is completely dominant, so that it damages you even if you only have one copy. If the dominance coefficient is zero, the variant is completely recessive, and having one copy of it does not affect you at all.

The average reproductive success (”fitness”) of each genotype is described in terms of selection coefficients, which tells us how much selection there is against a genotype. Selection coefficients range from 0, which means that you’re winning, to 1 which means that you’ve been completely out-competed. For a recessive damaging variant, the AA homozygotes and Aa heterozygotes are unaffected, but the aa homozygotes suffers selection coefficient s.

In the general case, fitness values for each genotype are 1 for AA, $1 - hs$ for Aa and $1 - s$ for aa. We can think of this as the probability of contributing to the next generation.

What about the red line in the graphs? If natural selection keeps removing a mutation from the gene pool, and mutation keeps adding it back in again there may be some equilibrium frequency where they cancel out, and the frequency of the damaging variant is more or less constant. This is called mutation–selection balance.

Haldane (1937) came up with an expression for the equilibrium variant frequency:

$q_{eq} = \frac {h s + \mu - \sqrt{ (hs - \mu)^2 + 4 s \mu } } {2 h s - 2 s}$

I’ve changed his notation a bit to use h and s for dominance and selection coefficient. $\mu$ is the mutation rate. It’s not easy to see what is going on here, but we can draw it in the graph, and see that it’s usually very small. In these small populations, where drift is a major player, the variants are often completely lost, or drift to higher frequency by chance.

(I don’t know if I can recommend learning by playing with an app, but I definitely learned things while making it. For instance that C++11 won’t work on shinyapps.io unless you send the compiler a flag, and that it’s important to remember that both variants in a diploid organism can mutate. So I guess what I’m saying is: don’t use my app, but make your own. Or something.)

Literature

Haldane, J. B. S. ”The effect of variation of fitness.” The American Naturalist 71.735 (1937): 337-349.

Postat i:genetik Tagged: mutation, R, selection, Shiny

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...