(This article was first published on

I've always wanted to write a(n overly) simple model of evolution. The assumptions are minimalistic: only one species, for which each individual's genotype is represented as a one-dimensional real number, e.g. 7.4. Now, the fun stuff: I define a function mapping genotype to probability of reproduction, like this:**mickeymousemodels**, and kindly contributed to R-bloggers)To keep things simple, reproduction is asexual; and, to further simplify, the only possibilities are: (1) survive to reproductive age and have exactly 2 offspring; or (2) pass on without having had any children. The survival function outputs the probability of (1), as a function of the individual's genotype. If Pr[Reproduce|Genotype] > 0.50, you're doing well.

The next step is generating the two offsprings' genotypes, given that the parent has reproduced. In the spirit of heredity, I figure that the offspring should be genetically similar to their parent, and I model this by taking the parent's genotype (e.g. 7.4) and adding a standard normal random variable. This is done twice (and independently), since there are two children (and neither affects the other).

The model takes place in discreet time: an arbitrary initial population exists at time zero; for each individual, I use the survival function to randomly determine reproductive outcomes (and generate two children when appropriate); the population of parents dies off, the children replace them, and we have reached timeperiod one. The model continues in this fashion for as long as desired. Depending on the survival function, and on randomness, the population may die out, or drift forever upward.

Here's a first draft of my tiny model of evolution, simulated in R:

`ProbOfReproduction <- function(x) {`

z <- -0.05*x^4 + 0.1*x^3 + 0.5*x^2 - 0.25*x

return(1 / (1 + exp(-z)))

}

dev.new(width=8, height=6)

plot(ProbOfReproduction, -6, 6, xlab="genotype", ylab="probability of reproduction", main="Probability of Reproduction Given Genotype", lwd=2, col="darkgreen")

mtext("Looks like the hat / boa in Le Petit Prince")

savePlot("prob_of_reproduction_given_genotype.png")

GetNextGeneration <- function(x) {

# Takes genotypes of current generation & returns children

reproduce <- runif(length(x)) < ProbOfReproduction(x)

parents <- x[reproduce]

children <- rep(parents, 2) + rnorm(2 * length(parents))

return(children)

}

RunSimulation <- function(num.generations = 100) {

initial.population.size <- 1000

genotypes <- rnorm(initial.population.size, -5, 1)

# One plot above the other

par(mfrow=c(2,1))

for(i in 1:num.generations) {

plot(ProbOfReproduction, -10, 10, xlab="genotype", ylab="probability of reproduction", main="Probability of Reproduction Given Genotype", lwd=2, col="darkgreen")

hist(genotypes, breaks=c(-20:20)/2, xlab="genotype", main="Histogram of Genotypes")

genotypes <- GetNextGeneration(genotypes)

# Break from the loop if species is extinct (or very large)

if(length(genotypes) == 0 |

length(genotypes) > 100 * initial.population.size) break

}

}

library(animation)

saveVideo(RunSimulation(), video.name="tiny_model_of_evolution.mp4", interval=1.0, outdir=getwd())

What exactly happened in those few dozen frames? Time for some commentary: initially, I create a population of 1,000 individuals, whose genotypes are normally distributed with mean -5 and variance 1. That's the first frame. Most of these individuals are unfit, in the sense of having a very low probability of reproducing -- indeed, most of them end up dying without leaving any children. But a handful of people at the rightmost end of the distribution do reproduce, creating a second generation whose distribution of genotypes is already noticeably different from the first.

By the second generation, a few fortunate individuals are sitting comfortably beneath the first bump in the reproduction function, where Pr[Reproduce | Genotype] > 0.50. That segment of the population is expected to grow, since they will each, on average, have more than one child. At this point, we're doing well.

Eventually, genetic variance yields some very fit individuals. We end up with a segment of the population beneath the large bump in the reproduction function, where the probability of reproduction is close to one. The population is now growing extremely rapidly, and the simulation ends (before my computer explodes).

That's all there is to it: the population has evolved; although it shrank initially, it is now much more fit, and growing rapidly. The distribution of genotypes has migrated rightward, until it found the maximal point on the reproduction function -- not bad, given that it only took a few dozen generations. This seems much too easy!

Of course, the model is much too simplistic. The reproduction function is rather generous -- and, perhaps more importantly, it does not change. Evolving against a static environment seems pretty straightforward, and also quite unrealistic. Beyond that, there are a bunch of other issues. Why are there no other species? Where does the reproduction function come from? Why don't the individuals affect their environment? What explains the variance in offsprings' genotypes, relative to their parent?

I'll end with a fun thought experiment: consider all the parameters of the model -- the reproduction function, the initial genotype distribution, and so on -- as fixed, and think of what would happen if the variance in offsprings' genotypes were made larger or smaller. What would change? If the variance were zero, the species would die out very quickly, since they'd be stuck with their initial unfit genotypes. There'd be an opposite problem if the variance were too large, as this would prevent parents from passing on their good genes. There must be some sweet spot in the middle.

To

**leave a comment**for the author, please follow the link and comment on his blog:**mickeymousemodels**.R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...