Using simulation to demonstrate theory: Hardy-Weinberg Equilibrium

June 13, 2011

(This article was first published on bayesianbiologist » Rstats, and kindly contributed to R-bloggers)

One of my teaching roles is in an introductory Genetics course, where first year students are presented with a wide range of new ideas at a relatively fast pace.  It seems that often, students choose to take a memorization approach to learning the material, rather than taking the chance to think about how and why these genetic concepts actually work.  It is my conviction that, as teachers, it is our role to provide students with the opportunities to engage with the course material, and construct a solid understanding that will serve them as they proceed on to higher specialization.

When it comes to bang for my pedagogical buck, I have found that you really can’t beat the use of simulation as a platform for providing the opportunity for students to engage with theoretical concepts.  Here is an R script which I have written and used to allow students to explore how random mating in a population leads to the well known Hardy-Weinberg (HW) distribution.

For those who need a refresher, HW describes the genotype frequencies in randomly mating population. For the simple two allele case (A >> a), the frequencies are denoted by p and q; freq(A) = p; freq(a) = q; p + q = 1. If the population is in equilibrium, then freq(AA) = p2 for the AA homozygotes in the population, freq(aa) = q2 for the aa homozygotes, and freq(Aa) = 2pq for the heterozygotes.

What doesn’t usually get mentioned in introductory courses, is that the HW formula provides the expected frequencies of each genotype.  Of course, in real, finite populations, there will be variability around these values.  The seeming exactness of HW obscures the random processes at play.  To help students see how HW arises in finite populations (as opposed to the theoretical infinite populations required for the strict solution), I let them play with this simulation (R script).

Students can play around with the population size (N) and the number of generations (num_generations), to see how well the simulated populations correspond to the predicted HW.  Here is a plot of 200 simulated populations of size N=200, which are initiated out of the HW equilibrium and then randomly mated for one generation:

Feel free to try it out in your own class!


To leave a comment for the author, please follow the link and comment on his blog: bayesianbiologist » Rstats. offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Tags: , ,

Comments are closed.