Le Grand Casino of Monte Carlo
On Monday I’m going to be leading a little stats workshop on randomization tests and null models. In preparation I wrote up code for some null model examples, and I wanted to write a post introducing the basics of these techniques (null models, bootstrapping, jackknifing, etc.), which are all specific classes of a general approach known as Monte Carlo methods. Put simply, a Monte Carlo method is any approach that generates random numbers and then examines how different fractions of them behave. It’s a powerful approach that can be used in a wide variety of situations, and it’s commonly used for solving complex integrals, among other things.

A simple integration example

Let’s start with a trivial example: integrating a function we use all the time as ecologists, the normal distribution. Maybe you want to integrate the normal probability density function (PDF) from -1 to 1 because you’re curious about how likely an event within 1 standard deviation of the mean is. To get the area under the curve we simply integrate the PDF from -1 to 1.

MC integration of the normal PDF between -1 and 1
In this case it’s silly because we already have an analytical solution, but Monte Carlo integration can be necessary for more complicated integrals. The simplest method is “hit or miss” integration, where we draw a box around the curve, sample points at random from inside it, and ask of each one: “Is this random point under my curve or not?” To approximate the integral we multiply the fraction of our samples under the curve (fc) by the total area we sampled, A. Using R we can do both the actual integral and the Monte Carlo version easily. The actual answer is about 0.683, and the approximate answer I got was 0.688, so pretty close. You can see the full code at this gist.
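As a rough illustration of the hit-or-miss idea, here is a minimal sketch in R. It is not the code from the gist: the seed, the number of points `n`, and the choice of a box whose top sits at the peak of the PDF are all assumptions made for this example.

```r
set.seed(42)                     # arbitrary seed, only so the run is reproducible

n     <- 100000                  # number of random points (an arbitrary choice)
x_min <- -1                      # lower integration limit
x_max <- 1                       # upper integration limit
y_max <- dnorm(0)                # peak of the standard normal PDF, used as the top of the box

# Sample points uniformly from the box [x_min, x_max] x [0, y_max]
x <- runif(n, x_min, x_max)
y <- runif(n, 0, y_max)

# fc: fraction of sampled points that land under the curve
fc <- mean(y <= dnorm(x))

# A: total area of the box we sampled from
A <- (x_max - x_min) * y_max

mc_estimate <- fc * A
exact       <- pnorm(1) - pnorm(-1)   # analytical answer from the normal CDF

print(c(monte_carlo = mc_estimate, exact = exact))
```

The estimate changes with the seed and gets closer to the analytical value, on average, as `n` grows. `integrate(dnorm, -1, 1)` is another way to get the reference value numerically.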

A simple statistical example
Another place we can use these methods is in statistical hypothesis testing. The simplest case is as an alternative to a t-test. Imagine you have a data set with measurements of plant height for the same species in shaded and unshaded conditions. Your data might look like this:
An easy way to test for a difference is with a parametric t-test. The Monte Carlo approach involves several basic steps that form the architecture of any randomization test, no matter how complicated. First, calculate the observed value of your test statistic. In this case we are interested in whether there is a difference between the two groups, so we calculate the mean of the shaded group and the mean of the unshaded group and subtract one from the other. We want to know if this value is significantly different from 0. To determine this we need to construct a null distribution, and here is where the Monte Carlo method comes in. If there is no difference between the groups, then which group a given height is associated with shouldn’t matter. So we repeatedly reshuffle the group labels in the existing data set, and each time we calculate our test statistic, the difference in the means, and store it. This creates a distribution of test statistic values under the null hypothesis. We then compare the observed value to this null distribution. Usually the test being performed is two-sided, so we check against the central 95% of the null distribution (the interval between its 2.5th and 97.5th percentiles). If the observed value falls within that interval, then we fail to reject the null hypothesis that there is no difference between the two means. Here’s a gist with all the relevant code and some pretty figures in ggplot2.
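The steps above can be sketched in a few lines of R. This is not the code from the gist: the plant heights are simulated purely for illustration, and the group sizes, means, seed, number of shuffles `n_perm`, and the helper `mean_diff` are all assumptions.

```r
set.seed(1)                      # arbitrary seed, only so the run is reproducible

# Hypothetical data: simulated plant heights, NOT the data from the post
shaded   <- rnorm(20, mean = 10, sd = 2)
unshaded <- rnorm(20, mean = 12, sd = 2)
height   <- c(shaded, unshaded)
group    <- rep(c("shaded", "unshaded"), each = 20)

# Test statistic: difference between the two group means
mean_diff <- function(h, g) {
  mean(h[g == "unshaded"]) - mean(h[g == "shaded"])
}

# Step 1: the observed value of the test statistic
observed <- mean_diff(height, group)

# Step 2: build the null distribution by shuffling the group labels
n_perm    <- 5000
null_dist <- replicate(n_perm, mean_diff(height, sample(group)))

# Step 3: compare the observed value with the central 95% of the null distribution
bounds <- unname(quantile(null_dist, c(0.025, 0.975)))
reject <- observed < bounds[1] || observed > bounds[2]

# A two-sided p-value: share of shuffles at least as extreme as the observed value
p_value <- mean(abs(null_dist) >= abs(observed))

print(list(observed = observed, bounds = bounds, reject = reject, p_value = p_value))
```

For comparison, `t.test(height ~ group)` runs the parametric version on the same simulated data.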

“All that code for a t-test?”