(This article was first published on **distributed ecology**, and kindly contributed to R-bloggers)

[Image: Le Grand Casino of Monte Carlo]

**A simple integration example**

Let’s start with a trivial example: integrating a function we use all the time as ecologists, the normal distribution. Maybe you want to integrate the standard normal probability density function (PDF) from -1 to 1, because you’re curious about how likely an event within 1 standard deviation of the mean is. That probability is the area under the curve between -1 and 1.

[Figure: MC integration of the normal PDF between -1 and 1]

For the Monte Carlo version, we lay out an *x,y* grid, sample randomly from it, and ask of each point: “Is this random point under my curve or not?” To approximate the integral we multiply the fraction of our samples under the curve (*f_c*) by the total area we sampled (*A*), so the estimate is *f_c* × *A*. Using R we can do both the actual integral and the Monte Carlo version easily. The actual answer is about 0.683, and the approximate answer I got was 0.688, so pretty close. You can see the full code at this gist.
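The procedure described above can be sketched in a few lines of R. This is a minimal illustration rather than the code from the gist: the sample size, the seed, and the variable names (`n`, `f_c`, `A`, `mc_estimate`) are assumptions chosen for the example, and the sampling box is taken to run from 0 up to the peak of the curve.

```r
# Hit-or-miss Monte Carlo integration of the standard normal PDF on [-1, 1].
set.seed(42)                 # arbitrary seed, only so the run is repeatable

n     <- 100000              # number of random points (assumed value)
x_min <- -1
x_max <- 1
y_max <- dnorm(0)            # the PDF peaks at x = 0, so the box just covers the curve

# Sample points uniformly from the bounding box
x <- runif(n, x_min, x_max)
y <- runif(n, 0, y_max)

# f_c: fraction of sampled points that fall under the curve
f_c <- mean(y < dnorm(x))

# A: total area of the box we sampled from
A <- (x_max - x_min) * y_max

# Monte Carlo estimate of the integral, and the numerical answer for comparison
mc_estimate <- f_c * A
exact       <- integrate(dnorm, x_min, x_max)$value

cat("Monte Carlo estimate:", round(mc_estimate, 3), "\n")
cat("Numerical integral:  ", round(exact, 3), "\n")
```

The estimate changes from run to run unless the seed is fixed, and it gets closer to the numerical integral as `n` grows.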

**A simple statistical example**

Another place we can use these methods is in statistical hypothesis testing. The simplest case is as an alternative to a t-test. Imagine you have a data set with measurements of plant height for the same species in shaded and unshaded conditions. Your data might look like this:

| Treatment | Height |
|---|---|
| Shaded | 13.3 |
| Shaded | 12.1 |
| Shaded | 14.7 |
| Shaded | 12.8 |
| Unshaded | 17.8 |
| Unshaded | 19.4 |
| Unshaded | 18.5 |
| Unshaded | 18.5 |
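A randomization test on these data can be sketched as follows. This is one plausible version, not necessarily the author's original code: it assumes the test statistic is the difference in group means, a two-sided test, and 9999 random shuffles of the treatment labels. The names `diff_means`, `n_perm`, and `null_diffs` are made up for the illustration.

```r
# Randomization (permutation) test for a difference in mean plant height.
set.seed(1)                  # arbitrary seed, only so the run is repeatable

height <- c(13.3, 12.1, 14.7, 12.8, 17.8, 19.4, 18.5, 18.5)
group  <- rep(c("Shaded", "Unshaded"), each = 4)

# Test statistic: mean height of unshaded plants minus mean height of shaded plants
diff_means <- function(h, g) {
  mean(h[g == "Unshaded"]) - mean(h[g == "Shaded"])
}

observed <- diff_means(height, group)

# Build the null distribution by shuffling the treatment labels,
# which breaks any link between treatment and height
n_perm     <- 9999           # number of shuffles (assumed value)
null_diffs <- replicate(n_perm, diff_means(height, sample(group)))

# Two-sided p-value: how often a shuffled difference is at least as extreme
# as the observed one (the observed arrangement counts as one permutation)
p_value <- (sum(abs(null_diffs) >= abs(observed)) + 1) / (n_perm + 1)

cat("Observed difference:", observed, "\n")
cat("Permutation p-value:", p_value, "\n")
```

Because every unshaded plant here is taller than every shaded plant, only 2 of the 70 possible ways of splitting the eight plants into two groups of four are as extreme as the observed split, so the estimated p-value should land near 2/70. The result can be compared with `t.test(height ~ group)`.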

**“All that code for a t-test?”**

That must be what you’re thinking, and you’re right, it’s certainly unwieldy to write all that code for something so simple. But it’s a good starting point for when we begin talking about null models. You may not have realized it, but in the previous example there are two assumptions behind our inference. The first is that some process or mechanism has caused a difference between our groups. The fact that plants grow to different heights in shaded and unshaded conditions says something about the way plants use light, or the way they compete for light, or maybe some other mechanism I haven’t thought about. The second is that by randomizing our existing data, we can simulate a situation where we have collected the data under completely stochastic conditions, i.e. the process causing plant height is random. So there are our two assumptions: (A) our data set represents the outcome of some process, and (B) by randomizing we can create a null model without process to test our own data against.

Here’s where things become a bit more gray. In our example above the null hypothesis is pretty clear, and we can all agree on it, but problems arise with more complicated questions. Traditionally null models have been used to make inferences about community assembly rules. The use of these models was prevalent during the “battles” constituting what is tongue-in-cheek called the “null model wars”. I won’t take up any space rehashing the null model wars of the 1970s and 1980s, but links to good resources can be found here on Jeremy Fox’s Oikos blog and his post about refighting the null model wars. Suffice it to say that careful attention needs to be paid to the selection of the proper null model.

Nick Gotelli has lots of good papers about null models if you peruse his work. I’ve worked up several examples from his 2000 and 2010 papers, and sometimes the algorithms can be challenging. I’ll cover more advanced methods from Nick’s papers in a future post.
