A (fast!) null model of bipartite networks

One of the challenges for ecologists working with trophic/interaction networks is to understand their organization. A possible approach is to compare them against a random model, with more or fewer constraints, in order to estimate their departure from randomness. To this end, null models have been developed.

The basic idea behind a null model is to generate a network with the same properties as the original network, but with a different distribution of interactions. This allows you to test whether your network could have been generated by chance, although Bascompte and colleagues showed that in almost all cases, the distributions of some key measures differ significantly from the random distributions generated this way.
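As a rough sketch of how such a test works in practice: compute a measure of interest on the observed web, compute it again on a large batch of null webs, and look at where the observed value falls in the null distribution. (Here my_metric is a placeholder for whatever network measure you care about, and nullweb is the function given below.)

obs   <- my_metric(web)                           # measure on the real web
nulls <- replicate(999, my_metric(nullweb(web)))  # null distribution
z <- (obs - mean(nulls)) / sd(nulls)              # z-score: departure from randomness
p <- mean(nulls >= obs)                           # one-sided p-value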

There are various models of increasing complexity, but one that is particularly interesting is described in the aforementioned Bascompte paper. The probability of having an interaction at position (l, c) (where l indexes the rows and c the columns) is given by

P_{l,c} = \frac{P_l + P_c}{2}

which, as explained in the image below (behold my drawing abilities!), is the mean of the probability of having an interaction in the row and the probability of having an interaction in the column (which, assuming the network is made entirely of 0s and 1s, are the means of the rows and columns, respectively).

[Figure nullmodel_alpha.png: the null probability of a cell is the mean of its row and column probabilities]
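To make this concrete, here is a tiny example (the web is made up, and outer() is just an alternative, fully vectorized way of building the same probability matrix as the function below does):

# Toy 3 x 4 binary web
web <- matrix(c(1, 1, 0, 0,
                1, 0, 1, 0,
                0, 0, 0, 1), nrow = 3, byrow = TRUE)
Pr <- rowMeans(web)  # P_l, the fill of each row
Pc <- colMeans(web)  # P_c, the fill of each column
P  <- outer(Pr, Pc, function(r, c) (r + c) / 2)  # P[l,c] = (P_l + P_c)/2
round(P, 2)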

In R, this is quite easy to do using colMeans and related functions. The brute-force approach is to use two for loops to go through the rows and columns. This is horribly slow, however, and the very point of using null models is the ability to simulate a truckload of data. The function I wrote starts by finding the smaller dimension of the network, and iterates over it (if there are fewer columns than rows, we use a for loop over the columns). That way, we can construct the probability matrix, on which it is easy to use apply (and we already know the right margin!) to get the new network.

And it is way faster than the double-for-loop method, as shown in the picture below. The circles are the non-optimized function, and the triangles are the new one. Just make sure to take a look at the axes: the new function is able to generate a network with more than 10^6 possible interactions (e.g. 1000 predators and 1000 prey) in roughly 0.25 seconds. This network size is highly unrealistic! Working with networks of roughly 400 interactions, I was able to generate 10^4 nulls in less than 6 seconds (on a unibody MacBook with 4 GB of RAM).

[Figure benchmark.png: running time of the naive (circles) and optimized (triangles) functions against network size]
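If you want to time the function on your own machine, a minimal sketch using base R's system.time() could look like this (exact figures will of course depend on your hardware):

web <- matrix(rbinom(1000 * 1000, 1, 0.1), nrow = 1000)  # 10^6 possible interactions
system.time(nullweb(web))                  # one large null web
system.time(replicate(100, nullweb(web)))  # a batch of nulls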

The function is right here:

nullweb <- function(ref)
{
  # Force the web to be binary: any positive interaction becomes 1
  ref[ref > 0] <- 1
  # Iterate over the smaller dimension: columns (margin 2) if there are
  # fewer columns than rows, rows (margin 1) otherwise
  margin <- ifelse(ncol(ref) < nrow(ref), 2, 1)
  currentweb <- matrix(0, ncol = ncol(ref), nrow = nrow(ref))
  pc <- colMeans(ref)  # P_c, the fill of each column
  pr <- rowMeans(ref)  # P_l, the fill of each row
  # Build the probability matrix, P[l,c] = (P_l + P_c)/2
  if (margin == 2) {
    for (i in 1:ncol(ref)) currentweb[, i] <- (pc[i] + pr) / 2
  } else {
    for (i in 1:nrow(ref)) currentweb[i, ] <- (pr[i] + pc) / 2
  }
  # Draw each interaction as a Bernoulli trial; apply() over rows (margin 1)
  # returns its results as columns, so the result must be transposed back
  null <- apply(currentweb, margin, function(x) rbinom(length(x), 1, x))
  if (margin == 1) null <- t(null)
  return(null)
}
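A quick usage sketch (the web matrix here is simulated for the sake of illustration): generate a batch of null webs, and check that the connectance of the original web is preserved on average, as the model intends.

set.seed(42)
web <- matrix(rbinom(20 * 30, 1, 0.3), nrow = 20)  # a made-up 20 x 30 web
nulls <- replicate(1000, nullweb(web), simplify = FALSE)
mean(sapply(nulls, mean))  # average connectance of the nulls...
mean(web)                  # ...should be close to that of the original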
