
One of the challenges for ecologists working with trophic/interaction networks is to understand their organization. One possible approach is to compare them against a random model, with more or less constraints, in order to estimate their departure from randomness. To this end, null models have been developed.

The basic idea behind a null model is to generate a network with the same properties as the original network, but with a different distribution of interactions. This allows you to test whether your network could have been generated by chance, although Bascompte and colleagues showed that in almost all cases, the distributions of some key measures differ significantly from their generated random counterparts.
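In practice, the test boils down to comparing the observed value of a measure against its distribution over the null webs. A minimal sketch of that comparison (the function name `null_pvalue` and the use of a two-sided empirical p-value are my own illustrative choices, not part of any package):

```r
# Empirical two-sided p-value of an observed statistic against null values.
# `obs` is the statistic computed on the real web, `nulls` the same
# statistic computed on each randomized web.
null_pvalue <- function(obs, nulls) {
  # proportion of null values at least as extreme as the observation
  2 * min(mean(nulls >= obs), mean(nulls <= obs))
}

null_pvalue(5, rnorm(1000))  # very small: 5 is far in the null tail
```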

There are various models of increasing complexity, but one that is particularly interesting is described in the aforementioned Bascompte paper. The probability of having an interaction at position (l, c) (where l indexes the rows and c the columns) is given by $P_{l,c}=\frac{P_l+P_c}{2}$
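For what it's worth, this probability matrix can also be written in a single call to `outer` (a sketch assuming a binary adjacency matrix `web`; for 0/1 data, `rowMeans` and `colMeans` are exactly the $P_l$ and $P_c$ of the formula):

```r
set.seed(1)
web <- matrix(rbinom(20, 1, 0.4), nrow = 4)  # a small binary web
# P[l, c] = (P_l + P_c) / 2: mean of the row fill and the column fill
P <- outer(rowMeans(web), colMeans(web), function(r, c) (r + c) / 2)
```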

which, as explained in the image below (behold my drawing abilities!), is the mean of the probability of having an interaction in the row and of having one in the column (which, assuming the network is made entirely of 0s and 1s, are the means of the rows and of the columns, respectively). In R, this is quite easy to do using colMeans and related functions. The brute-force approach is to use two for loops to go through the rows and columns. This is horribly slow, however, and the very interest of using null models is the ability to simulate a truckload of data. The function I wrote starts by finding the smaller dimension of the network, over which it will iterate (if there are fewer columns than rows, we use a for loop over the columns). That way, we can construct the probability matrix, on which it is easy to use apply (and we already know the right margin!) to draw the new network.

And it is way faster than the double-for-loop method, as shown in the picture below. The circles are the non-optimized function, and the triangles are the new one. Just make sure to take a look at the axes: the new function is able to generate a network with more than 10^6 possible interactions (e.g. 1000 predators and 1000 prey) in roughly 0.25 seconds. This network size is highly unrealistic! Working with networks of roughly 400 interactions, I was able to generate 10^4 nulls in less than 6 seconds (on a unibody MacBook with 4 GB of RAM). The function is right here:

nullweb <- function(ref)
{
  # Make sure the reference web is binary
  ref[ref > 0] <- 1
  # Iterate over the smaller dimension (2 = columns, 1 = rows)
  margin <- ifelse(ncol(ref) < nrow(ref), 2, 1)
  currentweb <- matrix(0, ncol = ncol(ref), nrow = nrow(ref))
  pc <- colMeans(ref)  # probability of an interaction in each column
  pr <- rowMeans(ref)  # probability of an interaction in each row
  if (margin == 2) {
    for (i in 1:ncol(ref)) {
      currentweb[, i] <- (pc[i] + pr) / 2
    }
  } else {
    for (i in 1:nrow(ref)) {
      currentweb[i, ] <- (pr[i] + pc) / 2
    }
  }
  # Draw the null web; note that apply() over rows (margin 1) returns
  # its results as columns, so the matrix must be transposed back
  web <- apply(currentweb, margin, function(x) rbinom(length(x), 1, x))
  if (margin == 1) web <- t(web)
  return(web)
}
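As a usage sketch, here is the same null draw written fully vectorized, which also shows how to produce a batch of nulls with `replicate` (the web size and the number of replicates are illustrative; `rbinom` happily accepts a matrix of probabilities, treated element-wise):

```r
set.seed(42)
ref <- matrix(rbinom(400, 1, 0.3), nrow = 20)  # a 20 x 20 binary web
# probability matrix: P[l, c] = (P_l + P_c) / 2
P <- outer(rowMeans(ref), colMeans(ref), function(r, c) (r + c) / 2)
# one null web: independent Bernoulli trials, one per cell
null <- matrix(rbinom(length(P), 1, P), nrow = nrow(P))
# a batch of 100 nulls, kept as a list of matrices
nulls <- replicate(100, matrix(rbinom(length(P), 1, P), nrow = nrow(P)),
                   simplify = FALSE)
```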