Vanilla C code for the Stochastic Simulation Algorithm

Posted on October 24, 2011 by Mario Pineda-Krch in R bloggers | 0 Comments

[This article was first published on Mario's Entangled Bank » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The Gillespie stochastic simulation algorithm (SSA) is the gold standard for simulating state-based stochastic models. If you are a R buff, a SSA novice and want to get quickly up and running stochastic models (in particular ecological models) that are not overly computationally demanding you might want to consider the GillespieSSA package. The chief advantage of GillespieSSA is that you will be up and running in a whiff. The main disadvantage is that you will be running slow. If you want more horsepower (as in faster simulations) you might want to look at StochKit (Stochastic Simulation Kit) which is an extensible stochastic simulation framework developed in C++ that aims to make stochastic simulation accessible to practicing biologists and chemists, while remaining open to extension via new stochastic and multiscale algorithms. While StochKit itself is in C++ the model definitions are in XML you can therefore remain blissfuly ignorant of C++…, but, you do need to have a C++ compiler installed (bye-bye Windoze). I have used StochKit extensively in the past and the XML model definitions (that were introduced in recent versions) really simplify and speed up setting up models. Unfortunately it has also made StochKit rely on having other C++ libraries installed (e.g. libxml2) thus complicating the setup and installation. While I had it all set up and humming along in Snow Leopard it turns out that the recent Lion upgrade utterly butchered my StochKit set up, among other things (Why Steve, why?). Anyway, to make a long story short. After spending half a day trying to reinstall StochKit and making all its dependancies play nice the cat collapsed on top of the StochKit manual in pure exhaustion.

StochKit OD

Because the cat is wise beyond her years I took this as an indication that it was time to break-up my affair with StochKit and move on to greener pastures. So if GillespieSSA and StochKit are out of the game, what options remain? Well, there is a whole slew of other software packages and libraries implementing various versions of the SSA. How do you decide? In moments like this seeking advance from someone infinitely wise seems to be a good idea. Since the cat was still out I turned to my next best source of wisdom – Albert Einstein. The simplest possible SSA implementation I am able to envision is a vanilla C code of Gillespie’s original Direct method, no frills, no dependencies, no whistles and bells. The next step was obviously to ask Google to find it for me…, but alas yet another snag. There simply does not seem to exist a vanilla C code of the SSA (but I am probably wrong). So the next best thing is of course to whip it up on your own. So to fix this glaring omission of the World Wide Web I am hereby posting the very first SSA in vanilla C. A few caveats about the code. Vanilla C means that there are absolutely no whistles and bells, i.e. it does not require any other libraries (e.g. GSL), probably does not use a fancy Mersenne Twister and it does no error checking, but it should compile right out of the box using

gcc -o ssa_lvpp ssa_lvpp.c

but something like

gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

could give marginally better performance (I have not checked). The only other requirement I needed was that the code had to be as simple, brief and as legible as possible and, oh, fast! Just for fun, here is the corresponding model definition using GillespieSSA

parms <- c(b=2, d=1, g=2.5, K=2000, c=5, a=5e-04 )

x0   <- c(N=1000, P=1000)
nu   <- matrix(c(+1, -1, 0,  0,
                  0,  0, 1, -1),nrow=2,byrow=T)
a    <- c("b*N", "d*N+(b-d)/K*N*N + a*P*N","c*a*P*N", "g*P")
tmax <- 100

and to run it do

out <- ssa(x0,a,nu,parms,tmax)

While this is somewhat shorter and easier on the eyes than the C code (albeit vanilla) the GillespieSSA implementation bites the dust in terms of computational performance. The above GillespieSSA version was still at it after 10 minutes, at which point my patience ran out and I terminated it. The same model (and same parameters) using my the C code finished in less than 1 second. That is (at least) 3 (going at 4) orders of magnitude faster people! Now, in all honesty, this large discrepancy in the computational performance is not entirely surprising. Not only will any R code be inevitable slower than the same code in C but GillespieSSA was never made to be fast (yes, you read right), it was made to be simple to use (which it is). A smart alec may perhaps remark that performance and simplicity of use are not mutually exclusive, to which I respond, “Such is life”. In contrast, my vanilla code was explicitly made with speed in mind. So there. By midnight (I started late) the cat was still passed out on top of the StocKit installation instructions and I had finished the first version of my vanilla C code aka ssa_lvpp.c. lvpp stands for Lotka Volterra Predator-Prey model and is the stochastic implementation of the classical Lotka Volterra predator-prey model where the prey has logistic growth and the predator has a linear functional response, $\frac{\displaystyle dN}{\displaystyle dt} = rN\left(1-\frac{\displaystyle N}{\displaystyle K}\right) - cNP$

$\frac{\displaystyle dP}{\displaystyle dt} = c*a*N*P-g*P$

where the interpretation of the various parameters is left as an exercise to the reader. Without further ado, the code is

// gcc -o ssa_lvpp ssa_lvpp.c
// gcc -O3 -funroll-loops -ftree-vectorize -fast -floop-optimize -fstrength-reduce -o ssa_lvpp ssa_lvpp.c

#include <stdlib.h>
#include <stdio.h>
#include <math.h>

// Define the largest number of time points that can be recorded
#define MAX_DATA 100000

int main(int argc, char **argv)
{
	// Extract parameters from the command line
	double    b = strtod(argv[1], (char **) NULL); // Prey birth rate
	double    d = strtod(argv[2], (char **) NULL); // Prey density independent mortality rate
	double    K = strtod(argv[3], (char **) NULL); // Carrying capacity of the prey
	double    a = strtod(argv[4], (char **) NULL); // Predator attach rate
	double    c = strtod(argv[5], (char **) NULL); // Predator conversion efficiency
	double    g = strtod(argv[6], (char **) NULL); // Predator mortality rate
	double    N = strtod(argv[7], (char **) NULL); // Prey population size
	double    P = strtod(argv[8], (char **) NULL); // Predator population size
	double tmax = strtod(argv[9], (char **) NULL); // Length of simulation
	double    r = b-d;                             // Prey intrinsic growth rate

    // Check that tmax is not larger than the array used for storing results.
    // Here we assume that data is recorded each integer time step, i.e.
    // tmax=MAX_DATA is largest tmax we can use.
	//
    // Note: It would be more elegant to use a C++'s STL vector containers for
    // the data structure.
	if (tmax>(MAX_DATA-1)) {printf("tmax=%d\n",MAX_DATA-1); exit(-1);}

	// Print the parameters and their values
	printf("b=%lf, d=%lf, K=%lf, a=%lf, c=%lf, g=%lf, N=%lf, P=%lf, tmax=%lf\n",
	        b, d, K, a, c, g, N, P, tmax);

	// Calculate some constants to speed thing up
	double constant1 = r/K;
	double constant2 = c*a;

	double t = 0;  // Current time
	double tn = 0; // Next time point at which the state of the system will be recorded
	int row = 0;   // Row number in data array where next data will be recorded
	double data[MAX_DATA][3]; // Data array for recording the state of the system

	// Record the initial sate of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Seed the random number generator
	srand((unsigned)time(NULL));

	// Start the simulation
	do{
		// If it is time, record the state of the system
		if (t>=tn) {
			data[row][0] = t;
			data[row][1] = N;
			data[row][2] = P;
			row++;
			tn++;
		}

		// Determine the total rate of each event
		double prey_birth = b*N;
		double prey_death = d*N+constant1*N*N + a*P*N;
		double pred_birth = constant2*P*N;
		double pred_death = g*P;

		// Determine the total event rate
		double toto = prey_birth + prey_death + pred_birth + pred_death;

		// Determine the time step at which the next event occurs + update the time
		double r = (double)rand() / (double)RAND_MAX;
		t += -1/toto * log10(r);

		// Determine the next event to occur + update the populations
		r = (double)rand() / (double)RAND_MAX;
		double p = r * toto;
		double sum = prey_birth;
		if      (p <= prey_birth) { N++; sum += prey_death;}
		else if (p <= prey_birth+prey_death) { N--; sum += pred_birth;}
		else if (p <= prey_birth+prey_death+pred_birth) P++;
		else P--;

		// Terminate simulation if it has reached tmax or if all populations are extinct
	} while (t<=tmax && (N>0 && P>0));

	// Record the final state of the system
	data[row][0] = t;
	data[row][1] = N;
	data[row][2] = P;
	row++;

	// Send the recorded data to STDOUT
	printf("time, N, P\n");
	int i;
	for(i==0; i
}

To run this code with the same parameters as previously just do (remember there is no error checking so those arguments better be spot on)

./ssa_lvpp 2 1 2500 5e-04 5 2.5 1000 1000 100 1

Finally, if anyone by chance finds the code useful, feel free to use and do whatever you deem fit with it (e.g. modifying it for other models is straightforward). No license, no thanks, no attribution necessary and no blame will be accepted (but I would be happy to get comments). Since I wrote the code I have used it to run in excess of 60000 simulations for a research project (it took 24 hrs on a dinghy lap top, i.e. approximately 1.5 seconds per run if you need to know, now multiply that with 1000 to get an estimate of how long GillespieSSA would be at it). Disclaimer: the only reason I have no inhibitions trashing GillespieSSA is because I happen to be the author of it. Having said this, for the right models and circumstances it is actually a useful package, e.g. to experiment with SSA and wrap your head around how the SSA works. This is from the “Mario’s Entangled Bank” blog ( http://pineda-krch.com ) of Mario Pineda-Krch, a theoretical biologist at the University of Alberta. Filed under: computer simulations, computing, cookbook, Daniel Gillespie, Gillespie algorithm, GillespieSSA, R