Simulation studies in R – Using all cores and other tips

September 16, 2011
By

(This article was first published on Sustainable Research » Renglish, and kindly contributed to R-bloggers)

After working more seriously with simulations I noticed some updates were necessary to my previous setup. Most notably are the following three:

  • It is very handy to explicitly call the different scenarios instead of using nested loops
  • Storing intermediate results in single files obliviates the need to rerun an almost finished but crashed analysis and seperates very clearly the data-generation from analysis part.
  • Using all availible cores can speed up the processing time, but may render the simulation not reproducible.

So here is my new simulation-study sceleton, that consists of five parts:

  1. Praeamble: Load all the functions that are required
  2. Simulation-function: This is the part, that will most likely be much more complicated in your case. Define the steps that will be repeated for different scenarios. The parameters of this function will be filled in by the scencarios.
  3. Scenario-Description: Explicitly show the range of values that should be passed to the Simulation-Function
  4. Run the analysis: Here you pass all the scenario-descriptions to your simulation-function. Either do this on one or all availible cores. In any case you should set a random seed to make the simulatino reproducible.
  5. Analyze the outputs: Not shown here but You propabely

Here is the complete script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# 1. Praeamble
setwd("c:/temp/")
require(doSNOW)
require(rlecuyer)
 
# 2. Simulation-function
sim_fun<-function(a,b,c){
results<-matrix(NA, 1000,4)
for(i in 1:1000){
#a=1; b=2;c=3;i=1
results[i,1:3]<-cbind(a, b, c)
results[i,4]<-mean(rnorm(100))#THIS MAY BE MORE COMPLEX FOR YOU HEHE!
}
write.table(results, file=paste(a,"_", b,"_", c, "_res.csv"))  
}
 
# 3. Scenario-Description
a<-seq(10, 100, 20)
b<-seq(20, 100, 30)
c<-seq(30, 200, 40)
scenarios<-expand.grid(a, b, c)
 
# 4.a Run the analysis on one core
set.seed(29012001)
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
 
 
# 4.b Run the analysis on all availible Cores
cluster<-makeCluster(4, type = "SOCK")
clusterSetupRNG(cluster, seed = 29012001) 
registerDoSNOW(cluster)
 
foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
 
# compare the time
system.time(
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)
 
system.time(
foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)

There are also other tutorials on how to run simulations in R. The one I liked most was Roger Koenkers’ “A simple protocoll for simulations in R” (accessible here) that relies more heavily on R’s built in features to solve some of the problems.

 

To leave a comment for the author, please follow the link and comment on his blog: Sustainable Research » Renglish.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.