Simulation studies in R – Using all cores and other tips

[This article was first published on Sustainable Research » Renglish, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

After working more seriously with simulations I noticed some updates were necessary to my previous setup. Most notably are the following three:

  • It is very handy to explicitly call the different scenarios instead of using nested loops
  • Storing intermediate results in single files obliviates the need to rerun an almost finished but crashed analysis and seperates very clearly the data-generation from analysis part.
  • Using all availible cores can speed up the processing time, but may render the simulation not reproducible.

So here is my new simulation-study sceleton, that consists of five parts:

  1. Praeamble: Load all the functions that are required
  2. Simulation-function: This is the part, that will most likely be much more complicated in your case. Define the steps that will be repeated for different scenarios. The parameters of this function will be filled in by the scencarios.
  3. Scenario-Description: Explicitly show the range of values that should be passed to the Simulation-Function
  4. Run the analysis: Here you pass all the scenario-descriptions to your simulation-function. Either do this on one or all availible cores. In any case you should set a random seed to make the simulatino reproducible.
  5. Analyze the outputs: Not shown here but You propabely

Here is the complete script:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
# 1. Praeamble
setwd("c:/temp/")
require(doSNOW)
require(rlecuyer)
 
# 2. Simulation-function
sim_fun<-function(a,b,c){
results<-matrix(NA, 1000,4)
for(i in 1:1000){
#a=1; b=2;c=3;i=1
results[i,1:3]<-cbind(a, b, c)
results[i,4]<-mean(rnorm(100))#THIS MAY BE MORE COMPLEX FOR YOU HEHE!
}
write.table(results, file=paste(a,"_", b,"_", c, "_res.csv"))  
}
 
# 3. Scenario-Description
a<-seq(10, 100, 20)
b<-seq(20, 100, 30)
c<-seq(30, 200, 40)
scenarios<-expand.grid(a, b, c)
 
# 4.a Run the analysis on one core
set.seed(29012001)
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
 
 
# 4.b Run the analysis on all availible Cores
cluster<-makeCluster(4, type = "SOCK")
clusterSetupRNG(cluster, seed = 29012001) 
registerDoSNOW(cluster)
 
foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
 
# compare the time
system.time(
for(i in 1:length(scenarios[,1])){sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)
 
system.time(
foreach(i= 1:length(scenarios[,1])) %dopar% {sim_fun(scenarios[i,1], scenarios[i,2], scenarios[i,3])}
)

There are also other tutorials on how to run simulations in R. The one I liked most was Roger Koenkers’ “A simple protocoll for simulations in R” (accessible here) that relies more heavily on R’s built in features to solve some of the problems.

 

To leave a comment for the author, please follow the link and comment on their blog: Sustainable Research » Renglish.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)