Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I am currently writing all this up into a paper which you can have a look at once published.
For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.
Here's a quick example on how to read in raster data (in this case 460 global sea surface temperature files of 1° x 1° degree resolution) using parallel
First, lets do it the conventional way and see how long that takes
library(raster)
library(rgdal)
### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"
### list files in direcotry ##################################################
fnames_sst_r2 <- list.files(inputpath,
pattern = glob2rx(ptrn),
recursive = T)
### read into raster format ##################################################
system.time({
sst.global <- lapply(seq(fnames_sst_r2), function(i) {
raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
}
)
})
## user system elapsed ## 61.584 0.412 68.104
Now using library(parallel)
library(parallel)
system.time({
### set up cluster call ######################################################
cl <- makePSOCKcluster(4)
clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"),
envir=environment())
junk <- clusterEvalQ(cl, c(library(raster),
library(rgdal)))
### read into raster format using parallel version of lapply #################
sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
}
)
### stop the cluster #########################################################
stopCluster(cl)
})
## user system elapsed ## 0.152 0.080 25.670
Not a crazy speed enhancement, but we need to keep in mind that the raster command does not read into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.
Finally, let's test whether the two methods produce identical results.
identical(sst.global.p, sst.global)
## [1] TRUE
to be continued…
sessionInfo()
## R version 2.15.3 (2013-03-01) ## Platform: x86_64-pc-linux-gnu (64-bit) ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=C LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] parallel stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] rgdal_0.8-5 raster_2.0-41 sp_1.0-5 knitr_1.1 ## ## loaded via a namespace (and not attached): ## [1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 grid_2.15.3 ## [5] lattice_0.20-13 stringr_0.6.2 tools_2.15.3
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

Zero Inflated Models and Generalized Linear Mixed Models with R.
Zuur, Saveliev, Ieno (2012).