Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I am currently writing all this up into a paper which you can have a look at once published.
For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.
Here's a quick example on how to read in raster data (in this case 460 global sea surface temperature files of 1° x 1° degree resolution) using parallel
First, lets do it the conventional way and see how long that takes
library(raster)
library(rgdal)
### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"
### list files in direcotry ##################################################
fnames_sst_r2 <- list.files(inputpath,
pattern = glob2rx(ptrn),
recursive = T)
### read into raster format ##################################################
system.time({
sst.global <- lapply(seq(fnames_sst_r2), function(i) {
raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
}
)
})
## user system elapsed
## 61.584 0.412 68.104
Now using library(parallel)
library(parallel)
system.time({
### set up cluster call ######################################################
cl <- makePSOCKcluster(4)
clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"),
envir=environment())
junk <- clusterEvalQ(cl, c(library(raster),
library(rgdal)))
### read into raster format using parallel version of lapply #################
sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
}
)
### stop the cluster #########################################################
stopCluster(cl)
})
## user system elapsed
## 0.152 0.080 25.670
Not a crazy speed enhancement, but we need to keep in mind that the raster command does not read into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.
Finally, let's test whether the two methods produce identical results.
identical(sst.global.p, sst.global) ## [1] TRUE
to be continued…
sessionInfo() ## R version 2.15.3 (2013-03-01) ## Platform: x86_64-pc-linux-gnu (64-bit) ## ## locale: ## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C ## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8 ## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 ## [7] LC_PAPER=C LC_NAME=C ## [9] LC_ADDRESS=C LC_TELEPHONE=C ## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C ## ## attached base packages: ## [1] parallel stats graphics grDevices utils datasets methods ## [8] base ## ## other attached packages: ## [1] rgdal_0.8-5 raster_2.0-41 sp_1.0-5 knitr_1.1 ## ## loaded via a namespace (and not attached): ## [1] digest_0.6.3 evaluate_0.4.3 formatR_0.7 grid_2.15.3 ## [5] lattice_0.20-13 stringr_0.6.2 tools_2.15.3
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
