read raster data in parallel
Use library(parallel) to read raster data in parallel
Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested in what role Pacific sea surface temperatures play with regard to rainfall in East Africa. I'll spare you the details, as I am currently writing all this up into a paper which you can have a look at once it is published.
For this analysis, however, I am processing quite a large number of raster files. This led me to investigate the possibilities of the parallel package for speeding up the process.
Here's a quick example of how to read in raster data (in this case, 200 global sea surface temperature files at 1° × 1° resolution) using parallel.
First, let's do it the conventional way and see how long that takes:
system.time({

    library(raster)
    library(rgdal)

    ### input preparation ########################################################
    inputpath <- "E:/sst_kili_analysis/"
    ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

    ### list files in directory ##################################################
    fnames_sst_r2 <- list.files(inputpath, pattern = glob2rx(ptrn), recursive = T)

    ### read into raster format ##################################################
    sst.global <- lapply(seq(fnames_sst_r2), function(i) {
        raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    })
})
##    user  system elapsed 
##   31.37    4.43   36.50
Now using library(parallel):
library(parallel)
system.time({

    ### input preparation ########################################################
    inputpath.p <- "E:/sst_kili_analysis/"
    ptrn.p <- "*sst_anom_pcadenoise_*_R2.rst"

    ### list files in directory ##################################################
    fnames_sst_r2.p <- list.files(inputpath.p, pattern = glob2rx(ptrn.p), recursive = T)

    ### set up cluster ###########################################################
    cl <- makePSOCKcluster(4)

    clusterExport(cl, c("inputpath.p", "fnames_sst_r2.p"))
    junk <- clusterEvalQ(cl, c(library(raster), library(rgdal)))

    ### read into raster format using parallel version of lapply #################
    sst.global.p <- parLapply(cl, seq(fnames_sst_r2.p), function(i) {
        raster(paste(inputpath.p, fnames_sst_r2.p[i], sep = "/"))
    })

    ### stop the cluster #########################################################
    stopCluster(cl)
})
##    user  system elapsed 
##    1.40    3.03   13.34
Not that much of a speed gain, but we need to keep in mind that the raster command does not read the values into memory. Hence, the speed improvements should be a lot higher once we start the actual calculations or plotting.
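To illustrate the point, here is a minimal sketch of running an actual computation on such layers in parallel. The data are hypothetical (small random rasters standing in for the SST files, since the original files are not available here), but the pattern is the same one used above: set up a cluster, load raster on the workers, then replace sapply with parSapply.

```r
library(parallel)
library(raster)

## three small in-memory rasters standing in for the SST layers
## (hypothetical demo data so the example is self-contained)
r <- raster(nrows = 180, ncols = 360)
sst.list <- lapply(1:3, function(i) setValues(r, rnorm(ncell(r), mean = i)))

## set up the cluster and load raster on each worker
cl <- makePSOCKcluster(2)
junk <- clusterEvalQ(cl, library(raster))

## compute the global mean of each layer in parallel
means <- parSapply(cl, sst.list, function(x) cellStats(x, stat = "mean"))

stopCluster(cl)
means
```

Unlike the reading step, cellStats actually touches every cell value, so this is where distributing the layers across workers starts to pay off.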
Finally, let's test whether the two methods produce identical results.
identical(sst.global.p, sst.global)
## [1] TRUE
to be continued…
sessionInfo()
## R version 2.15.1 (2012-06-22)
## Platform: x86_64-pc-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252 
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C 
## [5] LC_TIME=English_United States.1252 
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.7-12  raster_1.9-92 sp_0.9-99     knitr_0.6.3  
## 
## loaded via a namespace (and not attached):
## [1] digest_0.5.2   evaluate_0.4.2 formatR_0.5    grid_2.15.1   
## [5] lattice_0.20-6 parser_0.0-16  plyr_1.7.1     Rcpp_0.9.13   
## [9] stringr_0.6    tools_2.15.1