reading raster data using library(parallel)

March 3, 2013
By

(This article was first published on metvurst, and kindly contributed to R-bloggers)

Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I am currently writing all this up into a paper which you can have a look at once published.

For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.

Here's a quick example on how to read in raster data (in this case 460 global sea surface temperature files of 1° x 1° degree resolution) using parallel

First, lets do it the conventional way and see how long that takes

library(raster)
library(rgdal)

### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

### list files in direcotry ##################################################
fnames_sst_r2 <- list.files(inputpath, 
                            pattern = glob2rx(ptrn), 
                            recursive = T)

### read into raster format ##################################################
system.time({
  sst.global <- lapply(seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                       )
  })
##    user  system elapsed 
##  61.584   0.412  68.104

Now using library(parallel)

library(parallel)

system.time({
  ### set up cluster call ######################################################
  cl <- makePSOCKcluster(4)

  clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"), 
                envir=environment())
  junk <- clusterEvalQ(cl, c(library(raster),
                             library(rgdal)))

  ### read into raster format using parallel version of lapply #################
  sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                          )

  ### stop the cluster #########################################################
  stopCluster(cl)
  })
##    user  system elapsed 
##   0.152   0.080  25.670

Not a crazy speed enhancement, but we need to keep in mind that the raster command does not read into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.

Finally, let's test whether the two methods produce identical results.

identical(sst.global.p, sst.global)
## [1] TRUE

to be continued…

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5      knitr_1.1    
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3    evaluate_0.4.3  formatR_0.7     grid_2.15.3    
## [5] lattice_0.20-13 stringr_0.6.2   tools_2.15.3

To leave a comment for the author, please follow the link and comment on his blog: metvurst.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.