reading raster data using library(parallel)

March 3, 2013
By

(This article was first published on metvurst, and kindly contributed to R-bloggers)

Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I am currently writing all this up into a paper which you can have a look at once published.

For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.

Here's a quick example on how to read in raster data (in this case 460 global sea surface temperature files of 1° x 1° degree resolution) using parallel

First, lets do it the conventional way and see how long that takes

library(raster)
library(rgdal)

### Input preparation ########################################################
inputpath <- "/media/tims_ex/sst_kili_analysis"
ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

### list files in direcotry ##################################################
fnames_sst_r2 <- list.files(inputpath, 
                            pattern = glob2rx(ptrn), 
                            recursive = T)

### read into raster format ##################################################
system.time({
  sst.global <- lapply(seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                       )
  })
##    user  system elapsed 
##  61.584   0.412  68.104

Now using library(parallel)

library(parallel)

system.time({
  ### set up cluster call ######################################################
  cl <- makePSOCKcluster(4)

  clusterExport(cl, varlist = c("inputpath", "fnames_sst_r2"), 
                envir=environment())
  junk <- clusterEvalQ(cl, c(library(raster),
                             library(rgdal)))

  ### read into raster format using parallel version of lapply #################
  sst.global.p <- parLapply(cl, seq(fnames_sst_r2), function(i) {
    raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    }
                          )

  ### stop the cluster #########################################################
  stopCluster(cl)
  })
##    user  system elapsed 
##   0.152   0.080  25.670

Not a crazy speed enhancement, but we need to keep in mind that the raster command does not read into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.

Finally, let's test whether the two methods produce identical results.

identical(sst.global.p, sst.global)
## [1] TRUE

to be continued…

sessionInfo()
## R version 2.15.3 (2013-03-01)
## Platform: x86_64-pc-linux-gnu (64-bit)
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=C                 LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.8-5   raster_2.0-41 sp_1.0-5      knitr_1.1    
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.3    evaluate_0.4.3  formatR_0.7     grid_2.15.3    
## [5] lattice_0.20-13 stringr_0.6.2   tools_2.15.3

To leave a comment for the author, please follow the link and comment on their blog: metvurst.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training

datasociety

http://www.eoda.de





ODSC

ODSC

CRC R books series





Six Sigma Online Training









Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)