read raster data in parallel

[This article was first published on metvurst, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Use library(parallel) to read raster data in parallel fashion

Use library(parallel) to read raster data in parallel fashion

Recently, I have been doing some analysis for a project I am involved in. In particular, I was interested what role pacific sea surface temperatures play with regard to rainfall in East Africa. I spare you the details as I am currently writing all this up into a paper which you can have a look at once published.

For this analysis, however, I am processing quite an amount of raster files. This led me to investigate the possibilities of the parallel package to speed up the process.

Here's a quick example on how to read in raster data (in this case 200 global sea surface temperature files of 1° x 1° degree resolution) using parallel

First, lets do it the conventional way and see how long that takes

system.time({

    library(raster)
    library(rgdal)

    ### Input preparation
    ### ########################################################
    inputpath <- "E:/sst_kili_analysis/"
    ptrn <- "*sst_anom_pcadenoise_*_R2.rst"

    ### list files in direcotry
    ### ##################################################
    fnames_sst_r2 <- list.files(inputpath, pattern = glob2rx(ptrn), recursive = T)

    ### read into raster format
    ### ##################################################
    sst.global <- lapply(seq(fnames_sst_r2), function(i) {
        raster(paste(inputpath, fnames_sst_r2[i], sep = "/"))
    })
})

##    user  system elapsed 
##   31.37    4.43   36.50 

Now using library(parallel)

library(parallel)
system.time({

    ### Input preparation
    ### ########################################################
    inputpath.p <- "E:/sst_kili_analysis/"
    ptrn.p <- "*sst_anom_pcadenoise_*_R2.rst"

    ### list files in direcotry
    ### ##################################################
    fnames_sst_r2.p <- list.files(inputpath.p, pattern = glob2rx(ptrn.p), recursive = T)

    ### set up cluster call
    ### ######################################################
    cl <- makePSOCKcluster(4)

    clusterExport(cl, c("inputpath.p", "fnames_sst_r2.p"))
    junk <- clusterEvalQ(cl, c(library(raster), library(rgdal)))

    ### read into raster format using parallel version of lapply
    ### #################
    sst.global.p <- parLapply(cl, seq(fnames_sst_r2.p), function(i) {
        raster(paste(inputpath.p, fnames_sst_r2.p[i], sep = "/"))
    })

    ### stop the cluster
    ### #########################################################
    stopCluster(cl)
})

##    user  system elapsed 
##    1.40    3.03   13.34 

Not that much of a speed enhancement, but we need to keep in mind that the raster command does not read into memory. Hence, the speed improvements should be a lot higher once we start the calculations or plotting.

Finally, let's test whether the two methods produce identical results.

identical(sst.global.p, sst.global)

## [1] TRUE

to be continued…

sessionInfo()

## R version 2.15.1 (2012-06-22)
## Platform: x86_64-pc-mingw32/x64 (64-bit)
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
## [1] rgdal_0.7-12  raster_1.9-92 sp_0.9-99     knitr_0.6.3  
## 
## loaded via a namespace (and not attached):
##  [1] digest_0.5.2   evaluate_0.4.2 formatR_0.5    grid_2.15.1   
##  [5] lattice_0.20-6 parser_0.0-16  plyr_1.7.1     Rcpp_0.9.13   
##  [9] stringr_0.6    tools_2.15.1  

To leave a comment for the author, please follow the link and comment on their blog: metvurst.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)