RghcnV3 version 1.1

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve just uploaded version 1.1 of  the package RghcnV3 to Cran. I’ve made a few changes that should make it easier for some folks to use. First I removed the requirement for rgdal. At the present time “rgdal” is not required. On the MAC installing it can be a little trouble, but if you RTFM you can get it done. I’ll look for ways to provide capability without “rgdal” but it will involve reformating several spatial datasets and that’s not always a simple task. Further I like to be able to allow the user to build everything from the data sources themselves. That’s not always possible, especially since some datasets are behind registration walls. That too can be handled using “RCurl”, however “RCurl” has also been a hassle for me on the MAC. That hassle is entirely my own screw ups in having outdated curl libs all over the place. The other change I made was to change the R requirements from 2.13 back to 2.11. There was a bug in the demo code ( how’d that happen?) that has been fixed.

New features in 1.1

windowV3: a function to select windows of years from the V3 data

downloadSST, readSST: functions to download and read HADSST2 data.

intersectInvAnomalies: a minor change to the output of the function to keep anomalies as a zoo object.

rasterizeAnomaly: a function to “grid” anomalies into a 3D raster object.

some other tidbits here and there, more demo code as well.

The most important addition is the “rasterizeAnomaly” function. This function hides all the nastiness of projecting a zoo object of anomalies into a raster brick. I’ll show you the code and explain what the nasty bits are.

rasterizeAnomaly<-function(inventory,anomaly,land=GLOBE5){
  require("zoo")
  require("raster")
  # check the various calling parameters

  # check inventory as inventory data.frame
  #check anomaly as a zoo
  # stop if they dont match
  # proceeding otherwise
  if(!is.zoo(anomaly)){
    print(cat("anomaly is a ",class(anomaly),"\n" ))
    stop("anomaly must be a zoo object")
  }
  if(!is.data.frame(inventory)){
    print(cat("inventory is a ",class(inventory),"\n" ))
    stop("inventory must be a data.frame")
  }
  if(names(inventory)[1]!="Id"){
    print("inventory column 1 must be Id")
    stop(cat("column 1 name is ",names(inventory)[1],"\n"))
  }
  if(names(inventory)[2]!="Lat"){
    print("inventory column 2 must be Lat")
    stop(cat("column 2 name is ",names(inventory)[2],"\n"))
  }
  if(names(inventory)[3]!="Lon"){
    print("inventory column 3 must be Lon")
    stop(cat("column 3 name is ",names(inventory)[3],"\n"))
  }

  matching <- identical(getStations(inventory),getStations(anomaly))
  if(matching==FALSE){
    print(" mismatching stations between inventory and anomalies")
    stop("use the function intersectInvAnomaly to rectify")

  }
  if(!class(land)[1] == "RasterLayer") stop("land must be a raster object")
  resolution <- res(land)
  if(resolution[1]!=resolution[2])stop(" must use a land mask with square cells")

  ###########  ready to process
  #  get the lon lats of stations
  #  suck the time out of the zoo object
  #  transform the zoo object into a matrix with right orientation
  #  rasterize with mean
  #test <- rasterize(xy,r,field=vals,fun=mean)
  xy <- getLonLat(inventory)
  TIME <- time(anomaly)
  # core data wacks the row names
  data <- coredata(anomaly)
  dimnames(data)[[1]]<-TIME
  tdata<-t(data)
  gridded <- rasterize(xy,land,field=tdata,fun = mean,na.rm=TRUE)
  return(gridded)

}

As you can see we do some checking to insure that  the inputs are as required. Essentially, if you feed the out put of intersectInvAnomalies() to the rasterizeAnomaly() function, then everything will be in place.

Then we do the following.

we get the stations Lon/lat data from the inventory.

we extract the time from the anomaly zoo object. This is going to be used as layer names in the final raster brick. We have to extract it before doing any other operations on the zoo object because those operations will drop the data we need.

we pull the coredata from the zoo object. When you do this, you lose the time data. at this point data is a matrix

then we  put the time data back into the row names of the data matrix. Thus, we have a matrix where the row names are TIME and the column names are station Ids.  To use the rasterize() function in raster, this has to change. rasterize works like this:

You have a 2 column data structure of Lon/Lat.  Every Row is a new pair of points.  If you had 1000 stations, you would have 1000 rows of data. The matrix, on the other hand, has 1000 columns. A column per station. And the rows represent time. The solution is just to transpose the matrix t(). after that you have stations in rows and columns represent times. When this is feed to rasterize, the points that lie in the same grid square are going to have a function applied: mean. That is, if 3 stations lie with a grid square, the final value of that grid will be the mean of those three. After rasterizing you have a raster brick. The brick will have layers for every month.  The layerName of the layer will be the date in fractional years. 100 years = 1200 layers.  That brick is returned to the caller. You can then plot the layers of that brick, aggregate them into years or decades, take the mean of the layer, etc. More on that in the next release where will look at area averaging and working with SST data.  I’ll also add some other convenience functions for doing things in zoo series. I have requests in for different styles of summarizing years, and seasons. That’s all pretty straightforward in zoo, but getting fluent in zoo takes a little time. so I may just wrap some zoo processing into simple functions.  That code is done ( from long ago). I’d arther not do too much of that, my preference would be to extend zoo and extend raster as packages, but that is going to require some OOP work.  Maybe later in 2.0. The big issue there is one package is S3 OOP and the other is S4. That’s not an issue, just a learning curve.

The package should hit CRAN in a couple days ( source first, binaries later)


To leave a comment for the author, please follow the link and comment on their blog: Steven Mosher's Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)