RomanM’s Method

Posted on July 22, 2011 by Steven Mosher in R bloggers | 0 Comments

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’ve succeeded in getting a version of RomanM and JeffId’s Thermal hammer working with version 1.3 of RghcnV3. This is going to be a long post because there is a lot of ground to cover. First, some errata, the “Globe” demo in V1.3 appears to have a missing line, looks like an editor bug, so I will have to push 1.4 to CRAN. CRAN has not been building packages for a bit, so I’ll post 1.4 here.

Roman’s method. Roman’s method is covered at a variety of posts at his site and at Jeffs. My goal here isn’t to explain his method in detail but rather to show how it will fit into RghcnV3, Roman’s method works by using all the information in all the time series to come up with an average of all the time series. The function is currently called ”temp.combine(), however that name will change as we move it into RghcnV3. I will also rename the function that uses Tamino’s method. As it stands temp.combine() takes a “mts” object or multiple time series object and a series of weights. It outputs a single time series, offsets for all time series, predicted values for all time series and residuals. For the first iteration I will leave the weights in. In discussions with Roman he indicated that these weights could be used for distance weighting, but we won’t we using that in today’s work.

As I plowed through the code it became apparent to me that the paradigm of holding station data in columns ( the method used by me, roman and tamino) actually has some drawbacks especially when it comes to combining stations with ‘aggregate()’. I’m not in the mood to change a bunch of code from column based to row based; and, its not simply a matter of doing a few transpositions. Practically what this means is that some of the code used in Roman’s method will use loops. I’m actively working to remove that, but loops work, so working code wins the day.

Plan of attack:

The goal for this exercise is to use Roman’s method to calculate an average for a grid cell. In the CAM method we use a simple ‘mean’ to calculate the average for a grid cell. That happens in the function “rasterizeZoo()”. That function takes an inventory of stations and a 2 dimensional Zoo object and “rasterizes” stations into the grid structure of a 3D brick. A brick is lat/lon/time. Some side notes on rasterizeZoo(). RasterizeZoo() calls the raster function “rasterize”. Rasterize need the following inputs: a matrix of xy points (station lon/lat), a base raster to use for every layer, a matrix of values to rasterize into cells, and a function. The matrix has to have time in columns, stations in rows. So we feed it a zoo object and we transpose that zoo object ’under the hood’ inside the function. The lastest version of zoo just added a transpose function. arrg I needed that a couple months back. But again, I’m not rewriting a bunch of code, just yet.

Our plan then will be to call a function like rasterizeZoo(). However, it will be slightly different. RasterizeZoo() takes multiple stations and averages them per cell. When we use Roman’s method we will do this ‘averaging’ BEFORE we call ‘rasterize’. Essentially we will call “rasterize” with “cells” rather than “points”. We could also use the raster function “setValues()” but that function needs a little tweak to allow this, so I’ll work with rasterize. That’s our goal. Get a zoo data structure and associated CELLS (not points) to feed to rasterize. Our first function must then take a mutiple time series, divide the stations into the right CELLS, and then romanize those stations to come up with a cell time series.

Here we go:

averageByCell <- function(r =GLOBE5, inventory, Mts){
  # check classes code goes here
  # check to make sure that all the inputs are the right classes. To do.
  # get the cell number associated with each station. xy maps to a cell in the raster
  myCells <-  cellFromXY(r, getLonLat(inventory))
       # a obscure hack to get the stations associated with a cell!
       # used to get the column numbers for each cell
       # recall that every column is a station. so we will use column numbers to subset
  DF <- data.frame(Id =getStations(inventory), Cell = myCells)

  # now we get a cell count. we are going to loop over the unique cells
  cellcount <- length(unique(myCells))
  # we need the unique cells in a vector for referencing and for NAMING output columns
  uCells <- unique(myCells)
 # loop over cells. IF mts were transposed we could use aggregate
 # but that would entail rewriting all of romans code.. or maybe use mapply
  for( cell in 1:cellcount){
       # dex will hold the positions of stations in DF$Cell. its also the columns we need
       dex <- which(DF$Cell == uCells[cell])
       # call temp.combine NOTE I can also call taminos here or a simple mean
       CellAve <- temp.combine(Mts[ ,dex])
       if (cell == 1){
          out <-  CellAve$temps
        }else {
          out <- cbind(out,CellAve$temps)
          # COPY the cell number to the column name! Critical
          colnames(out)<-uCells[1:cell]
         }
   }
  # make it a zoo object! we want to pass a zoo object to rasterizeZoo()
  Zout <- as.zoo(out)
  # put out the xy for every cell. This will probably be dropped in production
  xy   <- xyFromCell(r,uCells)
  return(list(Points=xy,Zoo=Zout))

}

So that’s it. We have a function that we feed the following values: a mts object. an inventory and a blank raster. That function will figure out which stations belong in which cells. It will then call “temp.combine()” with SUBSETS of the full mts object. temp.combine(Mts[,dex]) uses “dex” to select those stations that have matching cell numbers. So temp.combine works on a grid cell at a time. Then all those individual time series ( grid series) are collected into a zoo object that has CELL NUMBERS as column names. This is different from zoo objects that have station Ids as column names.

Now that we have a zoo object of averaged grid cells, we will call a function like “rasterizeZoo” but for clarity I’ll create a different function. I’ll probably come up with a special class for this and have just once function, but for now we have two. Here is that function.

rasterizeCells <- function( ZooCells, r = GLOBE5){
  require("zoo")
  require("raster")
  # check the various calling parameters
  # check for a good cellsize

  #check Zoo as a zoo
  # stop if they dont match
  # proceeding otherwise
  if (!is.zoo(ZooCells)){
    cat("Zoo is a ", class(ZooCells), "\n" )
    stop("Zoo must be a zoo object")
  }
  # now we should check that the cells in the column names can be found in
  # the raster. What we really need is a tighter coupling so we KNOW the resolution matches: OOP someday
  cells <- 1:ncell(r)
  zoo.cells <- as.integer(colnames(ZooCells))
  common    <- intersect(zoo.cells,cells)

  matching <- identical(intersect(zoo.cells,cells),zoo.cells)

  if (matching == FALSE){
    print(" mismatching cells between Zoocell and raster")
    stop("check your Raster")

  }
  if (!class(r)[1] == "RasterLayer") stop("r must be a raster object")
  resolution <- res(r)
  if (resolution[1] != resolution[2])stop(" must use a raster with square cells")

  ###########  ready to process
  #  get the lon lats of Cells
  #  suck the time out of the zoo object
  #  transform the zoo object into a matrix with right orientation
  #  rasterize with mean. Need to see if we can change rasterize to merely copy.
  # also replace this code with zoo  transform.. requires LATEST zoo library, so change depends

  xy      <- xyFromCell(r,zoo.cells)
  TIME    <- time(ZooCells)
  # core data wacks the row names
  data    <- coredata(ZooCells)
  dimnames(data)[[1]] <- TIME
  tdata   <- t(data)
  gridded <- rasterize(xy, r, field = tdata, fun = mean, na.rm = TRUE)
  gridded <- setZ(gridded, as.yearmon(time(ZooCells)), name = 'time')
  return(gridded)

}

That’s it. We now will be able to call readV3data(), v3ToMts(), averageByCell(), rasterizeCells() and the result will be a brick where every grid cell has the average temperature as computed by Roman’s method. Slick.

So I did a little test with Texas. Here is what a month of texas data looks like: Please note the nasty image bug with R.2.13.1. yuk. Note also that I used a 3×3 grid to do this.

And if you compare the answer you get with and without grid based averaging