Forking Myself

July 29, 2011

(This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers)

I’ve spent some time forking myself. Over the past few days when I could steal away an hour here  or there I decided to make a big change to the package. But it’s a good change. First some book-keeping. The Romantest.R file has a minor bug in it. Not really a bug, I just pulled out the wrong statistic from the brick ” cellStats(..,sum),  should be cellStats(…mean) unless you want to errr “add” all the cells together. Too eager to post.

On with the forking. Nick Stokes and I exchanged a few emails looking at various ways of speeding up readV3Data().  During the course of that we found a small “bug” to report to NCDC. As I thought more about his data structure, a 3D array of stations, months, and years it became obvious that the whole package could be easily rewritten to support it. And I could bury forever the old 14 column v2 format. The way I’ve implemented that is an option on the readV3data() function. You specify which kind of object you want returned: a zoo object, mts object or an Array type object. Zoo and mts are just variants of each other with station data in columns and time in rows. Then I defined a series of functions asZoo(), asMts(), asArray(), which allow you to switch from one representation to another. Then I rewrote all the functions to accept any type of object. This cut down the number of actual functions I need and moved me closer to OOP. A couple of the functions, still accept only one type of data, for example Romans and Taminos, are still “mts/zoo” centric.  Some routines disappeared all together: windowV3() is gone and windowArray() takes it place, although you could do  test <- asArray(window(asZoo(Data),start =1920, end = 1940 +11/12).

so you can now  write  anomalize() and pass it a zoo, mts or Array. And  passesCam() replaces the several variants.  The demos have been rewritten and I added a couple test programs. Building a new package will take time as the manual will require much rewriting.

Then I had another little task. Rewriting Nick’s  weights program. The function is used to calculate a series of weights for the data array. These weights are a function of the monthly count of reports in a grid cell and an area weight for that gridcell. My main goal here was to see if I could rewrite the code using raster. Well, I did, but it wasn’t entirely clear and last bits of it were particularly nasty so I decided instead to stick with Nicks code, add some clearer variable names, and change it so that you can use any size grid cell. Here is what the algorithm does: It takes a all the temperatures series and  gives them a “cell” id,  exactly like the cell value in a raster. Every cell receives a weight that is proportional to its area, basically sin() of Latitude so that cells at the equatoer which are larger have a weight of 1 and cells toward the poles are weighted less. Then a “count” is established for every valid temperature measure in the cell for every month of every year.  So a given cell could have 5 reports one month and no reports the next month or 3 or 2, and so forth. Then the area weight is divided by the counts in that cell. And I refer to this as “inverse density”  the more reports, the less each report is weighted. The smaller the area, the smaller the weight.  As the function stands now, I’ve added a  raster as a calling parameter. I’ll probably change that to simply a cell size, as everything I need can be calculated from a cell size.  Here is the code. This function, of course, doesnt take a zoo or mts as a parameter, but that’s relatively easy to handle with the asArray() function, so I’ll probably change that as well before adding it.

inverseDensity <- function(inv, Data, r = GLOBE5){

rClass <- class(r)[1]
isRL <- rClass == "RasterLayer"
if (isInventory(inv) & isArray(Data) & isRL){
# check that inventory stations and Data stations match
if (!identical(getStations(inv), getStations(Data))) stop("stations must match")
# dimensions of Array
dimensions <- dim(Data)
# array of weights for output that looks like temperatures
weights <- array(NA,dimensions)

lonBin <- ncol(r)
latBin <- nrow(r)
halfLat <- latBin/2
eq <- halfLat * (lonBin +1) + (halfLat+1)
cellSize <- res(r)[1]
cellId <- floor(inv$Lat/cellSize) * lonBin + floor(inv$Lon/cellSize) + eq
#weights for latitude changes
latWeight <- sin((1:latBin-0.5)*pi/latBin)
#weights for every lat/lon
allWeights <- rep(latWeight, each = lonBin)
for(months in 1:dimensions[2]){
      for(years in 1:dimensions[3]){
      # get T/F if temperature is there
      reports <- ![ , months, years])
      # get the cell numbers of those cells with reports
      cells <- cellId[reports]
      # a count of reports per cell per month
      counts <- tabulate(c(cells, lonBin * latBin))
      # Ok is the logical of counts > 0
      ok <- counts > 0
      # make a copy of allweights to modify
      density <- allWeights
      # note only "ok" elements are valid
      density[ok] <- allWeights[ok] / counts[ok]
      #print(density[ok]) those True cells get a density area/count
      weights[reports, months, years] <- density[cells]


if ( !isInventory(inv) ) stop(" must be a valid inventory")
if ( !isArray(Data)) stop("must be a valid data array")
if ( !isRL ) stop("must be a valid raster")


To leave a comment for the author, please follow the link and comment on their blog: Steven Mosher's Blog. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...


Comments are closed.


Mango solutions

RStudio homepage

Zero Inflated Models and Generalized Linear Mixed Models with R

Quantide: statistical consulting and training


CRC R books series

Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)