# Forking Myself

July 29, 2011
By

(This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers)

I’ve spent some time forking myself. Over the past few days when I could steal away an hour here  or there I decided to make a big change to the package. But it’s a good change. First some book-keeping. The Romantest.R file has a minor bug in it. Not really a bug, I just pulled out the wrong statistic from the brick ” cellStats(..,sum),  should be cellStats(…mean) unless you want to errr “add” all the cells together. Too eager to post.

On with the forking. Nick Stokes and I exchanged a few emails looking at various ways of speeding up readV3Data().  During the course of that we found a small “bug” to report to NCDC. As I thought more about his data structure, a 3D array of stations, months, and years it became obvious that the whole package could be easily rewritten to support it. And I could bury forever the old 14 column v2 format. The way I’ve implemented that is an option on the readV3data() function. You specify which kind of object you want returned: a zoo object, mts object or an Array type object. Zoo and mts are just variants of each other with station data in columns and time in rows. Then I defined a series of functions asZoo(), asMts(), asArray(), which allow you to switch from one representation to another. Then I rewrote all the functions to accept any type of object. This cut down the number of actual functions I need and moved me closer to OOP. A couple of the functions, still accept only one type of data, for example Romans and Taminos, are still “mts/zoo” centric.  Some routines disappeared all together: windowV3() is gone and windowArray() takes it place, although you could do  test <- asArray(window(asZoo(Data),start =1920, end = 1940 +11/12).

so you can now  write  anomalize() and pass it a zoo, mts or Array. And  passesCam() replaces the several variants.  The demos have been rewritten and I added a couple test programs. Building a new package will take time as the manual will require much rewriting.

Then I had another little task. Rewriting Nick’s  weights program. The function is used to calculate a series of weights for the data array. These weights are a function of the monthly count of reports in a grid cell and an area weight for that gridcell. My main goal here was to see if I could rewrite the code using raster. Well, I did, but it wasn’t entirely clear and last bits of it were particularly nasty so I decided instead to stick with Nicks code, add some clearer variable names, and change it so that you can use any size grid cell. Here is what the algorithm does: It takes a all the temperatures series and  gives them a “cell” id,  exactly like the cell value in a raster. Every cell receives a weight that is proportional to its area, basically sin() of Latitude so that cells at the equatoer which are larger have a weight of 1 and cells toward the poles are weighted less. Then a “count” is established for every valid temperature measure in the cell for every month of every year.  So a given cell could have 5 reports one month and no reports the next month or 3 or 2, and so forth. Then the area weight is divided by the counts in that cell. And I refer to this as “inverse density”  the more reports, the less each report is weighted. The smaller the area, the smaller the weight.  As the function stands now, I’ve added a  raster as a calling parameter. I’ll probably change that to simply a cell size, as everything I need can be calculated from a cell size.  Here is the code. This function, of course, doesnt take a zoo or mts as a parameter, but that’s relatively easy to handle with the asArray() function, so I’ll probably change that as well before adding it.

inverseDensity <- function(inv, Data, r = GLOBE5){

rClass <- class(r)[1]
isRL <- rClass == "RasterLayer"
if (isInventory(inv) & isArray(Data) & isRL){
# check that inventory stations and Data stations match
if (!identical(getStations(inv), getStations(Data))) stop("stations must match")
# dimensions of Array
dimensions <- dim(Data)
# array of weights for output that looks like temperatures
weights <- array(NA,dimensions)

lonBin <- ncol(r)
latBin <- nrow(r)
halfLat <- latBin/2
eq <- halfLat * (lonBin +1) + (halfLat+1)
cellSize <- res(r)[1]
cellId <- floor(inv$Lat/cellSize) * lonBin + floor(inv$Lon/cellSize) + eq
#weights for latitude changes
latWeight <- sin((1:latBin-0.5)*pi/latBin)
#weights for every lat/lon
allWeights <- rep(latWeight, each = lonBin)
for(months in 1:dimensions[2]){
for(years in 1:dimensions[3]){
# get T/F if temperature is there
reports <- !is.na(Data[ , months, years])
# get the cell numbers of those cells with reports
cells <- cellId[reports]
# a count of reports per cell per month
counts <- tabulate(c(cells, lonBin * latBin))
# Ok is the logical of counts > 0
ok <- counts > 0
# make a copy of allweights to modify
density <- allWeights
# note only "ok" elements are valid
density[ok] <- allWeights[ok] / counts[ok]
#print(density[ok]) those True cells get a density area/count
weights[reports, months, years] <- density[cells]

}

}
return(weights)
}
if ( !isInventory(inv) ) stop(" must be a valid inventory")
if ( !isArray(Data)) stop("must be a valid data array")
if ( !isRL ) stop("must be a valid raster")

}


R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...