I’ve just finished and uploaded another climate data package for R. This one focuses on the CRN, the Climate Reference Network.
Here is their home page
The package is really simple for now, but then all of the packages I’m building are getting simpler. In the end (whenever that is) I think I’ll end up with a host of packages that manage the downloading of data and the formatting of it into “analysis friendly” formats. The CRN posed an interesting challenge: they have hourly data and over 30 measurands. Going forward I’m seeing a collection of packages that looks like this: a series of packages that take online climate data and reformat it into some standard formats that a few of us have been converging on. In an OOP design they will become the core objects. On top of that we have a series of functions for doing basic spatial and time series stats, plus our spatial tools and time series tools. Here is the package lineup as of today:
1. RghcnV3: data formatting and analysis
2. CHCN: Environment Canada data
3. Ghcndaily: GHCN daily data
4. crn: Climate Reference Network
Over time the goal will be to refactor RghcnV3 and strip the analysis part out into a separate package. All the data formatting packages would then share a common set of formats and objects, and the analysis code would be written as methods on those. Eh, well, that’s the dream.
So here is what you can do with the crn package today. There are three core functions: downloadCRN(), the collate*() functions, and writeDataset(). The download function does all the heavy lifting of downloading both daily and hourly data from CRN. The data starts in the year 2000 and extends to today. downloadCRN() uses RCurl to get the directory listings from the FTP site and create the download lists. Then the process of downloading the 1000+ files starts.
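The listing step can be sketched roughly as follows. This is not the package's actual code: the placeholder URL, the file names, and the helpers `listCrnFiles`, `parseListing` and `buildDownloadList` are all made up for illustration. The only real library call is `RCurl::getURL()` with `dirlistonly = TRUE`, which asks the FTP server for bare file names.

```r
# Hypothetical sketch: turn an FTP directory listing into a list of file URLs.
# The URL below is a placeholder, not the real CRN location.
baseUrl <- "ftp://example.org/crn/daily/2011/"

# Split the raw listing text into one entry per file, dropping blank lines.
parseListing <- function(listing) {
  files <- unlist(strsplit(listing, "\r?\n"))
  files[nzchar(files)]
}

# Fetch a bare directory listing (file names only) as a character vector.
# Needs network access and the RCurl package, so it is not called below.
listCrnFiles <- function(dirUrl) {
  if (!requireNamespace("RCurl", quietly = TRUE)) {
    stop("The RCurl package is required for this step")
  }
  listing <- RCurl::getURL(dirUrl, ftp.use.epsv = FALSE, dirlistonly = TRUE)
  parseListing(listing)
}

# Pair each file name with its full URL, keeping only .txt data files.
buildDownloadList <- function(dirUrl, files) {
  files <- files[grepl("\\.txt$", files)]
  data.frame(file = files, url = paste0(dirUrl, files),
             stringsAsFactors = FALSE)
}

# Offline demonstration with a made-up listing string.
demoListing <- "station_A_2011.txt\r\nstation_B_2011.txt\r\nreadme.pdf\r\n"
downloads <- buildDownloadList(baseUrl, parseListing(demoListing))
print(downloads)
```

Each row of `downloads` could then be handed to `download.file(url, destfile, mode = "wb")` in a loop, which is roughly what grinding through 1000+ station files amounts to.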
The function lets you control whether you want daily files or hourly files. I just get both. The data comes in station files: one file for every station for every year. The next step is to collate these files into one monolithic file that contains the data for all the stations. For daily data we use collateDaily() and for hourly data we use collateHourly(). These functions have two side effects: they write a consolidated data file and a metadata file that records station names, lat/lon and ID number. In the case of the hourly data this file is quite large, over 1GB. Moreover, the file contains all the variables: T min, T max, solar radiation, soil temperatures and so on.
The last function turns these monolithic files into what we are used to: files with one variable for all stations. That function is writeDataset(). It operates on either hourly or daily files and collects a single variable such as T_MEAN. The function is defined like so:
writeDataset(filename, cnames = colnamesDaily, varname = "T_DAILY_MEAN")
The first argument, “filename”, points to either the monolithic daily data or the monolithic hourly data. The next argument, “cnames”, gives the column names of the dataset. These are predefined as constants in the package. The last argument, “varname”, is the variable you want to collect. When you run this function, the side effect is that a file is written containing all the T_DAILY_MEAN data for every station. Effectively, the package allows you to download the CRN data and build subsets of data from the huge collection. There are over 30 climate variables available, so I constructed the tool so that you can build your own datasets from the source data.
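To make the data flow concrete, here is a toy version of the collate-then-subset idea in base R. It is only a sketch under assumptions: the column names other than T_DAILY_MEAN (`WBANNO`, `LST_DATE`, `SOLARAD_DAILY`), the helper `extractVariable`, and the tiny in-memory data frames with invented values stand in for the real files. It does not show how the crn package implements collateDaily() or writeDataset().

```r
# Toy stand-ins for two per-station yearly files (invented values).
stationA <- data.frame(WBANNO = 1001, LST_DATE = c(20110101, 20110102),
                       T_DAILY_MEAN = c(-3.2, -1.5),
                       SOLARAD_DAILY = c(5.1, 6.0))
stationB <- data.frame(WBANNO = 1002, LST_DATE = c(20110101, 20110102),
                       T_DAILY_MEAN = c(10.4, 11.0),
                       SOLARAD_DAILY = c(9.8, 9.1))

# "Collate": stack every station file into one monolithic table.
monolithic <- do.call(rbind, list(stationA, stationB))

# "Write dataset": keep the station id, the date, and one chosen variable.
# extractVariable is a hypothetical helper, not a crn function.
extractVariable <- function(data, varname,
                            idcols = c("WBANNO", "LST_DATE")) {
  if (!varname %in% names(data)) stop("Unknown variable: ", varname)
  data[, c(idcols, varname)]
}

tmean <- extractVariable(monolithic, "T_DAILY_MEAN")

# Optional reshape: one row per date, one column per station.
wide <- reshape(tmean, idvar = "LST_DATE", timevar = "WBANNO",
                direction = "wide")
print(wide)
```

Writing `tmean` out with `write.csv(tmean, "T_DAILY_MEAN.csv", row.names = FALSE)` would give the "one variable, all stations" file; the file name here is just an example.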
Version 1.0 has been submitted to CRAN and should be up shortly. In the next installment I’ll probably add support for “zoo” objects and functions to create daily data from hourly and monthly data from daily. At that point it will be fully integrated with the RghcnV3 data structures.