BerkeleyEarth Version 1.6

March 5, 2012

(This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers)

Version 1.6 has just been submitted to CRAN, and I will post the source here in the dropbox. It will probably be stable for a long time unless I find a bug. The main additions are a few functions that make some things easier, and the conversion of two files, flags and sources, to file-backed big matrices. As a result, the read() routines for those files now take different input parameters. The flags.txt and sources.txt files are so large that they cause even my 4 GB Windows system to choke and slow down. The functions now create a *.bin file when they are first called; after that first call, access is immediate.

In addition, I added a function that creates *.bin files for every version of data.txt you have on your system. makeBinFiles() is called automagically after you download all the files using downloadBerkeley(), or you can call it separately from your main working directory. Converting takes about 10 minutes per file, but it is worth the time to do it once. After that, the file is attached instantly.
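The convert-once, reuse-afterwards idea can be sketched in base R. This is only an illustration of the pattern, not the package's code: read_cached() is a hypothetical helper, the demo file is made up, and saveRDS()/readRDS() stand in for the file-backed matrices the package uses.

```r
# Hypothetical helper (not part of BerkeleyEarth): parse a text file once,
# cache the result in binary form, and reuse the cache on later calls.
# saveRDS()/readRDS() are a stand-in for a file-backed matrix.
read_cached <- function(txt_file, bin_file = sub("\\.txt$", ".rds", txt_file)) {
  if (file.exists(bin_file)) {
    return(readRDS(bin_file))                            # fast path: cache exists
  }
  dat <- as.matrix(read.table(txt_file, header = TRUE))  # slow path: parse text
  saveRDS(dat, bin_file)                                 # build cache for next time
  dat
}

# Small demonstration with a throwaway file in the temp directory
txt <- file.path(tempdir(), "flags_demo.txt")
unlink(sub("\\.txt$", ".rds", txt))                      # start without a cache
write.table(data.frame(Id = 1:3, Flag = c(0, 1, 0)), txt, row.names = FALSE)

first  <- read_cached(txt)   # parses the text file and writes the cache
second <- read_cached(txt)   # served from the cache
identical(first, second)
```

One difference worth keeping in mind: readRDS() still loads the whole object into memory, whereas a file-backed matrix is attached from disk, which is what makes it practical for files as large as flags.txt and sources.txt.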

With 1.6 installed, let’s do a short example to illustrate something about the Berkeley datasets. I’ve installed version 1.6 of the package and created a directory called BestDownloadTest. I make that my working directory and run downloadBerkeley(). After that function completes I have a workspace that looks like this:

 

I’ll run getFileInformation() on the TAVG directory. That function creates separate readmes for all the files and collects some information. Next, we can select a directory to work with. On Windows I just use choose.dir() and point at a directory like the “Single-Value” folder. I can do this like so: Data <- readBerkeleyData(Directory=choose.dir()), and that instantly attaches “data.bin” to the variable Data.

Then we can get some simple statistics on the file. length(unique(Data[,"Id"])) gives us the number of unique station Ids: 36853. For the range of dates, min(Data[,"Date"]) gives 1701.042 and max(Data[,"Date"]) gives 2011.875. And then I can do histograms of the dates and the temperatures.
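To try the same calls without downloading anything, here is a rough sketch on simulated data. The matrix below is a made-up stand-in for the attached file: the "Id" and "Date" column names come from the calls above, but the "Temperature" column name and all the values are assumptions. The decode_date() helper is also hypothetical, and it assumes each decimal date marks the middle of a month, which is consistent with 1701.042 and 2011.875.

```r
set.seed(42)

# Simulated stand-in for the attached data: one row per observation.
# "Temperature" is an assumed column name; the values are random.
n_obs <- 1000
Data <- cbind(
  Id          = sample(1:50, n_obs, replace = TRUE),
  Date        = sample(1950:2010, n_obs, replace = TRUE) +
                (sample(1:12, n_obs, replace = TRUE) - 0.5) / 12,
  Temperature = rnorm(n_obs, mean = 10, sd = 8)
)

n_stations <- length(unique(Data[, "Id"]))   # number of distinct station Ids
date_range <- range(Data[, "Date"])          # same as c(min(...), max(...))

# Hypothetical helper: split a decimal-year date into year and month,
# assuming the fractional part is (month - 0.5) / 12
decode_date <- function(d) {
  year  <- floor(d)
  month <- floor((d - year) * 12) + 1
  data.frame(year = year, month = month)
}

# Under that assumption these are January 1701 and November 2011
decode_date(c(1701.042, 2011.875))

# Histograms of the dates and the temperatures
hist(Data[, "Date"], breaks = 50, main = "Observation dates", xlab = "Year")
hist(Data[, "Temperature"], breaks = 50, main = "Temperatures", xlab = "Temperature")
```

The same indexing by column name works on the real Data object, since it behaves like an ordinary matrix for these calls.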

 

 

 

