BerkeleyEarth Version 1.6

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers].

Version 1.6 has just been submitted to CRAN and I will post the source here in the dropbox. Version 1.6 will probably be stable for a long time unless I find a bug. The main additions are a few functions that make some things easier, and I converted two files (flags and sources) to file-backed big.matrix objects. As a result, the input parameters of the read() routines for those files have changed.

The flags.txt and sources.txt files are so large that they cause even my 4GB Windows system to choke and slow down. The functions now create a .bin file when they are first called. After the first call, access to the data is immediate. In addition, I added a function for creating .bin files for all versions of data.txt you may have on your system. makeBinFiles() is called automagically after you download all the files using downloadBerkeley(), or you can call it separately from your main working directory. Converting the files takes about 10 minutes per file, but it is worth the time to do it once. After that, the file is attached instantly.
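To make the convert-once idea concrete, here is a minimal base-R sketch of the pattern. It is not the package's code: readCached() is a hypothetical helper, saveRDS()/readRDS() stand in for the file-backed matrix, and treating "%" as the comment character of the text file is an assumption.

```r
# Hypothetical helper: parse a large text table once, cache a binary copy,
# and reuse that copy on every later call. saveRDS()/readRDS() are a stand-in
# for the file-backed matrix the package uses.
readCached <- function(txtFile, binFile = paste0(txtFile, ".rds")) {
  if (!file.exists(binFile)) {
    # Slow path, taken only on the first call: parse the text file.
    # Assumption: header lines start with "%" and can be skipped as comments.
    m <- as.matrix(read.table(txtFile, comment.char = "%"))
    saveRDS(m, binFile)
    return(m)
  }
  # Fast path: load the binary copy without re-parsing the text.
  readRDS(binFile)
}

# Tiny demonstration with a throwaway file and made-up values.
tmp <- tempfile(fileext = ".txt")
writeLines(c("% Id Date Value", "1 1701.042 -3.2", "2 2011.875 14.1"), tmp)
first  <- readCached(tmp)  # parses the text and writes the cache
second <- readCached(tmp)  # reads the cache
print(identical(first, second))
```

The first call pays the parsing cost and every later call only reads the binary copy, which is the same trade the 10-minute conversion makes.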

With 1.6 installed, let’s do a short example to illustrate something about the Berkeley datasets. I’ve installed version 1.6 of the package and created a working directory called BestDownloadTest. I make that my working directory and run downloadBerkeley(). After that function completes, I have a workspace that looks like this:


I’ll run getFileInformation() on the TAVG directory. That function will create separate readmes for all the files and collect some information. Next, we can select a directory to work with. On Windows I just use choose.dir() and point at a directory like the "Single-Value" folder. I can do this like so: Data <- readBerkeleyData(Directory=choose.dir()), and that will instantly attach "data.bin" to the variable Data.

Then we can get some simple statistics on the file. length(unique(Data[,"Id"])) gives us the number of unique station Ids: 36853. The range of dates runs from min(Data[,"Date"]), which is 1701.042, to max(Data[,"Date"]), which is 2011.875. And then I can do histograms of the dates and the temperatures.
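The same summary steps can be sketched on a small synthetic matrix, so they can be tried without downloading anything. This is an illustration under assumptions: the values are made up, the column name "Temperature" is a guess (only "Id" and "Date" appear above), and decodeDate() is a hypothetical helper. It relies on my reading that the decimal dates are stored as year + (month - 0.5)/12, which is consistent with the 1701.042 and 2011.875 values printed above but is not stated in the post.

```r
# Synthetic stand-in for the attached data; all values are made up.
set.seed(1)
n <- 500
Data <- cbind(
  Id          = sample(1:40, n, replace = TRUE),
  Date        = sample(1950:2010, n, replace = TRUE) +
                (sample(1:12, n, replace = TRUE) - 0.5) / 12,
  Temperature = rnorm(n, mean = 10, sd = 8)  # assumed column name
)

# Number of unique station Ids and the range of dates.
nStations <- length(unique(Data[, "Id"]))
dateRange <- range(Data[, "Date"])
print(nStations)
print(dateRange)

# Hypothetical helper: split a decimal date into year and month, assuming
# mid-month encoding, i.e. Date = year + (month - 0.5) / 12.
decodeDate <- function(d) {
  year  <- floor(d)
  month <- floor((d - year) * 12) + 1
  cbind(year = year, month = month)
}
print(decodeDate(c(1701.042, 2011.875)))

# Histograms of the dates and the temperatures.
hist(Data[, "Date"], main = "Dates", xlab = "Decimal year")
hist(Data[, "Temperature"], main = "Temperatures", xlab = "Temperature")
```

Under that encoding, 1701.042 decodes to January 1701 and 2011.875 to November 2011.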



