CRU Data in RghcnV3

[This article was first published on Steven Mosher's Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

As many have noted CRU have posted their data a bit ago and the usual gang has started to put it through the various “engines” for calculating a global temperature index. Thanks to others who worked this problem ahead of me getting the data in Rghcnv3 was not that hard, but it was not without it’s challenges given the format. On the CRU site the format consists of a file with two types of  ”lines” in it. Header lines that give a station metadata, followed by data “lines” that contain the data for the station.  That entailed doing some ugly things to read the data in, so I tried to make the ugliness somewhat clearer.  The code:

readCruStations <- function(filename=”stations-data.txt”, output = c(“Array”,”Zoo”,”Mts”)){

if (length(output) > 1) {
warning(“Select One of either V3 or ARRAY or Mts. Using first element by default”)
returnType <- output[1]
} else {
returnType <- output
X <- readLines(“station-data.txt”)

# suck out the headers and work on them to make an inventory
dex <- which(nchar(X[]) == 72)
headers <- X[dex]

#  build an inventory that RghcnV3 likes!

Inv <- data.frame(Id = as.numeric(substr(headers,1,6)),
Lat = as.numeric(substr(headers,7,10)),
Lon = as.numeric(substr(headers,11,15)),
Altitude = as.numeric(substr(headers,16,20)),
Name = substr(headers,22,42),
Country = substr(headers,43,56),
StartYear = as.numeric(substr(headers,57,60)),
EndYear = as.numeric(substr(headers,61,64)),
Source = as.numeric(substr(headers,67,68)),
GoodFirst = as.numeric(substr(headers,69,72)))

Inv$Lat[Inv$Lat == -999] <- NA
Inv$Lon[Inv$Lon == -1999] <- NA
Inv$Lat <- Inv$Lat / 10
Inv$Lon <- Inv$Lon / 10

Inv$Altitude[Inv$Altitude == -999] <- NA

begin <- min(Inv$StartYear)
end <- max(Inv$EndYear)
Years <- begin:end
stations <- nrow(Inv)

##  build a data array that RghcnV3 likes !
DATA <- array(NA,dim=c(stations,12,length(Years)))

stationCount <- 0
for(line in 1:length(X)){
if (nchar(X[line]) == 72){
# we have a header. increment station counter ( its an array index)
stationCount <- stationCount + 1
} else {
# we have a data record.  use the regular expression  (” +”) to split the string into numbers
thisLine <- as.numeric(unlist(strsplit(X[line],” +”)))
thisLine[thisLine == -999] <-NA
thisYear <- thisLine[1] – begin + 1
DATA[stationCount,,thisYear] <- thisLine[2:13]/10

## Prepare for output
if (returnType == “Mts” | returnType == “Zoo”){
DATA <- apply(DATA,MARGIN = 1, FUN = c)
DATA <- ts(DATA, start = minYear, frequency = 12)
if (returnType == “Zoo”) DATA <- asZoo(DATA)
if (returnType == “Mts”){
return(list(Inventory = Inv,Mts = DATA))

if (returnType == “Zoo”){
return(list(Inventory = Inv,Zoo = DATA))
if (returnType == “Array”){
return(list(Inventory = Inv,Array = DATA))



Of course, then it is a simple matter to load that data into a raster and compute the area averaged global temperature:  What’s it show? Given CRU’s source data and given an algorithm that is patterned after their’s, They are cooler. Perhaps, they posted only the source data and not any adjustments they do.  These results match Nick Stokes who is also a bit warmer than CRU.

To leave a comment for the author, please follow the link and comment on their blog: Steven Mosher's Blog. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)