118 years of US State Weather Data

[This article was first published on Drunks&Lampposts » R, and kindly contributed to R-bloggers].

A recent post on the Junkcharts blog looked at US weather data and the importance of explaining scales (which in this case went up to 118). It turns out that 118 is the rank of a value compared with the previous 117 years of data (ranked in ascending order, so 118 is the highest, i.e. a record). At the end of the post, the author writes:

"I always like to explore doing away with the unofficial rule that says spatial data must be plotted on maps. Conceptually I’d like to see the following heatmap, where a concentration of red cells at the top of the chart would indicate extraordinarily hot temperatures across the states. I couldn’t make this chart because the NOAA website has this insane interface where I can only grab the rank for one state for one year one at a time. But you get the gist of the concept."

In this spirit, I wrote a little R script to scrape the data and produced a couple of charts from it. I used Charles Web Proxy to work out which form fields need to be sent to the NOAA website to return the data I was looking for.

[Figure: heatmap for March 2012, showing the rank for each state in the latest month]

[Figure: heatmap for each March going back to 1895]

The code to reproduce and tweak these charts is below:


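### Packages needed for the work
library(RCurl)    # postForm() sends the form request to NOAA
library(XML)      # readHTMLTable() parses the returned HTML tables
library(ggplot2)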

### Get list of US states to tie onto dataset, dropping Alaska and Hawaii (rows 2 and 11);
### the code below assumes the remaining 48 rows are in the same order as NOAA state codes 1-48
us.list.of.states <- readHTMLTable("http://www.worldatlas.com/aatlas/populations/usapoptable.htm")[[1]]
us.list.of.states <- us.list.of.states[-c(2, 11), ]

### Functions to pull monthly and annual data from the NOAA website
getNOAAdataMonth <- function(state.no, month){
	
	# State codes are sent as three digits with leading zeros, e.g. 1 -> "001", 48 -> "048"
	state.string <- sprintf("%03d", state.no)
	
	# Form fields as sent by the NOAA web page (worked out with Charles Web Proxy)
	data.in <- postForm("http://climvis.ncdc.noaa.gov/cgi-bin/cag3/hr-display3.pl",
			data_set = "01",
			byear = "1895",
			period = month,
			lyear = "2012",
			strgn = state.string, 
			bbeg = "1901",
			bend = "2000",
			trend = "0",
			type = "3",
			rank = "0",
			send.x = "60",
			send.y = "8", 
			spec = "")
	
	# The data sit in the second table on the results page; tag it with the state name
	data.out <- readHTMLTable(data.in)[[2]]
	data.out$state <- us.list.of.states[state.no, 3]
	data.out
}

getNOAAdataAnnual <- function(state.no){
	# Same request as getNOAAdataMonth(), with period "17" selecting the annual series
	getNOAAdataMonth(state.no, "17")
}

### Run the functions over the 48 contiguous states
weather.data.annual <- lapply(1:48, getNOAAdataAnnual)
weather.data.march <- lapply(1:48, getNOAAdataMonth, month = "3")

### Stack the per-state tables into single data frames
weather.data.2 <- do.call("rbind", weather.data.march)
weather.data.annual.2 <- do.call("rbind", weather.data.annual)

### Rename the March columns for easier use (the annual data keep their original names)
colnames(weather.data.2) <- c("year", "temp", "rank1", "rank2", "state")

### Subset 2012 data for the first chart and flag states where March 2012 ranked 118 (a record)
weather.data.march2012 <- subset(weather.data.2, year == 2012)
weather.data.march2012$fill <- ifelse(as.numeric(as.character(weather.data.march2012$rank1)) == 118, 1, 0)

### Tiles are coloured by the record flag: red for rank 118, white otherwise
ggplot(weather.data.march2012, aes(x = state, y = as.numeric(as.character(rank1)), fill = fill, label = state)) +
		geom_tile() +
		geom_text(size = 3) +
		ylab("March 2012 Rank") +
		scale_fill_gradient("", low = "white", high = "red") +
		ggtitle("All the red at the top means record temperatures across many states")

### Plot all years' data (year and rank1 may be read in as factors, so convert via as.character() before as.numeric())
ggplot(weather.data.2, aes(x = state, y = as.numeric(as.character(year)), fill = as.numeric(as.character(rank1)))) +
		geom_tile() +
		coord_flip() +
		scale_fill_gradient("", low = "white", high = "darkred") +
		ylab("Year") +
		ggtitle("All the red at the right means record temperatures across many states")
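
The reshaping steps that follow the download can be tried out without contacting the NOAA server. What follows is a minimal sketch under stated assumptions: `fake_state_table()` is a hypothetical stand-in for one parsed NOAA table, filled with random numbers rather than real temperatures, with its text columns turned into factors as `readHTMLTable()` may return them, and with ranks assigned in ascending order via `rank()` so that the highest of the 118 values (1895 to 2012) gets 118. It stacks three toy tables with `do.call(rbind, ...)`, shows why the script converts through `as.character()` before `as.numeric()`, and applies the same rank-118 flag as the first chart.

```r
# Offline rehearsal of the script's reshaping steps (no network access).
# fake_state_table() is hypothetical: it stands in for readHTMLTable(...)[[2]]
# and fills the table with random numbers, not NOAA temperatures.
fake_state_table <- function(state.name, seed) {
  set.seed(seed)
  years <- 1895:2012                               # 118 years, as in the NOAA request
  temp <- rnorm(length(years), mean = 45, sd = 3)  # made-up continuous values (ties practically impossible)
  data.frame(
    year  = factor(as.character(years)),           # text columns parsed into factors
    temp  = temp,
    rank1 = factor(as.character(rank(temp))),      # ascending: lowest -> 1, highest -> 118
    state = state.name,
    stringsAsFactors = FALSE
  )
}

states <- c("Alabama", "Arizona", "Arkansas")
tables <- lapply(seq_along(states), function(i) fake_state_table(states[i], seed = i))

# Stack the per-state tables, as the script does with do.call("rbind", ...)
combined <- do.call(rbind, tables)

# Pitfall: as.numeric() on a factor returns level positions, not the labels
year.codes <- as.numeric(combined$year)                  # 1 .. 118
year.values <- as.numeric(as.character(combined$year))   # 1895 .. 2012

combined$year <- year.values
combined$rank1 <- as.numeric(as.character(combined$rank1))

# Under the ascending scheme, each state's highest value carries rank 118
top.rank <- sapply(split(combined, combined$state),
                   function(d) d$rank1[which.max(d$temp)])
print(top.rank)  # 118 for every state

# Latest year only, with the same record flag used for the first chart
latest <- subset(combined, year == 2012)
latest$fill <- ifelse(latest$rank1 == 118, 1, 0)
print(latest[, c("state", "rank1", "fill")])
```

Swapping the toy tables for the downloaded ones would exercise the same steps, assuming the real `rank1` column is the ascending rank described at the top of the post.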


