Working with climate data from the web in R

[This article was first published on Recology - R, and kindly contributed to R-bloggers].

I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina – this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for #scioClimate for tweets from the conference, and #sciordata for tweets from the session I ran. The following is an abbreviated demo of what I did in the workshop showing some of what you can do with climate data in R using our packages.

Before digging in, why would you want to get climate data programmatically rather than by pushing buttons in a browser? Learning a programming language takes time, and we all already know how to use browsers. So why? First, getting data programmatically, especially in R (or Python), lets you move straight on to manipulating, visualizing, and analyzing it. Second, if you do your work programmatically, you and others can reproduce and extend it with little extra effort. Third, getting data programmatically makes slow, repetitive tasks fast and easy; you can’t easily automate button clicks in a browser. Fourth, you can combine code with writing to make your entire workflow reproducible, whether it’s notes, a blog post, or even a research article.

Interactive visualizations in R

Let’s start off with something shiny. Most of the time I make static visualizations, which are great for me to look at during analyses and for publishing research findings in PDFs. However, static visualizations don’t take advantage of the interactive nature of the web. Ramnath Vaidyanathan has developed an R package, rCharts, that generates dynamic JavaScript visualizations directly from R for interactive use in a browser. Here is an example visualizing a dataset that comes with R.

library(devtools)
install_github("ramnathv/rCharts")
library(rCharts)

# Load a data set
hair_eye_male <- subset(as.data.frame(HairEyeColor), Sex == "Male")

# Make a javascript plot object
n1 <- nPlot(Freq ~ Hair, group = "Eye", data = hair_eye_male, type = "multiBarChart")

# Visualize
n1$show(cdn = TRUE)

Check out the output here. If you like, you can take the source code from the visualization (right click and select View Page Source) and put it in your own HTML files, and you're good to go (as long as you have the dependencies, etc.). That's quicker than learning d3 and company from scratch, eh? This is a super simple example, but you can imagine the possibilities.
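
To see what the chart above is drawing, it can help to look at the data frame that nPlot() receives. The sketch below uses only base R and the built-in HairEyeColor table; the cross-tabulation is just one way to preview the grouped bars and is not part of the rCharts workflow itself.

```r
# HairEyeColor is a 4 x 4 x 2 contingency table shipped with R
hair_eye_male <- subset(as.data.frame(HairEyeColor), Sex == "Male")

# One row per Hair/Eye combination, with the count in Freq
str(hair_eye_male)

# Wide view: rows are hair colours (the x axis), columns are eye colours (the groups)
counts <- xtabs(Freq ~ Hair + Eye, data = hair_eye_male)
counts
```

Each cell of the wide table corresponds to one bar in the multi-bar chart.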

The data itself

First, install some packages. These are all only on GitHub, so you need to have devtools installed.

library(devtools)
install_github("schamberlain/govdat")
install_github("ropensci/rnoaa")
install_github("ropensci/rWBclimate")
install_github("ropensci/rnpn")

Politicians talk - Sunlight Foundation listens

Look at mentions of the phrase "climate change" in Congress, using the govdat package.

library(govdat)
library(ggplot2)

# Get mentions of climate change from Democrats
dat_d <- sll_cw_timeseries(phrase = "climate change", party = "D")

# Add a column that says this is data from Democrats
dat_d$party <- rep("D", nrow(dat_d))

# Get mentions of climate change from Republicans
dat_r <- sll_cw_timeseries(phrase = "climate change", party = "R")

# Add a column that says this is data from Republicans
dat_r$party <- rep("R", nrow(dat_r))

# Put two tables together
dat_both <- rbind(dat_d, dat_r)

# Plot data
ggplot(dat_both, aes(day, count, colour = party)) + theme_grey(base_size = 20) + 
    geom_line() + scale_colour_manual(values = c("blue", "red"))
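
The tag-and-stack pattern used above extends to any number of groups. Below is a minimal sketch in base R. Note that fake_timeseries() is a hypothetical stand-in that returns made-up counts so the example runs offline; in practice it would be swapped for the real query.

```r
# Hypothetical stand-in for a web query: returns invented counts, not real data
fake_timeseries <- function(party) {
  data.frame(
    day   = as.Date("2013-01-01") + 0:2,
    count = if (party == "D") c(3, 5, 4) else c(1, 2, 2)
  )
}

parties <- c("D", "R")

# Fetch one table per party and tag each with its party label
tables <- lapply(parties, function(p) {
  d <- fake_timeseries(p)
  d$party <- p
  d
})

# Stack the per-party tables into one long data frame
dat_all <- do.call(rbind, tables)
table(dat_all$party)
```

Keeping the group label in a column, rather than in separate objects, is what lets ggplot map it to colour in a single call.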


NOAA climate data, using the rnoaa package

Map sea ice for 12 years, for April only, for the North Pole

library(rnoaa)
library(scales)
library(ggplot2)
library(doMC)
library(plyr)

# Get URLs for data
urls <- seaiceeurls(mo = "Apr", pole = "N")[1:12]

# Download sea ice data in parallel (doMC is not available on Windows)
registerDoMC(cores = 4)
out <- llply(urls, noaa_seaice, storepath = "~/seaicedata", .parallel = TRUE)

# Name list elements by year (assumes the first 12 URLs cover 1979 to 1990)
names(out) <- seq(1979, 1990, 1)

# Make a data.frame
df <- ldply(out)

# Plot data
ggplot(df, aes(long, lat, group = group)) + geom_polygon(fill = "steelblue") + 
    theme_ice() + facet_wrap(~.id)


World Bank climate data, using the rWBclimate package

Plotting annual data for different countries

Data can be extracted for countries or basins submitted as vectors. Here we plot the expected temperature anomaly for each 20-year period relative to a baseline control period of 1961-2000. The countries chosen run from north to south. It's clear from the plot that the northernmost countries (US and Canada) have the biggest anomaly, and Belize, the most equatorial country, has the smallest.

library(rWBclimate)
library(ggplot2)

# Get modelled annual temperature anomalies for five countries, 2010 to 2100
country.list <- c("CAN", "USA", "MEX", "BLZ", "ARG")
country.dat <- get_model_temp(country.list, "annualanom", 2010, 2100)

# Subset data to one specific model
country.dat.bcc <- country.dat[country.dat$gcm == "bccr_bcm2_0", ]

# Exclude A2 scenario
country.dat.bcc <- subset(country.dat.bcc, scenario != "a2")

# Plot data
ggplot(country.dat.bcc, aes(x = fromYear, y = data, group = locator, colour = locator)) + 
    geom_point() + geom_path() + ylab("Temperature anomaly over baseline") + 
    theme_bw(base_size = 20)
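
An anomaly in this sense is a difference from a baseline mean. As a rough sketch of the arithmetic only, the values below are invented for illustration and are not World Bank data; baseline_mean, the period start years, and the mean_temp column are all hypothetical.

```r
# Toy values only, invented for illustration (degrees C)
baseline_mean <- 10.0  # hypothetical mean temperature over the control period

projected <- data.frame(
  fromYear  = c(2020, 2040, 2060, 2080),   # hypothetical period start years
  mean_temp = c(10.8, 11.5, 12.1, 12.9)    # hypothetical 20-year period means
)

# Anomaly = projected period mean minus baseline mean
projected$anomaly <- projected$mean_temp - baseline_mean
projected
```

A positive anomaly means the projected period is warmer than the baseline; the package returns these differences directly, so no subtraction is needed on real data.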


Phenology data from the USA National Phenology Network, using rnpn

library(rnpn)
library(plyr)

# Look up common names matching "bird", then keep the three species of interest
temp <- lookup_names(name = "bird", type = "common")
comnames <- temp[temp$species_id %in% c(357, 359, 1108), "common_name"]

# Get daily observation counts for each species
out <- getobsspbyday(speciesid = c(357, 359, 1108), startdate = "2010-04-01", 
    enddate = "2013-09-30")
names(out) <- comnames
df <- ldply(out)
df$date <- as.Date(df$date)

# Visualize data
library(ggplot2)
ggplot(df, aes(date, count)) + geom_line() + theme_grey(base_size = 20) + 
    facet_grid(.id ~ .)
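
Date strings sent to a web service are easy to get subtly wrong. As a hedged sketch (base R only; is_valid_date() is a helper name made up here, not part of rnpn), parsing with an explicit format gives NA for dates that don't exist, which can be checked before a request is made.

```r
# Hypothetical helper: TRUE when x is a real calendar date in YYYY-MM-DD form
is_valid_date <- function(x) {
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

# September has 30 days, so the last string is not a real date
is_valid_date(c("2010-04-01", "2013-09-30", "2013-09-31"))
```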


Feedback and new climate data sources

Please use the above packages (govdat, rnoaa, rWBclimate, and rnpn) to get climate data, and get in touch with bug reports and feature requests.

Surely there are other sources of climate data out there that you want to use in R, right? Let us know what else you want to use. Better yet, if you can sling some R code, start writing your own package to interact with a source of climate data on the web - we can lend a hand.
