Working with climate data from the web in R

[This article was first published on Recology - R, and kindly contributed to R-bloggers.]

I recently attended ScienceOnline Climate, a conference in Washington, D.C. at AAAS. You may have heard of the ScienceOnline annual meeting in North Carolina – this was one of their topical meetings focused on Climate Change. I moderated a session on working with data from the web in R, focusing on climate data. Search Twitter for #scioClimate for tweets from the conference, and #sciordata for tweets from the session I ran. The following is an abbreviated demo of what I did in the workshop showing some of what you can do with climate data in R using our packages.

Before digging in, why would you want to get climate data programmatically rather than by pushing buttons in a browser? Learning a programming language takes time, and we all already know how to use browsers. So why? First, getting data programmatically, especially in R (or Python), lets you easily go on to manipulate, visualize, and analyze the data. Second, if you do your work programmatically, you and others can reproduce and extend it with little extra effort. Third, getting data programmatically makes slow, repetitive tasks fast and easy; you can’t easily automate button clicks in a browser. Fourth, you can combine code with writing to make your entire workflow reproducible, whether it’s notes, a blog post, or even a research article.

Interactive visualizations in R

Let’s start off with something shiny. Most of the time I make static visualizations, which are great to look at during analyses and for publishing research findings in PDFs. However, static visualizations don’t take advantage of the interactive nature of the web. Ramnath Vaidyanathan has developed an R package, rCharts, that generates dynamic JavaScript visualizations directly from R for interactive use in a browser. Here is an example visualizing a dataset that comes with R.

library(devtools)
install_github("ramnathv/rCharts")
library(rCharts)

# Load a data set
hair_eye_male <- subset(as.data.frame(HairEyeColor), Sex == "Male")

# Make a javascript plot object
n1 <- nPlot(Freq ~ Hair, group = "Eye", data = hair_eye_male, type = "multiBarChart")

# Visualize
n1$show(cdn = TRUE)

Check out the output here. If you like, you can take the source code from the visualization (right click and select View Page Source) and put it in your HTML files, and you’re good to go (as long as you have the dependencies, etc.); that’s quicker than learning d3 and company from scratch, eh. This is a super simple example, but you can imagine the possibilities.

The data itself

First, install some packages. These are all only on GitHub, so you need to have devtools installed.

library(devtools)
install_github("schamberlain/govdat")
install_github("ropensci/rnoaa")
install_github("ropensci/rWBclimate")
install_github("ropensci/rnpn")
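
If you rerun this setup often, it can help to check first which packages are still missing. The following is only a sketch: missing_packages() is a hypothetical helper, not part of devtools, and the second package name in the example is made up so that the check has something to report.

```r
# Hypothetical helper: return the names of packages that cannot be loaded.
# requireNamespace() returns FALSE (instead of raising an error) when a
# package is not available.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# "stats" ships with R; the second name is invented and should not exist.
missing_packages(c("stats", "notARealPackage12345"))
```

You could then install only what the helper reports as missing.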

Politicians talk – Sunlight Foundation listens

Look at mentions of the phrase “climate change” in Congress, using the govdat package.

library(govdat)
library(ggplot2)

# Get mentions of climate change from Democrats
dat_d <- sll_cw_timeseries(phrase = "climate change", party = "D")

# Add a column that says this is data from Democrats
dat_d$party <- rep("D", nrow(dat_d))

# Get mentions of climate change from Republicans
dat_r <- sll_cw_timeseries(phrase = "climate change", party = "R")

# Add a column that says this is data from Republicans
dat_r$party <- rep("R", nrow(dat_r))

# Put two tables together
dat_both <- rbind(dat_d, dat_r)

# Plot data
ggplot(dat_both, aes(day, count, colour = party)) + theme_grey(base_size = 20) +
    geom_line() + scale_colour_manual(values = c("blue", "red"))
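
The label-then-stack pattern used here works without any API access. Below is a minimal sketch in base R; the counts are made up and simply stand in for the two API responses.

```r
# Made-up counts standing in for the two API responses.
dat_d <- data.frame(day = as.Date(c("2012-01-01", "2012-01-02")), count = c(5, 7))
dat_r <- data.frame(day = as.Date(c("2012-01-01", "2012-01-02")), count = c(2, 3))

# Label each table before stacking so the rows stay distinguishable.
dat_d$party <- "D"
dat_r$party <- "R"
dat_both <- rbind(dat_d, dat_r)

# Total mentions per party.
totals <- tapply(dat_both$count, dat_both$party, sum)
totals
```

Because the party label travels with every row, any grouped summary or plot can split the data again later.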

NOAA climate data, using the rnoaa package

Map sea ice for 12 years, for April only, for the North Pole.

library(rnoaa)
library(scales)
library(ggplot2)
library(doMC)
library(plyr)

# Get URLs for data
urls <- seaiceeurls(mo = "Apr", pole = "N")[1:12]

# Download sea ice data
registerDoMC(cores = 4)
out <- llply(urls, noaa_seaice, storepath = "~/seaicedata", .parallel = TRUE)

# Name elements of list
names(out) <- seq(1979, 1990, 1)

# Make a data.frame
df <- ldply(out)

# Plot data
ggplot(df, aes(long, lat, group = group)) + geom_polygon(fill = "steelblue") +
    theme_ice() + facet_wrap(~.id)
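
Naming the list elements matters because ldply() turns those names into the .id column that the plot facets on. Here is a rough base R sketch of that stacking step; the two small tables are made up and stand in for the downloaded polygon data.

```r
# Made-up stand-ins for two yearly polygon tables.
out <- list(
  "1979" = data.frame(long = c(0, 1), lat = c(80, 81)),
  "1980" = data.frame(long = c(0, 2), lat = c(79, 82))
)

# Tag each table with its list name, then stack the tables row-wise.
tagged <- Map(function(d, id) cbind(.id = id, d), out, names(out))
df <- do.call(rbind, tagged)
rownames(df) <- NULL
df
```

This only approximates what plyr does internally, but it shows where the year labels in the facets come from.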

World Bank climate data, using the rWBclimate package

Plotting annual data for different countries

Data can be extracted for countries or basins submitted as vectors. Here we plot the expected temperature anomaly for each 20-year period relative to a baseline control period of 1961-2000. The countries chosen run from far north to far south. It’s clear from the plot that the northernmost countries (US and Canada) have the biggest anomaly, and Belize, the most equatorial country, has the smallest.

library(rWBclimate)

# Search for data
country.list <- c("CAN", "USA", "MEX", "BLZ", "ARG")
country.dat <- get_model_temp(country.list, "annualanom", 2010, 2100)

# Subset data to one specific model
country.dat.bcc <- country.dat[country.dat$gcm == "bccr_bcm2_0", ]

# Exclude A2 scenario
country.dat.bcc <- subset(country.dat.bcc, scenario != "a2")

# Plot data
ggplot(country.dat.bcc, aes(x = fromYear, y = data, group = locator, colour = locator)) +
    geom_point() + geom_path() + ylab("Temperature anomaly over baseline") +
    theme_bw(base_size = 20)
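
In general terms, an anomaly is a projected value minus the baseline value. The sketch below illustrates that arithmetic with made-up numbers; it is not the World Bank's actual calculation, and the column names other than fromYear are assumptions.

```r
# Made-up numbers: a baseline mean and projected 20-year means (degrees C).
baseline_mean <- 10.0
projected <- data.frame(
  fromYear = c(2020, 2040, 2060, 2080),
  mean_temp = c(10.8, 11.5, 12.1, 12.9)
)

# Anomaly = projected mean minus baseline mean.
projected$anomaly <- projected$mean_temp - baseline_mean
projected
```

A positive anomaly means the period is projected to be warmer than the baseline.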

Phenology data from the USA National Phenology Network, using rnpn

library(rnpn)
library(plyr)

# Lookup names
temp <- lookup_names(name = "bird", type = "common")
comnames <- temp[temp$species_id %in% c(357, 359, 1108), "common_name"]

# Get some data
out <- getobsspbyday(speciesid = c(357, 359, 1108), startdate = "2010-04-01",
    enddate = "2013-09-30")
names(out) <- comnames
df <- ldply(out)
df$date <- as.Date(df$date)

# Visualize data
library(ggplot2)
ggplot(df, aes(date, count)) + geom_line() + theme_grey(base_size = 20) +
    facet_grid(.id ~ .)
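
The start and end dates go to the service as plain strings, so R does not check them for you. A small sketch of a sanity check you might run first; is_valid_date() is a hypothetical helper and not part of rnpn.

```r
# Hypothetical helper: TRUE where a "YYYY-MM-DD" string is a real calendar date.
# as.Date() with an explicit format returns NA for dates that do not exist.
is_valid_date <- function(x) {
  !is.na(as.Date(x, format = "%Y-%m-%d"))
}

is_valid_date(c("2010-04-01", "2013-09-30", "2013-09-31"))
```

The last string fails the check because September has only 30 days.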

Feedback and new climate data sources

Do use the above packages (govdat, rnoaa, rWBclimate, and rnpn) to get climate data, and get in touch with bug reports and feature requests.

Surely there are other sources of climate data out there that you want to use in R, right? Let us know what else you want to use. Better yet, if you can sling some R code, start writing your own package to interact with a source of climate data on the web – we can lend a hand.
