NASA GISS’s Annual Global Temperature Anomaly Trends (dplyr/ggplot version)

[This article was first published on » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

D Kelly O’Day did a great post on charting NASA’s Goddard Institute for Space Studies (GISS) temperature anomaly data, but it sticks with base R for data munging & plotting. While there’s absolutely nothing wrong with base R operations, I thought a modern take on the chart using dplyr, magrittr & tidyr for data manipulation and ggplot2 for formatting would be helpful for the scores of new folk learning R this year (our little language is becoming all the rage, it seems). I also really enjoy working with weather data.

Before further exposition, here’s the result:


I made liberal use of the “piping” idiom encouraged magrittr, dplyr and other new R packages, including the forward assignment operator -> (which may put some folks off a bit). That also meant using magrittr‘s aliases for [ and [[, which are more readable in pipes.

I don’t use library(tidyr) since tidyr‘s extract conflicts with magrittr‘s, but you’ll see a tidyr::gather in the code for wide-to-long data shaping.

I chose to use the monthly temperature anomaly data as a base layer in the chart as a contrast to the monthly- and annual-anomaly means. I also marked the hottest annual- and annual-mean anomalies and framed the decades with vertical markers.

There are no hardcoded years or decades anywhere in the ggplot2 code, so this should be quite reusable as the data source gets updated.

As I come back to the chart, I think there may be a bit too much “chart junk” on it, but you can tweak it to your own aesthetic preferences (if you do, drop a note in the comments with a link to your creation).

The code is below and in this gist.

# data retrieval ----------------------------------------------------------
# the user agent string was necessary for me; YMMV
pg <- GET("",
          user_agent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.75.14 (KHTML, like Gecko) Version/7.0.3 Safari/7046A194A"))
# extract monthly data ----------------------------------------------------
content(pg, as="text") %>%
  strsplit("n") %>%
  extract2(1) %>%
  grep("^[[:digit:]]", ., value=TRUE) -> lines
# extract column names ----------------------------------------------------
content(pg, as="text") %>%
  strsplit("n") %>%
  extract2(1) %>%
  extract(8) %>%
  strsplit(" +") %>%
  extract2(1) -> lines_colnames
# make data frame ---------------------------------------------------------
data <- read.table(text=lines, stringsAsFactors=FALSE)
colnames(data) <- lines_colnames
# transform data frame ----------------------------------------------------
data %>%
  tidyr::gather(month, value, Jan, Feb, Mar, Apr, May, Jun,
                       Jul, Aug, Sep, Oct, Nov, Dec) %>%     # wide to long
  mutate(value=value/100) %>%                                # convert to degree Celcius change
  select(year=Year, month, value) %>%                        # only need these fields
  mutate(date=as.Date(sprintf("%d-%d-%d", year, month, 1)),  # make proper dates
         decade=year %/% 10,                                 # calc decade
         start=decade*10, end=decade*10+9) %>%               # calc decade start/end
  group_by(decade) %>%
    mutate(decade_mean=mean(value)) %>%                      # calc decade mean
  group_by(year) %>%
    mutate(annum_mean=mean(value)) %>%                       # calc annual mean
  ungroup -> data
# start plot --------------------------------------------------------------
gg <- ggplot()
# decade vertical markers -------------------------------------------------
gg <- gg + geom_vline(data=data %>% select(end),
                      aes(xintercept=as.numeric(as.Date(sprintf("%d-12-31", end)))),
                          size=0.5, color="#4575b4", linetype="dotted", alpha=0.5)
# monthly data ------------------------------------------------------------
gg <- gg + geom_line(data=data, aes(x=date, y=value, color="monthly anomaly"),
                     size=0.35, alpha=0.25)
gg <- gg + geom_point(data=data, aes(x=date, y=value, color"monthly anomaly"),
                      size=0.75, alpha=0.5)
# decade mean -------------------------------------------------------------
gg <- gg + geom_segment(data=data %>% distinct(decade, decade_mean, start, end),
                        aes(x=as.Date(sprintf("%d-01-01", start)),
                            xend=as.Date(sprintf("%d-12-31", end)),
                            y=decade_mean, yend=decade_mean,
                            color="decade mean anomaly"),
# annual data -------------------------------------------------------------
gg <- gg + geom_line(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
gg <- gg + geom_point(data=data %>% distinct(year, annum_mean),
                      aes(x=as.Date(sprintf("%d-06-15", year)), y=annum_mean,
                          color="annual mean anomaly"),
# additional annotations --------------------------------------------------
# max annual mean anomaly horizontal marker/text
gg <- gg + geom_hline(yintercept=max(data$annum_mean),  alpha=0.9,
                      color="#d73027", linetype="dashed", size=0.25)
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    color="#d73027", alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual mean anomaly %2.1fºC", max(data$annum_mean)))
gg <- gg + geom_hline(yintercept=max(data$value),  alpha=0.9,
                      color="#7f7f7f", linetype="dashed", size=0.25)
# max annual anomaly horizontal marker/text
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", mean(range(data$year)))),
                    color="#7f7f7f",  alpha=0.9,
                    hjust=0.25, vjust=-1, size=3,
                    label=sprintf("Max annual anomaly %2.1fºC", max(data$value)))
gg <- gg + annotate("text",
                    x=as.Date(sprintf("%d-12-31", range(data$year)[2])),
                    y=min(data$value), size=3, hjust=1,
# set colors --------------------------------------------------------------
gg <- gg + scale_color_manual(name="", values=c("#d73027", "#4575b4", "#7f7f7f"))
# set x axis limits -------------------------------------------------------
gg <- gg + scale_x_date(expand=c(0, 1),
                        limits=c(as.Date(sprintf("%d-01-01", range(data$year)[1])),
                                 as.Date(sprintf("%d-12-31", range(data$year)[2]))))
# add labels --------------------------------------------------------------
gg <- gg + labs(x=NULL, y="GLOBAL Temp Anomalies in 1.0ºC",
                title=sprintf("GISS Land and Sea Temperature Annual Anomaly Trend (%d to %d)n",
                              range(data$year)[1], range(data$year)[2]))
# theme/legend tweaks -----------------------------------------------------
gg <- gg + theme_bw()
gg <- gg + theme(panel.grid=element_blank())
gg <- gg + theme(panel.border=element_blank())
gg <- gg + theme(legend.position=c(0.9, 0.2))
gg <- gg + theme(legend.key=element_blank())
gg <- gg + theme(legend.background=element_blank())

To leave a comment for the author, please follow the link and comment on their blog: » R. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)