[This article was first published on R on Dominic Royé, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When we visualize precipitation and temperature anomalies, we simply use time series as bar graph indicating negative and positive values in red and blue. However, in order to have a better overview we need both anomalies in a single graph. In this way we could more easly answer the question of whether a particular season or month was dry-warm or wet-cold, and even compare these anomalies in the context of previous years.

## Packages

In this post we will use the following packages:

Package Description
tidyverse Collection of packages (visualization, manipulation): ggplot2, dplyr, purrr, etc.
lubridate Easy manipulation of dates and times
ggrepel Repel overlapping text labels in ggplot2
#we install the packages if necessary
if(!require("tidyverse")) install.packages("tidyverse")
if(!require("ggrepel")) install.packages("ggrepel")
if(!require("lubridate")) install.packages("lubridate")

#packages
library(tidyverse)
library(lubridate)
library(ggrepel)

## Preparing the data

First we import the daily precipitation and temperature data from the selected weather station (download). We will use the data from Tenerife South (Spain) [1981-2020] accessible through Open Data AEMET. In R there is a package called meteoland that facilitates the download with specific functions to access data from AEMET (Spanish State Meteorological Agency), Meteogalicia (Galician Meteorological Service) and Meteocat (Catalan Meteorological Service).

### Step 1: import the data

We import the data in csv format, the first column is the date, the second column the precipitation (pr) and the last column the average daily temperature (ta).

data <- read_csv("meteo_tenerife.csv")
## Parsed with column specification:
## cols(
##   date = col_date(format = ""),
##   pr = col_double(),
##   ta = col_double()
## )
data
## # A tibble: 14,303 x 3
##    date          pr    ta
##    <date>     <dbl> <dbl>
##  1 1981-01-02   0    17.6
##  2 1981-01-03   0    16.8
##  3 1981-01-04   0    17.4
##  4 1981-01-05   0    17.6
##  5 1981-01-06   0    17
##  6 1981-01-07   0    17.6
##  7 1981-01-08   0    18.6
##  8 1981-01-09   0    19.8
##  9 1981-01-10   0    21.5
## 10 1981-01-11   3.8  17.6
## # ... with 14,293 more rows

### Step 2: preparing the data

In the second step we prepare the data to calculate the anomalies. To do this, we create three new columns: the month, the year, and the season of the year. Since our objective is to analyse winter anomalies, we cannot use the calendar year, since winter includes the month of December of one year and the months of January and February of the following. The custom function meteo_yr() extracts the year from a date indicating the starting month. The concept is similar to the hydrological year in which it starts on October 1.

meteo_yr <- function(dates, start_month = NULL) {
# convert to POSIXlt
dates.posix <- as.POSIXlt(dates)
# year offset
offset <- ifelse(dates.posix$mon >= start_month - 1, 1, 0) # new year adj.year = dates.posix$year + 1900 + offset
}

We will use many functions of the package collection tidyverse (https://www.tidyverse.org/). The mutate() function helps to add new columns or change existing ones. To define the seasons, we use the case_when() function from the dplyr package, which has many advantages compared to a chain of ifelse(). In case_when() we use two-side formulas, on the one hand the condition and on the other the action when that condition is met. A two-sided formula in R consists of the operator ~. The binary operator %in% allows us to filter several values in a greater set.

data <- mutate(data,
winter_yr = meteo_yr(date, 12),
month = month(date),
season = case_when(month %in% c(12,1:2) ~ "Winter",
month %in% 3:5 ~ "Spring",
month %in% 6:8 ~ "Summer",
month %in% 9:11 ~ "Autum"))

data
## # A tibble: 14,303 x 6
##    date          pr    ta winter_yr month season
##    <date>     <dbl> <dbl>     <dbl> <dbl> <chr>
##  1 1981-01-02   0    17.6      1981     1 Winter
##  2 1981-01-03   0    16.8      1981     1 Winter
##  3 1981-01-04   0    17.4      1981     1 Winter
##  4 1981-01-05   0    17.6      1981     1 Winter
##  5 1981-01-06   0    17        1981     1 Winter
##  6 1981-01-07   0    17.6      1981     1 Winter
##  7 1981-01-08   0    18.6      1981     1 Winter
##  8 1981-01-09   0    19.8      1981     1 Winter
##  9 1981-01-10   0    21.5      1981     1 Winter
## 10 1981-01-11   3.8  17.6      1981     1 Winter
## # ... with 14,293 more rows

### Step 3: estimate winter anomalies

In the next step we create a subset of the winter months. Then we group by the defined meteorological year and calculate the sum and average for precipitation and temperature, respectively. To facilitate the work, the magrittr package introduces the operator called pipe in the form %>% with the aim of combining several functions without the need to assign the result to a new object. The pipe operator passes the output of a function applied to the first argument of the next function. This way of combining functions allows you to chain several steps simultaneously. The %>% must be understood and pronounced as then.

data_inv <- filter(data,
season == "Winter") %>%
group_by(winter_yr) %>%
summarise(pr = sum(pr, na.rm = TRUE),
ta = mean(ta, na.rm = TRUE))

Now we only have to calculate the anomalies of precipitation and temperature. The columns pr_mean and ta_mean will contain the climate average, the reference for the anomalies with respect to the normal period 1981-2010. Therefore, we need to filter the values to the period before 2010, which we will do in the usual way of filtering vectors in R. Once we have the references we estimate the anomalies pr_anom and ta_anom. To facilitate the interpretation, in the case of precipitation we express the anomalies as percentage, with the average set at 0% instead of 100%.

In addition, we add three required columns with information for the creation of the graph: 1) labyr contains the year of each anomaly as long as it has been greater/less than -+10% or -+0.5ºC, respectively (this is for reducing the number of labels), 2) symb_point is a dummy variable in order to be able to create diferent symbols between the cases of (1), and 3) lab_font for highlighting in bold the year 2020.

data_inv <-  mutate(data_inv, pr_mean = mean(pr[winter_yr <= 2010]),
ta_mean = mean(ta[winter_yr <= 2010]),
pr_anom = (pr*100/pr_mean)-100,
ta_anom = ta-ta_mean,

labyr = case_when(pr_anom < -10 & ta_anom < -.5 ~ winter_yr,
pr_anom < -10 & ta_anom > .5 ~ winter_yr,
pr_anom > 10 & ta_anom < -.5 ~ winter_yr,
pr_anom > 10 & ta_anom > .5 ~ winter_yr),
symb_point = ifelse(!is.na(labyr), "yes", "no"),
lab_font = ifelse(labyr == 2020, "bold", "plain")
)

## Creating the graph

We will build the chart adding layer by layer the distinctive elements: 1) the background with the different grids (Dry-Warm, Dry-Cold, etc.), 2) the points and labels, and 3) the style adjustments.

### Part 1

The idea is that the points with dry-warm anomalies are located in quadrant I (top-right) and those with wet-cold in quadrant III (bottom-left). Therefore, we must invert the sign in the precipitation anomalies. Then we create a data.frame with the label positions of the four quadrants. For the positions in x and y Inf and -Inf are used, which is equivalent to the maximum panel sides with respect to the data. However, it is necessary to adjust the position towards the extreme points within the panel with the known arguments of ggplot2: hjust and vjust.

data_inv_p <- mutate(data_inv, pr_anom = pr_anom * -1)

bglab <- data.frame(x = c(-Inf, Inf, -Inf, Inf),
y = c(Inf, Inf, -Inf, -Inf),
hjust = c(1, 1, 0, 0),
vjust = c(1, 0, 1, 0),
lab = c("Wet-Warm", "Dry-Warm",
"Wet-Cold", "Dry-Cold"))

bglab
##      x    y hjust vjust      lab
## 1 -Inf  Inf     1     1 Wet-Warm
## 2  Inf  Inf     1     0 Dry-Warm
## 3 -Inf -Inf     0     1 Wet-Cold
## 4  Inf -Inf     0     0 Dry-Cold

### Part 2

In the second part we can start building the chart by adding all graphical elements. First we create the background with different colors of each quadrant. The function annotate() allows adding geometry layers without the use of variables within data.frames. With the geom_hline() and geom_vline() function we mark the quadrants horizontally and vertically using a dashed line. Finally, we draw the labels of each quadrant, using the function geom_text(). When we use other data sources than the main one used in ggplot(), we must indicate it with the argument data in the corresponding geometry function.

g1 <- ggplot(data_inv_p,
aes(pr_anom, ta_anom)) +
annotate("rect", xmin = -Inf, xmax = 0, ymin = 0, ymax = Inf, fill = "#fc9272", alpha = .6) + #wet-warm
annotate("rect", xmin = 0, xmax = Inf, ymin = 0, ymax = Inf, fill = "#cb181d", alpha = .6) + #dry-warm
annotate("rect", xmin = -Inf, xmax = 0, ymin = -Inf, ymax = 0, fill = "#2171b5", alpha = .6) + #wet-cold
annotate("rect", xmin = 0, xmax = Inf, ymin = -Inf, ymax = 0, fill = "#c6dbef", alpha = .6) + #dry-cold
geom_hline(yintercept = 0,
linetype = "dashed") +
geom_vline(xintercept = 0,
linetype = "dashed") +
geom_text(data = bglab,
aes(x, y, label = lab, hjust = hjust, vjust = vjust),
fontface = "italic", size = 5,
angle = 90, colour = "white")

g1

### Part 3

In the third part we simply add the points of the anomalies and the labels of the years. The geom_text_repel() function is similar to the one known by default in ggplot2, geom_text(), but it repels overlapping text labels away from each other.

g2 <- g1 + geom_point(aes(fill = symb_point, colour = symb_point),
size = 2.8, shape = 21, show.legend = FALSE) +
geom_text_repel(aes(label = labyr, fontface = lab_font),
max.iter = 5000,
size = 3.5)
g2
## Warning: Removed 25 rows containing missing values (geom_text_repel).

### Part 4

In the last part we adjust, in addition to the general style, the axes, the color type and the (sub)title. Remember that we changed the sign on precipitation anomalies. Hence, we must use the arguments breaks and labels in the function scale_x_continouous() to reverse the sign in the labels corresponding to the breaks.

g3 <- g2 + scale_x_continuous("Precipitation anomaly in %",
breaks = seq(-100, 250, 10) * -1,
labels = seq(-100, 250, 10),
limits = c(min(data_inv_p\$pr_anom), 100)) +
scale_y_continuous("Mean temperature anomaly in ºC",
breaks = seq(-2, 2, 0.5)) +
scale_fill_manual(values = c("black", "white")) +
scale_colour_manual(values = rev(c("black", "white"))) +
labs(title = "Winter anomalies in Tenerife South",
caption = "Data: AEMET\nNormal period 1981-2010") +
theme_bw()

g3
## Warning: Removed 25 rows containing missing values (geom_text_repel).

To leave a comment for the author, please follow the link and comment on their blog: R on Dominic Royé.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)