Thoughts on Teaching R and Yet Another Tidyverse Intro

Posted on March 16, 2018 by R Bloggers on syknapptic in R bloggers

[This article was first published on R Bloggers on syknapptic, and kindly contributed to R-bloggers].


Image credit to R Memes for Statistical Fiends

Considering this is a blog post, I’m going to get all bloggy here before jumping into the code.

Context

I recently had the opportunity to teach some R coding to colleagues and classmates in a series of workshops. Some had already dabbled in R or other programming languages, but it was the first time that the majority of participants had written a single line of code.

A few things happened in the week following the last session that I didn’t expect.

First, I saw a bit of R code written on a campus whiteboard that had nothing to do with me, but was straight out of the workshop. It may have come from some of my data-centric colleagues who use R, so I didn’t think too much of it.

Then, I overheard a conversation involving R from those in a program that doesn’t require any data-related coursework. Many folks are familiar with the name, as the school’s primary data analysis course uses the {Rcmdr} GUI for statistical analysis, but these students would not necessarily have taken that course. I wondered if there was a connection.

Finally, a student who didn’t even attend came to my office hours asking for resources. Why exactly? Some of his work colleagues attended. It turns out that they are now trying to incorporate some R-powered analysis in their work and he doesn’t want to miss out.

The workshop consisted of 3 consecutive Fridays lasting 90 minutes each. That’s only a total of 4.5 hours.

That’s a relatively tiny amount of time.

Wait. Scratch that.

That’s a negligible amount of time.

… but it was enough to convince some participants and non-participants that they should take advantage of the power that a bit of data-centric coding can offer.

Reflection

I taught a similar 90-minute workshop last spring using R, but focused on base R and a few data types. Ten minutes in, I’m trying to explain the difference between a data.frame and a matrix, and the person asking the question says something along the lines of “I guess I’m kinda dumb. Don’t worry about it.”

For context, these were international policy graduate students. While some have completed a bit of quantitative coursework, most don’t have a hardcore math or science background and programming is seen as something akin to wizardry. However, they hold domain expertise in some rather important subjects. These include WMD nonproliferation, international development, economic diplomacy, conflict studies, and environmental policy. Nearly half of the participants were international students and everyone is proficient in at least a second natural language. Most have already tackled big, complicated problems in their careers and the others are on their way to doing so following graduation. In a nutshell, they’re not dumb. The way I was teaching was dumb. They knew that they’re supposed to want to learn new skills, but they didn’t know why. Focusing on the “basics” didn’t show them anything immediately useful. It didn’t show them the why.

After the workshop, I never heard anyone mention R outside my circle of fellow data folks.

Since that time, I started using R more. Like, a lot more. I have found a way to use R in nearly everything I’ve done since May 2017. As a policy student myself, that has not always been very straightforward and I was still avoiding the strange “tidy” code I’d encounter on Stack Overflow and elsewhere. I realized the error of my ways when I came across Julia Silge and David Robinson’s Text Mining with R. It was like discovering that you’re still in the stone age while most people are off partying on spaceships.

In preparation for this workshop series, I found a lot of inspiration in Michael Levy’s presentation on teaching R, which itself echoes principles preached by other tidyverse advocates.

A huge takeaway: live coding works.

Writing code in real time shows every single step we make from opening the IDE, to reshaping the data, to debugging inevitable errors, to rendering a final report.

Within a few short weeks of learning to code, it might be surprising how many tiny steps become automatic and taken for granted. Tack on a couple more months and newcomers will think you’re speaking in an entirely different language because you’re explaining something requiring context they simply haven’t yet encountered. Add a few years and… yeesh.

Something that frustrated me when I first started is that code explanations often seem to be written in such a way that dismisses how difficult establishing the basics can be. I’m half-convinced that, for some folks, the trauma was so great that they have simply blocked it from memory. Code is intimidating enough, but if an instructor doesn’t make a conscious effort to empathize, students will question their ability to learn. The goal is empowerment, not intimidation.

Live coding caps how fast you can move through exercises. Not only does this give students more time to digest what you’re doing, it also gives them more opportunities to ask questions about details you might find trivial, but only because you already suffered through them.

I also think that the benefits of live coding extend to the instructor. I found myself answering questions that framed things in ways I had not even considered, but that were exactly how multiple participants saw the task. Additionally, I have a better sense of which concepts need to be covered in more detail, as they weren’t necessarily as intuitive for others as they were for me. On the flip side, concepts I remember struggling with may not be difficult at all for others to understand.

… and now that we got the bloggyness of a blog post out of the way…

Here is the workflow I used for the first session. The goal was to introduce the primary {dplyr} verbs, functions that accomplish tasks necessary in nearly every project. Between sections are exercises using {ggplot2}.

tidyverse::tidyverse_logo()
## * __  _    __   .    o           *  . 
##  / /_(_)__/ /_ ___  _____ _______ ___ 
## / __/ / _  / // / |/ / -_) __(_-</ -_)
## \__/_/\_,_/\_, /|___/\__/_/ /___/\__/ 
##      *  . /___/      o      .       *

# install.packages("tidyverse")
library(tidyverse)

# install.packages("gapminder")
library(gapminder)
# loads the gapminder data set

## just to prettify printed tables when knitting
# install.packages("kableExtra")
library(knitr)
library(kableExtra)

Workflow

Resources Up Front

- Data Carpentry: Cheat Sheet
- Plotting: Cheat Sheet, R Graph Catalog

Our Data

In the following exercises, gm.data.frame will be used to demonstrate actions that use {base} R methods for data.frame operations, while gm_df will be used to demonstrate {tidyverse} methods for tibble operations.

gm.data.frame <- as.data.frame(gapminder)

gm_df <- gapminder

`tibble`

class(gm.data.frame)
## [1] "data.frame"
class(gm_df)
## [1] "tbl_df"     "tbl"        "data.frame"

tibbles are opinionated data.frames that keep everything that is helpful about data.frames, change some of their quirks, and add methods that make them even more useful.

Printing gm.data.frame dumps the whole data set to the console, typically requiring head() to limit the output.

Printing

head(gm.data.frame)
##       country continent year lifeExp      pop gdpPercap
## 1 Afghanistan      Asia 1952  28.801  8425333  779.4453
## 2 Afghanistan      Asia 1957  30.332  9240934  820.8530
## 3 Afghanistan      Asia 1962  31.997 10267083  853.1007
## 4 Afghanistan      Asia 1967  34.020 11537966  836.1971
## 5 Afghanistan      Asia 1972  36.088 13079460  739.9811
## 6 Afghanistan      Asia 1977  38.438 14880372  786.1134

Printing gm_df provides the dimensions, data type of each column, and only prints the first 10 rows.

gm_df
## # A tibble: 1,704 x 6
##    country     continent  year lifeExp      pop gdpPercap
##    <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
##  1 Afghanistan Asia       1952    28.8  8425333      779.
##  2 Afghanistan Asia       1957    30.3  9240934      821.
##  3 Afghanistan Asia       1962    32.0 10267083      853.
##  4 Afghanistan Asia       1967    34.0 11537966      836.
##  5 Afghanistan Asia       1972    36.1 13079460      740.
##  6 Afghanistan Asia       1977    38.4 14880372      786.
##  7 Afghanistan Asia       1982    39.9 12881816      978.
##  8 Afghanistan Asia       1987    40.8 13867957      852.
##  9 Afghanistan Asia       1992    41.7 16317921      649.
## 10 Afghanistan Asia       1997    41.8 22227415      635.
## # ... with 1,694 more rows

`%>%`

The pipe (%>%) is used to chain operations together. Under the hood, it takes the value on the left-hand side of %>% and uses it as the first argument of the function on the right-hand side of %>%.

For example, these 2 lines are doing the exact same thing.

head(gm_df)
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
gm_df %>% head()
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

For simple operations involving one function, %>% is only (arguably) beneficial in that it improves readability, as the flow of operations goes from left to right.

%>% becomes truly useful when you need to perform multiple operations in succession, which describes the vast majority of data carpentry.

As an arbitrary example, let’s say that we want to select the head() (first 6 rows) of gm.data.frame and convert it to a tibble.

Without %>%, we can do this in a few ways.

Use intermediate variables.
- get gm.data.frame’s head() and assign it to no_pipe_1
- convert no_pipe_1 to a tibble with as_tibble() and assign it to no_pipe_2

no_pipe_1 <- head(gm.data.frame)

no_pipe_2 <- as_tibble(no_pipe_1)

no_pipe_2
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
## * <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

Nest gm.data.frame inside of head(), which is itself nested inside of as_tibble().

as_tibble(head(gm.data.frame))
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
## * <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

With %>%, we can chain these actions together in the order in which they occur, which is also the way we read English.

Here, we do the same thing by:
- taking gm_df
- piping it to head() (keeping the top 6 rows)
- piping it to as_tibble() (converting it to a tibble data frame)

gm_df %>% head() %>% as_tibble()
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.

In practice, it’s usually best to place each of the functions on a separate line as it facilitates debugging and further improves readability.

gm_df %>%
  head() %>%
  as_tibble()
## # A tibble: 6 x 6
##   country     continent  year lifeExp      pop gdpPercap
##   <fct>       <fct>     <int>   <dbl>    <int>     <dbl>
## 1 Afghanistan Asia       1952    28.8  8425333      779.
## 2 Afghanistan Asia       1957    30.3  9240934      821.
## 3 Afghanistan Asia       1962    32.0 10267083      853.
## 4 Afghanistan Asia       1967    34.0 11537966      836.
## 5 Afghanistan Asia       1972    36.1 13079460      740.
## 6 Afghanistan Asia       1977    38.4 14880372      786.
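To make the “first argument” rule concrete, here’s a toy pipe written in plain base R. `%then%` is a hypothetical name made up for this sketch; it only handles simple calls like `head(3)` and is not how {magrittr}’s `%>%` is actually implemented (for example, it ignores the `.` placeholder). The numbers are Afghanistan’s first six lifeExp values from the output above.

```r
# Hypothetical toy pipe, NOT magrittr's implementation: it rewrites
# `lhs %then% f(args)` into `f(lhs, args)` and evaluates the result.
`%then%` <- function(lhs, rhs) {
  rhs_call <- substitute(rhs)                   # unevaluated call, e.g. round(1)
  new_call <- as.call(c(rhs_call[[1]],          # the function, e.g. round
                        quote(lhs),             # piped value becomes argument 1
                        as.list(rhs_call)[-1])) # remaining arguments, e.g. 1
  eval(new_call, envir = list(lhs = lhs), enclos = parent.frame())
}

x <- c(28.801, 30.332, 31.997, 34.020, 36.088, 38.438)

# Nested calls are read inside-out...
nested <- round(mean(x), 1)
# ...while the piped version reads left to right, in the order things happen
piped <- x %then% mean() %then% round(1)

nested   # both are 33.3
piped
```

Because `%...%` operators are evaluated left to right, `x %then% mean() %then% round(1)` becomes `round(mean(x), 1)`, which is exactly the nesting the pipe lets you avoid writing by hand.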

From here on, you’ll notice prettify(). This is only being used to print tables in a clean format when the document is knit()ted.

I’m choosing to include it here as I often find myself reading similar pages where I come across a really effective way to format some output. I understand why the author chooses to set echo=FALSE, but it can be nice to see the underlying code without having to hunt through their GitHub.

By default, prettify() prints a maximum of 3 rows for data.frames and 10 rows for tibbles.

prettify <- function(df, n = NULL, cols_changed = NULL, rows_changed = NULL){
  # is_tibble() replaces the deprecated is.tibble(); if/else suits a single value
  if(is.null(n)) n <- if (is_tibble(df)) 10 else 3
  pretty_df <- df %>%
    head(n) %>%
    kable(format = "html") %>%
    kable_styling(bootstrap_options = c("striped", "bordered", "condensed",
                                        "hover", "responsive"),
                  full_width = FALSE)
  
  if(!is.null(cols_changed)){
    pretty_df <- pretty_df %>%
      column_spec(cols_changed, bold = TRUE, color = "black", background = "#C8FAE3")
  }
  
  if(!is.null(rows_changed)){
    pretty_df <- pretty_df %>%
      row_spec(rows_changed, bold = TRUE, color = "black", background = "#C8FAE3")
  }
  
  return(pretty_df)
}
gm.data.frame %>%
  prettify()

country	continent	year	lifeExp	pop	gdpPercap
Afghanistan	Asia	1952	28.801	8425333	779.4453
Afghanistan	Asia	1957	30.332	9240934	820.8530
Afghanistan	Asia	1962	31.997	10267083	853.1007

gm_df %>%
  prettify()

country	continent	year	lifeExp	pop	gdpPercap
Afghanistan	Asia	1952	28.801	8425333	779.4453
Afghanistan	Asia	1957	30.332	9240934	820.8530
Afghanistan	Asia	1962	31.997	10267083	853.1007
Afghanistan	Asia	1967	34.020	11537966	836.1971
Afghanistan	Asia	1972	36.088	13079460	739.9811
Afghanistan	Asia	1977	38.438	14880372	786.1134
Afghanistan	Asia	1982	39.854	12881816	978.0114
Afghanistan	Asia	1987	40.822	13867957	852.3959
Afghanistan	Asia	1992	41.674	16317921	649.3414
Afghanistan	Asia	1997	41.763	22227415	635.3414

Sample Data

You’ll also see a toy data set for the introductory examples that start each section.

sample_countries <- c("Tunisia", "Nicaragua", "Singapore", "Hungary",
                      "New Zealand", "Nigeria", "Brazil", "Sri Lanka",
                      "Ireland", "Australia")
  
sample_df <- gm_df %>%
  filter(year == 2007,
         country %in% sample_countries)

sample_df %>%     
  prettify()

country	continent	year	lifeExp	pop	gdpPercap
Australia	Oceania	2007	81.235	20434176	34435.367
Brazil	Americas	2007	72.390	190010647	9065.801
Hungary	Europe	2007	73.338	9956108	18008.944
Ireland	Europe	2007	78.885	4109086	40675.996
New Zealand	Oceania	2007	80.204	4115771	25185.009
Nicaragua	Americas	2007	72.899	5675356	2749.321
Nigeria	Africa	2007	46.859	135031164	2013.977
Singapore	Asia	2007	79.972	4553009	47143.180
Sri Lanka	Asia	2007	72.396	20378239	3970.095
Tunisia	Africa	2007	73.923	10276158	7092.923

“Tidy” Data

If you’re unsure of what “Tidy” data is actually describing and want to learn more, you can read Hadley Wickham’s article here. Otherwise, these graphics are likely the most concise explanation you’ll find.
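As a minimal sketch of the idea in base R: the “wide” table below (with hypothetical column names y1952, y1957, y1962) stores one variable, lifeExp, spread across several year columns. The tidy version gives each variable (country, year, lifeExp) its own column and each observation (one country in one year) its own row. The values are Afghanistan’s lifeExp from the gapminder output above. In practice, {tidyr} does this reshaping for you (e.g. gather(), or pivot_longer() in newer versions).

```r
# "Wide" layout: one row per country, one column per year (hypothetical names)
wide <- data.frame(country = "Afghanistan",
                   y1952 = 28.801, y1957 = 30.332, y1962 = 31.997,
                   stringsAsFactors = FALSE)

year_cols <- c("y1952", "y1957", "y1962")

# Tidy layout: one column per variable, one row per country-year observation
tidy <- data.frame(
  country = rep(wide$country, length(year_cols)),
  year    = as.integer(sub("^y", "", year_cols)),      # "y1952" -> 1952
  lifeExp = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

tidy
```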

With tibbles, %>%, and the concept of tidy data covered, let’s dive in.

`{dplyr}`

{dplyr} provides a grammar of data manipulation and a set of verb functions that solve most common data carpentry challenges in a consistent fashion.

glimpse()
select()
filter()
arrange()
mutate()
summarize()
group_by()

Taking a `glimpse()`

In addition to the summary(), dim()ensions, and str()ucture functions that can be used to inspect data, you can now use {dplyr}’s glimpse().

summary(gm.data.frame)
##         country        continent        year         lifeExp     
##  Afghanistan:  12   Africa  :624   Min.   :1952   Min.   :23.60  
##  Albania    :  12   Americas:300   1st Qu.:1966   1st Qu.:48.20  
##  Algeria    :  12   Asia    :396   Median :1980   Median :60.71  
##  Angola     :  12   Europe  :360   Mean   :1980   Mean   :59.47  
##  Argentina  :  12   Oceania : 24   3rd Qu.:1993   3rd Qu.:70.85  
##  Australia  :  12                  Max.   :2007   Max.   :82.60  
##  (Other)    :1632                                                
##       pop              gdpPercap       
##  Min.   :6.001e+04   Min.   :   241.2  
##  1st Qu.:2.794e+06   1st Qu.:  1202.1  
##  Median :7.024e+06   Median :  3531.8  
##  Mean   :2.960e+07   Mean   :  7215.3  
##  3rd Qu.:1.959e+07   3rd Qu.:  9325.5  
##  Max.   :1.319e+09   Max.   :113523.1  
## 
dim(gm.data.frame)
## [1] 1704    6
str(gm.data.frame)
## 'data.frame':    1704 obs. of  6 variables:
##  $ country  : Factor w/ 142 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ continent: Factor w/ 5 levels "Africa","Americas",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ year     : int  1952 1957 1962 1967 1972 1977 1982 1987 1992 1997 ...
##  $ lifeExp  : num  28.8 30.3 32 34 36.1 ...
##  $ pop      : int  8425333 9240934 10267083 11537966 13079460 14880372 12881816 13867957 16317921 22227415 ...
##  $ gdpPercap: num  779 821 853 836 740 ...
glimpse(gm_df)
## Observations: 1,704
## Variables: 6
## $ country   <fct> Afghanistan, Afghanistan, Afghanistan, Afghanistan, ...
## $ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia...
## $ year      <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992...
## $ lifeExp   <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.8...
## $ pop       <int> 8425333, 9240934, 10267083, 11537966, 13079460, 1488...
## $ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 78...

`select()` columns

Quick Example

Initial Data

sample_df %>%
  prettify()

country	continent	year	lifeExp	pop	gdpPercap
Australia	Oceania	2007	81.235	20434176	34435.367
Brazil	Americas	2007	72.390	190010647	9065.801
Hungary	Europe	2007	73.338	9956108	18008.944
Ireland	Europe	2007	78.885	4109086	40675.996
New Zealand	Oceania	2007	80.204	4115771	25185.009
Nicaragua	Americas	2007	72.899	5675356	2749.321
Nigeria	Africa	2007	46.859	135031164	2013.977
Singapore	Asia	2007	79.972	4553009	47143.180
Sri Lanka	Asia	2007	72.396	20378239	3970.095
Tunisia	Africa	2007	73.923	10276158	7092.923

End Data

sample_df %>%
  select(country, pop) %>%
  prettify()

country	pop
Australia	20434176
Brazil	190010647
Hungary	9956108
Ireland	4109086
New Zealand	4115771
Nicaragua	5675356
Nigeria	135031164
Singapore	4553009
Sri Lanka	20378239
Tunisia	10276158

The select() family is used to choose columns to keep. You can use bare (unquoted) names.

select() columns by specific names.
- select only gm_df’s country, year, and pop columns

gm_df %>%
  select(country, year, pop) %>%            # select columns by specific names
  prettify()

country	year	pop
Afghanistan	1952	8425333
Afghanistan	1957	9240934
Afghanistan	1962	10267083
Afghanistan	1967	11537966
Afghanistan	1972	13079460
Afghanistan	1977	14880372
Afghanistan	1982	12881816
Afghanistan	1987	13867957
Afghanistan	1992	16317921
Afghanistan	1997	22227415

select() a range of columns by name
- select gm_df’s continent column and all columns from lifeExp to gdpPercap

gm_df %>%
  select(continent, lifeExp:gdpPercap) %>%  # select columns by name range
  prettify()

continent	lifeExp	pop	gdpPercap
Asia	28.801	8425333	779.4453
Asia	30.332	9240934	820.8530
Asia	31.997	10267083	853.1007
Asia	34.020	11537966	836.1971
Asia	36.088	13079460	739.9811
Asia	38.438	14880372	786.1134
Asia	39.854	12881816	978.0114
Asia	40.822	13867957	852.3959
Asia	41.674	16317921	649.3414
Asia	41.763	22227415	635.3414

Deselect a column with -
- select() all of gm_df’s columns except lifeExp

gm_df %>%
  select(-lifeExp) %>%                      # deselect column by name
  prettify()

country	continent	year	pop	gdpPercap
Afghanistan	Asia	1952	8425333	779.4453
Afghanistan	Asia	1957	9240934	820.8530
Afghanistan	Asia	1962	10267083	853.1007
Afghanistan	Asia	1967	11537966	836.1971
Afghanistan	Asia	1972	13079460	739.9811
Afghanistan	Asia	1977	14880372	786.1134
Afghanistan	Asia	1982	12881816	978.0114
Afghanistan	Asia	1987	13867957	852.3959
Afghanistan	Asia	1992	16317921	649.3414
Afghanistan	Asia	1997	22227415	635.3414

Deselect a range of columns by name
- select() all of gm_df’s columns except those between lifeExp and gdpPercap

gm_df %>%
  select(-c(lifeExp:gdpPercap)) %>%         # deselect column by name range
  prettify()

country	continent	year
Afghanistan	Asia	1952
Afghanistan	Asia	1957
Afghanistan	Asia	1962
Afghanistan	Asia	1967
Afghanistan	Asia	1972
Afghanistan	Asia	1977
Afghanistan	Asia	1982
Afghanistan	Asia	1987
Afghanistan	Asia	1992
Afghanistan	Asia	1997

select() column by index
- select() gm_df’s 4th column

gm_df %>%
  select(4) %>%                             # select column by index
  prettify()

lifeExp
28.801
30.332
31.997
34.020
36.088
38.438
39.854
40.822
41.674
41.763

Deselect a column by index
- select() all of gm_df’s columns except for the 4th column

gm_df %>%
  select(-4) %>%                         # deselect column by index
  prettify()

country	continent	year	pop	gdpPercap
Afghanistan	Asia	1952	8425333	779.4453
Afghanistan	Asia	1957	9240934	820.8530
Afghanistan	Asia	1962	10267083	853.1007
Afghanistan	Asia	1967	11537966	836.1971
Afghanistan	Asia	1972	13079460	739.9811
Afghanistan	Asia	1977	14880372	786.1134
Afghanistan	Asia	1982	12881816	978.0114
Afghanistan	Asia	1987	13867957	852.3959
Afghanistan	Asia	1992	16317921	649.3414
Afghanistan	Asia	1997	22227415	635.3414

Deselect a range of columns by index
- select() all of gm_df’s columns except those between the 3rd and 5th columns

gm_df %>%
  select(-c(3:5)) %>%                    # deselect columns by index range
  prettify()

country	continent	gdpPercap
Afghanistan	Asia	779.4453
Afghanistan	Asia	820.8530
Afghanistan	Asia	853.1007
Afghanistan	Asia	836.1971
Afghanistan	Asia	739.9811
Afghanistan	Asia	786.1134
Afghanistan	Asia	978.0114
Afghanistan	Asia	852.3959
Afghanistan	Asia	649.3414
Afghanistan	Asia	635.3414
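For comparison with the {base} R methods mentioned at the start, here’s a sketch of base R bracket-subsetting equivalents of the select() calls above, run on a small hypothetical data.frame built from three rows of sample_df. One quirk worth noticing: selecting a single column by position in base R returns a plain vector unless you add drop = FALSE, whereas select() always returns a data frame.

```r
df <- data.frame(
  country   = c("Australia", "Brazil", "Hungary"),
  continent = c("Oceania", "Americas", "Europe"),
  year      = c(2007L, 2007L, 2007L),
  lifeExp   = c(81.235, 72.390, 73.338),
  pop       = c(20434176L, 190010647L, 9956108L),
  gdpPercap = c(34435.367, 9065.801, 18008.944),
  stringsAsFactors = FALSE
)

# select(country, year, pop): keep columns by name
by_name <- df[, c("country", "year", "pop")]

# select(continent, lifeExp:gdpPercap): build the name range from positions
rng <- match("lifeExp", names(df)):match("gdpPercap", names(df))
by_range <- df[, c("continent", names(df)[rng])]

# select(-lifeExp): drop a column by name
drop_name <- df[, names(df) != "lifeExp"]

# select(4): keep a column by position (drop = FALSE keeps a data.frame)
by_index <- df[, 4, drop = FALSE]

# select(-c(3:5)): drop a range of columns by position
drop_rng <- df[, -(3:5)]
```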

`ggplot()` Exercise 1

{ggplot2} is a monster of a package for data visualization that follows The Grammar of Graphics.

{ggplot2} takes R’s powerful graphics capabilities and makes them more accessible by taking care of many plotting tasks that are often tedious, while still allowing for lower-level customization.

Basic Setup

ggplot(<your data>, aes(x = <x values>, y = <y values>)) +
  geom_boxplot()  # the type of plot geometry desired

Steps

Using gm_df, select the lifeExp column
Pipe (%>%) the result to ggplot()
Select the plot’s aes()thetic values
- lifeExp for the x values
  - a histogram’s y are counts of its x values, so we don’t provide them here
Add geom_histogram() as the geometry of the plot

gm_df %>%                                     # data frame: Data
  select(lifeExp) %>%                         # columns to keep: Data
  ggplot(aes(x = lifeExp)) +                  # x values: Aesthetics
  geom_histogram()                            # histogram: Geometries

Figure 1: histogram of lifeExp
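To see what a histogram’s y values actually are, here’s a small base R sketch that does the binning by hand on the ten sample_df lifeExp values. The 10-year bins from 40 to 90 are an arbitrary choice for illustration; geom_histogram() picks its own default bins, so its counts on gm_df will differ.

```r
# lifeExp for the ten sample_df countries (2007), from the table above
life_exp <- c(81.235, 72.390, 73.338, 78.885, 80.204,
              72.899, 46.859, 79.972, 72.396, 73.923)

# Count values per bin: (40,50], (50,60], (60,70], (70,80], (80,90]
bins <- hist(life_exp, breaks = seq(40, 90, by = 10), plot = FALSE)
bins$counts
## [1] 1 0 0 7 2
```

Each count becomes the height of one bar, which is why we only supply x to aes().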

`filter()` Rows

Quick Example

Initial Data

sample_df %>%
  select(country, lifeExp) %>%
  prettify()

country	lifeExp
Australia	81.235
Brazil	72.390
Hungary	73.338
Ireland	78.885
New Zealand	80.204
Nicaragua	72.899
Nigeria	46.859
Singapore	79.972
Sri Lanka	72.396
Tunisia	73.923

End Data

sample_df %>%
  select(country, lifeExp) %>%
  filter(lifeExp > 75) %>%
  prettify(cols_changed = 2)

country	lifeExp
Australia	81.235
Ireland	78.885
New Zealand	80.204
Singapore	79.972

Use filter() to select rows using logic. Rows where a logical expression returns TRUE are kept and others are dropped.

filter() rows where numeric() values are greater or less than another value
- filter() gm_df to only keep rows where gdpPercap < 500

gm_df %>%
  filter(gdpPercap < 500) %>%
  prettify(cols_changed = 6)

country	continent	year	lifeExp	pop	gdpPercap
Burundi	Africa	1952	39.031	2445618	339.2965
Burundi	Africa	1957	40.533	2667518	379.5646
Burundi	Africa	1962	42.045	2961915	355.2032
Burundi	Africa	1967	43.548	3330989	412.9775
Burundi	Africa	1972	44.057	3529983	464.0995
Burundi	Africa	1997	45.326	6121610	463.1151
Burundi	Africa	2002	47.360	7021078	446.4035
Burundi	Africa	2007	49.580	8390505	430.0707
Cambodia	Asia	1952	39.417	4693836	368.4693
Cambodia	Asia	1957	41.366	5322536	434.0383

filter() rows using multiple logical expressions where all must be TRUE
- filter() gm_df to only keep rows where year > 1990 and lifeExp < 40
- a comma (,) and & are evaluated identically in filter()

gm_df %>%
  filter(year > 1990, lifeExp < 40) %>%
  prettify(cols_changed = 3:4)

country	continent	year	lifeExp	pop	gdpPercap
Rwanda	Africa	1992	23.599	7290203	737.0686
Rwanda	Africa	1997	36.087	7212583	589.9445
Sierra Leone	Africa	1992	38.333	4260884	1068.6963
Sierra Leone	Africa	1997	39.897	4578212	574.6482
Somalia	Africa	1992	39.658	6099799	926.9603
Swaziland	Africa	2007	39.613	1133066	4513.4806
Zambia	Africa	2002	39.193	10595811	1071.6139
Zimbabwe	Africa	2002	39.989	11926563	672.0386

filter() rows using multiple logical expressions where one must be TRUE
- filter() gm_df to only keep rows where pop < 10000 or gdpPercap > 100000
- | means or

gm_df %>%
  filter(pop < 10000 | gdpPercap > 100000) %>%
  prettify(cols_changed = 5:6)

country	continent	year	lifeExp	pop	gdpPercap
Kuwait	Asia	1952	55.565	160000	108382.4
Kuwait	Asia	1957	58.033	212846	113523.1
Kuwait	Asia	1972	67.712	841934	109347.9

filter() rows using a string
- filter() gm_df to only keep rows where year is 1997 and continent is "Europe"
- == means is equal to

gm_df %>%
  filter(year == 1997 & continent == "Europe") %>%
  prettify(cols_changed = 2:3)

country	continent	year	lifeExp	pop	gdpPercap
Albania	Europe	1997	72.950	3428038	3193.055
Austria	Europe	1997	77.510	8069876	29095.921
Belgium	Europe	1997	77.530	10199787	27561.197
Bosnia and Herzegovina	Europe	1997	73.244	3607000	4766.356
Bulgaria	Europe	1997	70.320	8066057	5970.389
Croatia	Europe	1997	73.680	4444595	9875.605
Czech Republic	Europe	1997	74.010	10300707	16048.514
Denmark	Europe	1997	76.110	5283663	29804.346
Finland	Europe	1997	77.130	5134406	23723.950
France	Europe	1997	78.640	58623428	25889.785
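A base R sketch of the same idea, using the sample_df lifeExp values plus one made-up row (“Unknown”) with a missing value. It highlights a real difference: filter() drops rows where the condition is NA, while plain logical indexing with [ returns them as rows full of NAs. Wrapping the condition in which(), or using subset(), matches filter()’s behavior.

```r
df <- data.frame(
  country = c("Australia", "Brazil", "Hungary", "Ireland", "New Zealand",
              "Nicaragua", "Nigeria", "Singapore", "Sri Lanka", "Tunisia",
              "Unknown"),                       # made-up row for illustration
  lifeExp = c(81.235, 72.390, 73.338, 78.885, 80.204,
              72.899, 46.859, 79.972, 72.396, 73.923, NA),
  stringsAsFactors = FALSE
)

# Logical indexing: the NA comparison produces an extra all-NA row
with_na_row <- df[df$lifeExp > 75, ]

# which() keeps only positions where the condition is TRUE, like filter()
kept <- df[which(df$lifeExp > 75), ]

# subset() also drops rows where the condition is NA
kept_subset <- subset(df, lifeExp > 75)
```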

`ggplot()` Exercise 2

Steps

Using gm_df, select the continent, country, and gdpPercap columns
filter() the rows to only keep those where continent == "Oceania"
Pipe (%>%) the result to ggplot()
Select the plot’s aes()thetic values
- country for the x values
- gdpPercap for the y values
Add geom_boxplot() as the geometry of the plot

gm_df %>%                                         # data frame: Data
  select(continent, country, gdpPercap) %>%       # columns to keep: Data
  filter(continent == "Oceania") %>%              # rows to keep: Data
  ggplot(aes(x = country, y = gdpPercap)) +       # x and y values: Aesthetics
  geom_boxplot()                                  # box plot: Geometries

`mutate()` Columns

Quick Example

Initial Data

sample_df %>%
  select(country, pop) %>%
  prettify()

country	pop
Australia	20434176
Brazil	190010647
Hungary	9956108
Ireland	4109086
New Zealand	4115771
Nicaragua	5675356
Nigeria	135031164
Singapore	4553009
Sri Lanka	20378239
Tunisia	10276158

End Data

sample_df %>%
  select(country, pop) %>%
  mutate(pop_in_thousands = pop / 1000) %>%
  prettify(cols_changed = 3)

country	pop	pop_in_thousands
Australia	20434176	20434.176
Brazil	190010647	190010.647
Hungary	9956108	9956.108
Ireland	4109086	4109.086
New Zealand	4115771	4115.771
Nicaragua	5675356	5675.356
Nigeria	135031164	135031.164
Singapore	4553009	4553.009
Sri Lanka	20378239	20378.239
Tunisia	10276158	10276.158

Use mutate() to manipulate column values and create new columns.

In order to mutate() a column, use the name of the column you are manipulating and set its value using =.

Here’s a silly example:

Add a new column to gm_df
- mutate() gm_df to create a column named planet and set its value to "Earth"

gm_df %>%
  mutate(planet = "Earth") %>%
  prettify(cols_changed = 7)

country	continent	year	lifeExp	pop	gdpPercap	planet
Afghanistan	Asia	1952	28.801	8425333	779.4453	Earth
Afghanistan	Asia	1957	30.332	9240934	820.8530	Earth
Afghanistan	Asia	1962	31.997	10267083	853.1007	Earth
Afghanistan	Asia	1967	34.020	11537966	836.1971	Earth
Afghanistan	Asia	1972	36.088	13079460	739.9811	Earth
Afghanistan	Asia	1977	38.438	14880372	786.1134	Earth
Afghanistan	Asia	1982	39.854	12881816	978.0114	Earth
Afghanistan	Asia	1987	40.822	13867957	852.3959	Earth
Afghanistan	Asia	1992	41.674	16317921	649.3414	Earth
Afghanistan	Asia	1997	41.763	22227415	635.3414	Earth

Since we have gdpPercap and pop, we can calculate the values for a total_GDP column.

mutate() gm_df to set the results of a calculation on each row to a new column
- multiply pop * gdpPercap and assign the result to total_GDP inside mutate()

gm_df %>%
  mutate(total_GDP = pop * gdpPercap) %>%
  prettify(cols_changed = 7)

country	continent	year	lifeExp	pop	gdpPercap	total_GDP
Afghanistan	Asia	1952	28.801	8425333	779.4453	6567086330
Afghanistan	Asia	1957	30.332	9240934	820.8530	7585448670
Afghanistan	Asia	1962	31.997	10267083	853.1007	8758855797
Afghanistan	Asia	1967	34.020	11537966	836.1971	9648014150
Afghanistan	Asia	1972	36.088	13079460	739.9811	9678553274
Afghanistan	Asia	1977	38.438	14880372	786.1134	11697659231
Afghanistan	Asia	1982	39.854	12881816	978.0114	12598563401
Afghanistan	Asia	1987	40.822	13867957	852.3959	11820990309
Afghanistan	Asia	1992	41.674	16317921	649.3414	10595901589
Afghanistan	Asia	1997	41.763	22227415	635.3414	14121995875
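For comparison, here’s a sketch of two base R ways to add the same kind of column, using Afghanistan’s 1952 and 1957 rows from the output above. Because gdpPercap is shown rounded to four decimals here, the products differ slightly from the total_GDP values in the table. The column name total_GDP_billions is hypothetical.

```r
df <- data.frame(country = "Afghanistan", year = c(1952L, 1957L),
                 pop = c(8425333L, 9240934L),
                 gdpPercap = c(779.4453, 820.8530),
                 stringsAsFactors = FALSE)

# $<- adds the column by modifying df itself
df$total_GDP <- df$pop * df$gdpPercap

# transform() returns a new data.frame and leaves df unchanged,
# which is closer in spirit to mutate()
df2 <- transform(df, total_GDP_billions = total_GDP / 1e9)
```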

Typically, mutate() is used to perform operations across columns within each individual row. You can also use summary functions such as mean() or sd(), which take an entire column (a vector) and return a single value. R recycles that single value across every row, so an expression mixing the two still produces one value per row that can be assigned to a column.

Makes sense, right??

Let’s calculate the z-score of each gdpPercap value for a specific year.

\[ z_i = \frac{x_i - \mu_x}{\sigma_x} \]

\(x\) = gdpPercap, with \(x_i\) its value in row \(i\)
\(\mu_x\) = the mean of \(x\) = mean(gdpPercap)
\(\sigma_x\) = the standard deviation of \(x\) = sd(gdpPercap) (sd() computes the sample standard deviation, dividing by \(n - 1\))
Use a summary function to perform a calculation involving summary statistics of a column
- subtract mean(gdpPercap) from gdpPercap
- divide the result by sd(gdpPercap)
- set the results as the values of a new column called gdp_per_cap_z_score

gm_df %>%
  filter(year == 1977) %>%
  mutate(gdp_per_cap_z_score = (gdpPercap - mean(gdpPercap)) / sd(gdpPercap)) %>%
  prettify(cols_changed = 7)

country	continent	year	lifeExp	pop	gdpPercap	gdp_per_cap_z_score
Afghanistan	Asia	1977	38.438	14880372	786.1134	-0.7805156
Albania	Europe	1977	68.930	2509048	3533.0039	-0.4520380
Algeria	Africa	1977	58.014	17152804	4910.4168	-0.2873247
Angola	Africa	1977	39.483	6162675	3008.6474	-0.5147414
Argentina	Americas	1977	68.481	26983828	10079.0267	0.3307461
Australia	Oceania	1977	73.490	14074100	18334.1975	1.3179128
Austria	Europe	1977	72.170	7568430	19749.4223	1.4871476
Bahrain	Asia	1977	65.593	297410	19340.1020	1.4382004
Bangladesh	Asia	1977	46.923	80428306	659.8772	-0.7956111
Belgium	Europe	1977	72.800	9821800	19117.9745	1.4116381
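One way to sanity-check the z-score formula is to compare it with base R's scale(), which by default centers each column on its mean and divides by its standard deviation, using the same n - 1 denominator as sd(). A minimal sketch, assuming {dplyr} is installed. `toy` is a hypothetical four-row data frame, so its z-scores won't match the table above, which was computed over every country in 1977.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data reusing four 1977 gdpPercap values
toy <- data.frame(
  country   = c("Afghanistan", "Albania", "Algeria", "Angola"),
  gdpPercap = c(786.1134, 3533.0039, 4910.4168, 3008.6474)
)

toy_z <- toy %>%
  mutate(
    # z_i = (x_i - mean(x)) / sd(x), as in the formula above
    gdp_per_cap_z_score = (gdpPercap - mean(gdpPercap)) / sd(gdpPercap),
    # scale() returns a one-column matrix, so convert it back to a plain vector
    z_from_scale = as.numeric(scale(gdpPercap))
  )

toy_z
```

Both columns should agree, and any set of z-scores has a mean of 0 and a standard deviation of 1.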

Here are other functions that can be used similarly:

Summary Functions
`first()`	`min()`
`last()`	`max()`
`nth()`	`mean()`
`n()`	`median()`
`n_distinct()`	`var()`
`IQR()`	`sd()`
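first(), last(), nth(), n(), and n_distinct() come from {dplyr}, while the rest are base R. Note that n() takes no arguments and only works inside dplyr verbs such as mutate() and summarise(). The sketch below uses summarise() (covered later in this post) so that each result is shown once. It assumes {dplyr} is installed; `poorest` is a hypothetical data frame built from three gdpPercap values in sample_df, and the output column names are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: three gdpPercap values taken from sample_df
poorest <- data.frame(
  country   = c("Nigeria", "Nicaragua", "Sri Lanka"),
  gdpPercap = c(2013.977, 2749.321, 3970.095)
)

poorest_summary <- poorest %>%
  summarise(
    count       = n(),                 # number of rows
    first_cty   = first(country),      # value in the first row
    second_cty  = nth(country, 2),     # value in the second row
    last_cty    = last(country),       # value in the last row
    n_countries = n_distinct(country), # number of unique values
    median_gdp  = median(gdpPercap),
    range_gdp   = max(gdpPercap) - min(gdpPercap)
  )

poorest_summary
```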

`ggplot()` Exercise 3

Steps

Using gm_df, select() country, year, and gdpPercap
filter() the rows to keep only those where country is "Korea, Rep.", "Korea, Dem. Rep.", "Japan", or "China"
Pipe the result to ggplot()
Select the plot’s aes()thetic values
- year for the x values
- gdpPercap for the y values
- country for the color values

Add geom_line() as the geometry of the plot
Add a title to the plot with labs()

gm_df %>%
  select(country, year, gdpPercap) %>%
  filter(country %in% c("Korea, Rep.", "Korea, Dem. Rep.", "Japan", "China")) %>%
  ggplot(aes(x = year, y = gdpPercap, color = country)) +
  geom_line() +
  labs(title = "GDP per Capita Over Time")
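The filter() step uses %in% rather than == for a reason: == compares element by element and recycles the shorter vector, so country == c("Japan", "China") checks the first row against "Japan", the second against "China", the third against "Japan" again, and so on. A minimal sketch of the difference, assuming {dplyr} is installed and using a hypothetical four-row data frame `asia`:

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: each country appears twice
asia <- data.frame(
  country = c("China", "Japan", "China", "Japan"),
  year    = c(1952, 1952, 2007, 2007)
)

# %in% asks "is this row's country anywhere in the set?"
kept_in <- asia %>% filter(country %in% c("Japan", "China"))

# == pairs the rows with "Japan", "China", "Japan", "China" in turn,
# so with this row order every comparison is FALSE
kept_eq <- asia %>% filter(country == c("Japan", "China"))

nrow(kept_in)
nrow(kept_eq)
```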

`arrange()` Rows

Quick Example

Initial Data

sample_df %>%
  select(country, gdpPercap) %>%
  prettify()

country	gdpPercap
Australia	34435.367
Brazil	9065.801
Hungary	18008.944
Ireland	40675.996
New Zealand	25185.009
Nicaragua	2749.321
Nigeria	2013.977
Singapore	47143.180
Sri Lanka	3970.095
Tunisia	7092.923

End Data

sample_df %>%
  select(country, gdpPercap) %>%
  arrange(gdpPercap) %>%
  prettify(cols_changed = 2)

country	gdpPercap
Nigeria	2013.977
Nicaragua	2749.321
Sri Lanka	3970.095
Tunisia	7092.923
Brazil	9065.801
Hungary	18008.944
New Zealand	25185.009
Australia	34435.367
Ireland	40675.996
Singapore	47143.180

Use arrange() to sort rows.

arrange() in ascending order (smallest to largest)
- arrange() gm_df by its pop column so that the smallest populations are on top

gm_df %>%
  arrange(pop) %>%
  prettify(cols_changed = 5)

country	continent	year	lifeExp	pop	gdpPercap
Sao Tome and Principe	Africa	1952	46.471	60011	879.5836
Sao Tome and Principe	Africa	1957	48.945	61325	860.7369
Djibouti	Africa	1952	34.812	63149	2669.5295
Sao Tome and Principe	Africa	1962	51.893	65345	1071.5511
Sao Tome and Principe	Africa	1967	54.425	70787	1384.8406
Djibouti	Africa	1957	37.328	71851	2864.9691
Sao Tome and Principe	Africa	1972	56.480	76595	1532.9853
Sao Tome and Principe	Africa	1977	58.550	86796	1737.5617
Djibouti	Africa	1962	39.693	89898	3020.9893
Sao Tome and Principe	Africa	1982	60.351	98593	1890.2181

arrange() in descending order with desc() (largest to smallest)
- arrange() gm_df by its lifeExp column so that the largest values are on top

gm_df %>%
  arrange(desc(lifeExp)) %>%
  prettify(cols_changed = 4)

country	continent	year	lifeExp	pop	gdpPercap
Japan	Asia	2007	82.603	127467972	31656.07
Hong Kong, China	Asia	2007	82.208	6980412	39724.98
Japan	Asia	2002	82.000	127065841	28604.59
Iceland	Europe	2007	81.757	301931	36180.79
Switzerland	Europe	2007	81.701	7554661	37506.42
Hong Kong, China	Asia	2002	81.495	6762476	30209.02
Australia	Oceania	2007	81.235	20434176	34435.37
Spain	Europe	2007	80.941	40448191	28821.06
Sweden	Europe	2007	80.884	9031088	33859.75
Israel	Asia	2007	80.745	6426679	25523.28

arrange() alphabetically
- filter() gm_df to keep only those rows where year == 2007 and continent == "Americas"
- arrange() the country column alphabetically

gm_df %>%
  filter(year == 2007, continent == "Americas") %>%
  arrange(country) %>%
  prettify(cols_changed = 2:3)

country	continent	year	lifeExp	pop	gdpPercap
Argentina	Americas	2007	75.320	40301927	12779.380
Bolivia	Americas	2007	65.554	9119152	3822.137
Brazil	Americas	2007	72.390	190010647	9065.801
Canada	Americas	2007	80.653	33390141	36319.235
Chile	Americas	2007	78.553	16284741	13171.639
Colombia	Americas	2007	72.889	44227550	7006.580
Costa Rica	Americas	2007	78.782	4133884	9645.061
Cuba	Americas	2007	78.273	11416987	8948.103
Dominican Republic	Americas	2007	72.235	9319622	6025.375
Ecuador	Americas	2007	74.994	13755680	6873.262
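arrange() also accepts several columns: rows are sorted by the first one, ties are broken by the next, and desc() flips only the column it wraps. A minimal sketch, assuming {dplyr} is installed; `pops` is a hypothetical data frame holding six rows of country, continent, and pop values from sample_df.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: six rows from sample_df
pops <- data.frame(
  country   = c("Hungary", "Ireland", "Nigeria", "Tunisia", "Brazil", "Nicaragua"),
  continent = c("Europe", "Europe", "Africa", "Africa", "Americas", "Americas"),
  pop       = c(9956108, 4109086, 135031164, 10276158, 190010647, 5675356)
)

# Sort alphabetically by continent, then by population (largest first) within each continent
sorted_pops <- pops %>%
  arrange(continent, desc(pop))

sorted_pops
```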

`group_by()` for Grouped Data

Quick Example

Initial Data

sample_df %>%
  select(country, continent, pop) %>%
  prettify()

country	continent	pop
Australia	Oceania	20434176
Brazil	Americas	190010647
Hungary	Europe	9956108
Ireland	Europe	4109086
New Zealand	Oceania	4115771
Nicaragua	Americas	5675356
Nigeria	Africa	135031164
Singapore	Asia	4553009
Sri Lanka	Asia	20378239
Tunisia	Africa	10276158

End Data

sample_df %>%
  select(country, continent, pop) %>%
  group_by(continent) %>%
  mutate(pop_by_continent = sum(pop)) %>%
  ungroup() %>%
  arrange(pop_by_continent) %>%
  prettify(cols_changed = 4)

country	continent	pop	pop_by_continent
Hungary	Europe	9956108	14065194
Ireland	Europe	4109086	14065194
Australia	Oceania	20434176	24549947
New Zealand	Oceania	4115771	24549947
Singapore	Asia	4553009	24931248
Sri Lanka	Asia	20378239	24931248
Nigeria	Africa	135031164	145307322
Tunisia	Africa	10276158	145307322
Brazil	Americas	190010647	195686003
Nicaragua	Americas	5675356	195686003

group_by() groups rows together based on the values in one or more columns, so that later verbs such as mutate() and summarize() operate within each group.

Let’s say we want to compute a summary value for each country across all years.

Calculate the median_gdp_per_cap of each country with group_by()
- take gm_df and group_by() country to group rows of the same country together
- use median() to calculate the median_gdp_per_cap
- ungroup() the rows
  - a good habit, since leftover grouping silently changes what later steps compute
- keep only those rows with distinct() combinations of country and median_gdp_per_cap
  - distinct()’s default is to only keep columns used as arguments

gm_df %>%
  group_by(country) %>%
  mutate(median_gdp_per_cap = median(gdpPercap)) %>%
  ungroup() %>%
  distinct(country, median_gdp_per_cap) %>%
  prettify(cols_changed = 2)

country	median_gdp_per_cap
Afghanistan	803.4832
Albania	3253.2384
Algeria	4853.8559
Angola	3264.6288
Argentina	9068.7844
Australia	18905.6034
Austria	20673.2530
Bahrain	18779.8016
Bangladesh	703.7638
Belgium	20048.9102
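Why make ungroup() a habit? A grouped data frame stays grouped through later steps, so a summary function in a later mutate() is quietly computed per group rather than over all rows. A minimal sketch, assuming {dplyr} is installed; `grp` is a hypothetical data frame with four rows of pop values from sample_df, and the new column names are illustrative.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: four rows from sample_df
grp <- data.frame(
  country   = c("Hungary", "Ireland", "Nigeria", "Tunisia"),
  continent = c("Europe", "Europe", "Africa", "Africa"),
  pop       = c(9956108, 4109086, 135031164, 10276158)
)

# Forgot to ungroup(): the second mutate() still works within each continent
still_grouped <- grp %>%
  group_by(continent) %>%
  mutate(continent_pop = sum(pop)) %>%
  mutate(total_pop = sum(pop))   # per-continent sum, probably not what was meant

# With ungroup(): the second mutate() sees every row
ungrouped <- grp %>%
  group_by(continent) %>%
  mutate(continent_pop = sum(pop)) %>%
  ungroup() %>%
  mutate(total_pop = sum(pop))   # grand total across all rows
```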

`ggplot()` Exercise 4

Steps

Using gm_df, group_by() the continent and year
mutate() to add a column called mean_gdp for the average GDP per capita of each continent in each year
ungroup() the data, because this is a habit that will save you headaches later
Keep only distinct() combinations of continent, year, and mean_gdp
Pipe the result to ggplot()
Select the plot’s aes()thetic values
- year for the x values
- mean_gdp for the y values
- continent for the color values
Add geom_line() as the geometry of the plot
Add a title and a caption (for the source of the data) to the plot with labs()

gm_df %>%
  group_by(year, continent) %>%
  mutate(mean_gdp = mean(gdpPercap)) %>%
  ungroup() %>%
  distinct(continent, year, mean_gdp) %>%
  ggplot(aes(x = year, y = mean_gdp, color = continent)) +
  geom_line() +
  labs(title = "Mean GDP per Capita by Continent Over Time",
       caption = "Source: Free material from www.gapminder.org")
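The group_by() + mutate() + distinct() pattern in this exercise ends up with one row per continent and year, which is exactly what summarize() (covered next) returns directly. A minimal sketch comparing the two, assuming {dplyr} is installed; `cont` is a hypothetical data frame holding four gdpPercap values from sample_df, with year set to 2007.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: two countries per continent
cont <- data.frame(
  continent = c("Asia", "Asia", "Europe", "Europe"),
  year      = c(2007, 2007, 2007, 2007),
  gdpPercap = c(47143.180, 3970.095, 18008.944, 40675.996)
)

# Pattern from the exercise: attach the group mean to every row, then de-duplicate
via_distinct <- cont %>%
  group_by(continent, year) %>%
  mutate(mean_gdp = mean(gdpPercap)) %>%
  ungroup() %>%
  distinct(continent, year, mean_gdp) %>%
  arrange(continent)

# Same result in one step: summarise() returns one row per group
via_summarise <- cont %>%
  group_by(continent, year) %>%
  summarise(mean_gdp = mean(gdpPercap)) %>%
  ungroup() %>%
  arrange(continent)
```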

`summarize()`

Quick Example

Initial Data

sample_df %>%
  select(country, continent, lifeExp, pop) %>%
  prettify()

country	continent	lifeExp	pop
Australia	Oceania	81.235	20434176
Brazil	Americas	72.390	190010647
Hungary	Europe	73.338	9956108
Ireland	Europe	78.885	4109086
New Zealand	Oceania	80.204	4115771
Nicaragua	Americas	72.899	5675356
Nigeria	Africa	46.859	135031164
Singapore	Asia	79.972	4553009
Sri Lanka	Asia	72.396	20378239
Tunisia	Africa	73.923	10276158

sample_df %>%
  select(country, continent, lifeExp, pop) %>%
  group_by(continent) %>%
  summarise(max_pop = max(pop),
            mean_life_exp = mean(lifeExp)) %>%
  prettify(cols_changed = 2:3)

continent	max_pop	mean_life_exp
Africa	135031164	60.3910
Americas	190010647	72.6445
Asia	20378239	76.1840
Europe	9956108	76.1115
Oceania	20434176	80.7195

Now that we know how to use group_by(), we can summarize() data by group: summarize() collapses each group into a single row, with one column per summary statistic. Any of the summary functions seen earlier can be used.

Summary Functions
`first()`	`min()`
`last()`	`max()`
`nth()`	`mean()`
`n()`	`median()`
`n_distinct()`	`var()`
`IQR()`	`sd()`

Calculate some summary statistics for each continent.
- take gm_df and group_by() continent
- using summarize() or summarise(), calculate:
  - count with n()
  - mean_pop with mean()
  - max_gdp_per_cap with max()

gm_df %>%
  group_by(continent) %>%
  summarise(count = n(),
            mean_pop = mean(pop),
            max_gdp_per_cap = max(gdpPercap)) %>%
  prettify(cols_changed = 2:4)

continent	count	mean_pop	max_gdp_per_cap
Africa	624	9916003	21951.21
Americas	300	24504795	42951.65
Asia	396	77038722	113523.13
Europe	360	17169765	49357.19
Oceania	24	8874672	34435.37
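Counting rows per group is common enough that {dplyr} has a shortcut: count(continent) gives the same counts as group_by(continent) %>% summarise(n = n()). A minimal sketch, assuming {dplyr} is installed; `cont_df` is a hypothetical data frame holding the continent column of sample_df (two countries per continent).

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data: the continent column of sample_df
cont_df <- data.frame(
  continent = c("Oceania", "Americas", "Europe", "Europe", "Oceania",
                "Americas", "Africa", "Asia", "Asia", "Africa")
)

# The long way: group, then count rows in each group with n()
long_way <- cont_df %>%
  group_by(continent) %>%
  summarise(n = n())

# The shortcut: count() does the grouping and counting in one call
short_way <- cont_df %>%
  count(continent)
```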

`ggplot()` Exercise 5

Steps

Using gm_df, filter() the data to remove rows where continent is "Oceania"
group_by() continent and year
summarize() the groups by calculating the mean() of pop as mean_pop
ungroup() the data, because this is a habit that will save you headaches later
Pipe the results to ggplot()
Select the plot’s aes()thetics
- year for the x values
- mean_pop for the y values
- continent for the color values
Add geom_line() for the first geometry
Add geom_point() for the second geometry
Change the theme by adding theme_minimal()
Using facet_wrap(), split the plot into panels for each continent
- ~ creates a one-sided formula naming the variable to facet by
Add a title and a caption with labs()

gm_df %>%
  filter(continent != "Oceania") %>%
  group_by(continent, year) %>%
  summarise(mean_pop = mean(pop)) %>%
  ungroup() %>%
  ggplot(aes(x = year, y = mean_pop,
             color = continent)) +
  geom_line() +
  geom_point() +
  theme_minimal() +
  facet_wrap(~ continent) +
  labs(title = "Mean Continent Populations over Time",
       caption = "Source: Free material from www.gapminder.org")

The End

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 17134)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United States.1252 
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] bindrcpp_0.2.2       kableExtra_0.9.0     knitr_1.20.8        
##  [4] gapminder_0.3.0      forcats_0.3.0        stringr_1.3.1       
##  [7] dplyr_0.7.6          purrr_0.2.5          readr_1.1.1         
## [10] tidyr_0.8.1          tibble_1.4.2.9004    ggplot2_3.0.0.9000  
## [13] tidyverse_1.2.1.9000
## 
## loaded via a namespace (and not attached):
##  [1] tidyselect_0.2.4  xfun_0.3          reshape2_1.4.3   
##  [4] haven_1.1.2       lattice_0.20-35   colorspace_1.3-2 
##  [7] viridisLite_0.3.0 htmltools_0.3.6   yaml_2.1.19      
## [10] utf8_1.1.4        rlang_0.2.1       pillar_1.3.0.9000
## [13] withr_2.1.2       foreign_0.8-70    glue_1.2.0       
## [16] modelr_0.1.2      readxl_1.1.0      bindr_0.1.1      
## [19] plyr_1.8.4        munsell_0.5.0     blogdown_0.7.1   
## [22] gtable_0.2.0      cellranger_1.1.0  rvest_0.3.2      
## [25] codetools_0.2-15  psych_1.8.4       evaluate_0.10.1  
## [28] labeling_0.3      parallel_3.5.1    fansi_0.2.3      
## [31] highr_0.7         broom_0.4.5       Rcpp_0.12.17     
## [34] scales_0.5.0.9000 jsonlite_1.5      mnormt_1.5-5     
## [37] hms_0.4.2         digest_0.6.15     stringi_1.2.3    
## [40] bookdown_0.7      grid_3.5.1        cli_1.0.0        
## [43] tools_3.5.1       magrittr_1.5      lazyeval_0.2.1   
## [46] crayon_1.3.4      pkgconfig_2.0.1   xml2_1.2.0       
## [49] lubridate_1.7.4   assertthat_0.2.0  rmarkdown_1.10.7 
## [52] httr_1.3.1        rstudioapi_0.7    htmldeps_0.1.0   
## [55] R6_2.2.2          nlme_3.1-137      compiler_3.5.1




