V is for Verbs

April 25, 2020
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers.]

In this series, I’ve covered five terms for data manipulation:

  • arrange
  • filter
  • mutate
  • select
  • summarise

These verbs make up dplyr's grammar of data manipulation. Each of them can be combined with group_by to apply the operation within groups rather than to the whole dataset.
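As a minimal sketch of what "groupwise" means here, the example below uses a small made-up data frame (the `books` object and its values are hypothetical; only the column names are borrowed from the reading dataset). It shows how summarise and mutate behave once the data are grouped.

```r
library(dplyr)

# Hypothetical toy data standing in for the reading dataset
books <- data.frame(
  Title   = c("A", "B", "C", "D"),
  Fiction = c(1, 1, 0, 0),
  Pages   = c(300, 500, 200, 400),
  stringsAsFactors = FALSE
)

# summarise() on grouped data returns one row per group
pages_by_type <- books %>%
  group_by(Fiction) %>%
  summarise(mean_pages = mean(Pages))

# mutate() on grouped data computes within each group but keeps every row
books_centered <- books %>%
  group_by(Fiction) %>%
  mutate(pages_vs_group = Pages - mean(Pages)) %>%
  ungroup()
```

The same pattern should carry over to arrange, filter, and select, although what "within groups" means differs a little from verb to verb.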

There are also scoped versions of these verbs, with the suffix _all, _if, or _at, that apply the verb to multiple variables at once. For instance, I could get means for all of my numeric variables like this. (Quick note: I created an updated reading dataset that has all publication years filled in. You can download it here.)

library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## ggplot2 3.2.1     purrr   0.3.3
## tibble  2.1.3     dplyr   0.8.3
## tidyr   1.0.0     stringr 1.4.0
## readr   1.3.1     forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                      col_names = TRUE)
## Parsed with column specification:
## cols(
## Title = col_character(),
## Pages = col_double(),
## date_started = col_character(),
## date_read = col_character(),
## Book.ID = col_double(),
## Author = col_character(),
## AdditionalAuthors = col_character(),
## AverageRating = col_double(),
## OriginalPublicationYear = col_double(),
## read_time = col_double(),
## MyRating = col_double(),
## Gender = col_double(),
## Fiction = col_double(),
## Childrens = col_double(),
## Fantasy = col_double(),
## SciFi = col_double(),
## Mystery = col_double(),
## SelfHelp = col_double()
## )
reads2019 %>%
  summarise_if(is.numeric, list(mean))
## # A tibble: 1 x 13
##   Pages Book.ID AverageRating OriginalPublica… read_time MyRating Gender Fiction
##   <dbl>   <dbl>         <dbl>            <dbl>     <dbl>    <dbl>  <dbl>   <dbl>
## 1  341.  1.36e7          3.94            1989.      3.92     4.14  0.310   0.931
## # … with 5 more variables: Childrens <dbl>, Fantasy <dbl>, SciFi <dbl>,
## #   Mystery <dbl>, SelfHelp <dbl>

This function generated the mean for every numeric variable in my dataset. But even though they're all numeric, the mean isn't a meaningful statistic for many of them, such as book ID or publication year. Instead, we can generate means for specific variables only with summarise_at.

reads2019 %>%
  summarise_at(vars(Pages, AverageRating, read_time, MyRating), list(mean))
## # A tibble: 1 x 4
##   Pages AverageRating read_time MyRating
##   <dbl>         <dbl>     <dbl>    <dbl>
## 1  341.          3.94      3.92     4.14
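The output above comes from dplyr 0.8.3. From dplyr 1.0.0 onward the scoped verbs are superseded by across(), so readers on a newer version may prefer that form. The sketch below shows rough across() counterparts of the two calls above; it assumes dplyr 1.0.0 or later and uses a made-up `books` data frame (hypothetical values, column names borrowed from the reading dataset).

```r
library(dplyr)

# Hypothetical toy data; requires dplyr >= 1.0.0 for across()
books <- data.frame(
  Title    = c("A", "B", "C"),
  Pages    = c(300, 500, 100),
  MyRating = c(4, 5, 3),
  stringsAsFactors = FALSE
)

# Rough counterpart of summarise_if(is.numeric, list(mean))
means_numeric <- books %>%
  summarise(across(where(is.numeric), mean))

# Rough counterpart of summarise_at(vars(Pages, MyRating), list(mean))
means_chosen <- books %>%
  summarise(across(c(Pages, MyRating), mean))
```

The scoped verbs still work in newer versions, so nothing in this post needs to change; across() is simply the form the dplyr documentation now recommends.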

You can also pass more than one function in your list. If you name each list element, dplyr appends that name to each variable name to label the new columns (for example, Pages_mean and Pages_median).

numeric_summary <- reads2019 %>%
  summarise_at(vars(Pages, AverageRating, read_time, MyRating),
               list("mean" = mean, "median" = median))
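Since the result is stored rather than printed, here is a small sketch of what such a summary looks like. It runs the same kind of call on a made-up data frame (the `books` object and its values are hypothetical) and inspects the column names that come back.

```r
library(dplyr)

# Hypothetical toy data with two of the columns used above
books <- data.frame(
  Pages    = c(100, 200, 600),
  MyRating = c(3, 4, 5)
)

# Named list elements become suffixes on the output column names
toy_summary <- books %>%
  summarise_at(vars(Pages, MyRating),
               list("mean" = mean, "median" = median))

# One column per variable/function pair, e.g. Pages_mean and Pages_median
names(toy_summary)
```

The result is still a single row, with one column for each combination of variable and function, so four variables and two functions would give eight columns.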

I use the basic verbs every time I use R. I only learned about the scoped verbs recently, and I'm sure I'll add them to my toolkit over time.

Next week is the last week of Blogging A to Z! See you then!
