F is for filter
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
For the letter F – filters! Filters are incredibly useful, especially when combined with the main pipe %>%. I frequently use filters along with ggplot functions, to chart a specific subgroup or remove missing cases or outliers. As one example, I could use a filter to chart only fiction books from my reading dataset.
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
reads2019 <- read_csv("~/Downloads/Blogging A to Z/SarasReads2019_allrated.csv", col_names = TRUE)
## Parsed with column specification:
## cols(
## Title = col_character(),
## Pages = col_double(),
## date_started = col_character(),
## date_read = col_character(),
## Book.ID = col_double(),
## Author = col_character(),
## AdditionalAuthors = col_character(),
## AverageRating = col_double(),
## OriginalPublicationYear = col_double(),
## read_time = col_double(),
## MyRating = col_double(),
## Gender = col_double(),
## Fiction = col_double(),
## Childrens = col_double(),
## Fantasy = col_double(),
## SciFi = col_double(),
## Mystery = col_double(),
## SelfHelp = col_double()
## )
reads2019 %>%
  filter(Fiction == 1) %>%
  ggplot(aes(Pages)) +
  geom_histogram() +
  scale_y_continuous(breaks = seq(0,16,1)) +
  scale_x_continuous(breaks = seq(0,1200,100)) +
  ylab("Frequency") +
  theme_classic()
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
I could also use filters to create a new dataset – perhaps one of my top books I read during 2019.
library(magrittr)
##
## Attaching package: 'magrittr'
## The following object is masked from 'package:purrr':
##
##     set_names
## The following object is masked from 'package:tidyr':
##
##     extract
top_books <- reads2019 %>%
  filter(MyRating == 5)
top_books %$%
  list(Title)
## [[1]]
##  [1] "1Q84"
##  [2] "Alas, Babylon"
##  [3] "Elevation"
##  [4] "Guards! Guards! (Discworld, #8; City Watch #1)"
##  [5] "How Music Works"
##  [6] "Lords and Ladies (Discworld, #14; Witches #4)"
##  [7] "Moving Pictures (Discworld, #10; Industrial Revolution, #1)"
##  [8] "Redshirts"
##  [9] "Swarm Theory"
## [10] "The Android's Dream (The Android's Dream #1)"
## [11] "The Dutch House"
## [12] "The Emerald City of Oz (Oz #6)"
## [13] "The End of Mr. Y"
## [14] "The Human Division (Old Man's War, #5)"
## [15] "The Last Colony (Old Man's War, #3)"
## [16] "The Long Utopia (The Long Earth #4)"
## [17] "The Marvelous Land of Oz (Oz, #2)"
## [18] "The Miraculous Journey of Edward Tulane"
## [19] "The Night Circus"
## [20] "The Patchwork Girl of Oz (Oz, #7)"
## [21] "The Patron Saint of Liars"
## [22] "The Wonderful Wizard of Oz (Oz, #1)"
## [23] "The Year of the Flood (MaddAddam, #2)"
## [24] "Witches Abroad (Discworld, #12; Witches #3)"
## [25] "Wyrd Sisters (Discworld, #6; Witches #2)"
Or I could create one of the 10 longest books I read:
long_books <- reads2019 %>%
  arrange(desc(Pages)) %>%
  filter(between(row_number(), 1, 10)) %>%
  select(Title, Pages)
library(expss)
##
## Use 'expss_output_viewer()' to display tables in the RStudio Viewer.
##  To return to the console output, use 'expss_output_default()'.
##
## Attaching package: 'expss'
## The following objects are masked from 'package:magrittr':
##
##     and, equals, or
## The following objects are masked from 'package:stringr':
##
##     fixed, regex
## The following objects are masked from 'package:dplyr':
##
##     between, compute, contains, first, last, na_if, recode, vars
## The following objects are masked from 'package:purrr':
##
##     keep, modify, modify_if, transpose
## The following objects are masked from 'package:tidyr':
##
##     contains, nest
## The following object is masked from 'package:ggplot2':
##
##     vars
as.etable(long_books, rownames_as_row_labels = FALSE)
| Title | Pages |
|---|---|
| It | 1156 |
| 1Q84 | 925 |
| Insomnia | 890 |
| The Institute | 576 |
| The Robber Bride | 528 |
| Life of Pi | 460 |
| Cell | 449 |
| Cujo | 432 |
| The Human Division (Old Man’s War, #5) | 431 |
| The Year of the Flood (MaddAddam, #2) | 431 |
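
The introduction mentions using filters to remove missing cases or outliers, and the top-10 table above relies on filtering by row_number(). Here is a minimal sketch of both ideas on a made-up tibble. The `books` data and the 1,000-page cutoff are assumptions for illustration and are not part of the reading dataset. The sketch also uses slice_max(), which to my knowledge needs dplyr 1.0.0 or later, so it would not run with the dplyr 0.8.3 loaded earlier in this post.

```r
library(dplyr)

# Hypothetical toy data standing in for the reading log (not the real dataset)
books <- tibble(
  Title   = c("A", "B", "C", "D", "E"),
  Pages   = c(320, NA, 1156, 210, 450),
  Fiction = c(1, 1, 1, 0, 1)
)

# Several conditions in one filter() are combined with AND:
# keep fiction, drop rows with missing page counts, and drop an
# (arbitrarily chosen) outlier cutoff of 1,000 pages
typical_fiction <- books %>%
  filter(Fiction == 1, !is.na(Pages), Pages < 1000)

# An alternative to arrange() + filter(between(row_number(), 1, n)):
# slice_max() keeps the n rows with the largest values of a column;
# with_ties = FALSE returns exactly n rows even when values tie
longest <- books %>%
  slice_max(Pages, n = 2, with_ties = FALSE) %>%
  select(Title, Pages)

print(typical_fiction)
print(longest)
```

One design note: with the default with_ties = TRUE, slice_max() can return more than n rows when the cutoff value is tied, which matters for data like the table above where two books share 431 pages. Setting with_ties = FALSE mirrors the row_number() approach, which always keeps exactly 10 rows.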
