Statistics Sunday: My 2019 Reading
[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers].
I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year.
library(tidyverse)
## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.3
## ✓ tidyr   1.0.0     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                      col_names = TRUE)
## Parsed with column specification:
## cols(
##   Title = col_character(),
##   Pages = col_double(),
##   date_started = col_character(),
##   date_read = col_character(),
##   Book.ID = col_double(),
##   Author = col_character(),
##   AdditionalAuthors = col_character(),
##   AverageRating = col_double(),
##   OriginalPublicationYear = col_double(),
##   read_time = col_double(),
##   MyRating = col_double(),
##   Gender = col_double(),
##   Fiction = col_double(),
##   Childrens = col_double(),
##   Fantasy = col_double(),
##   SciFi = col_double(),
##   Mystery = col_double(),
##   SelfHelp = col_double()
## )

As you recall, I read 87 books last year, by 42 different authors.
reads2019 %>%
  summarise(Books = n(),
            Authors = n_distinct(Author))
## # A tibble: 1 x 2
##   Books Authors
##   <int>   <int>
## 1    87      42

Using summarise, we can get some basic information about each author.
authors <- reads2019 %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating),
            Oldest = min(OriginalPublicationYear),
            Newest = max(OriginalPublicationYear),
            AvgRT = mean(read_time),
            Gender = first(Gender),
            Fiction = sum(Fiction),
            Childrens = sum(Childrens),
            Fantasy = sum(Fantasy),
            Sci = sum(SciFi),
            Mystery = sum(Mystery))

Let's plot the number of books by each author, with the bars arranged by number of books.
authors %>%
  ggplot(aes(reorder(Author, desc(Books)), Books)) +
  geom_col() +
  theme(axis.text.x = element_text(angle = 90)) +
  xlab("Author")
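The bar order in that chart comes from reorder(), which turns Author into a factor whose levels are sorted by a second variable. Here is a minimal sketch of that behavior on made-up data; the toy_authors table and its values are hypothetical and are not part of the reading dataset.

```r
library(dplyr)

# Hypothetical toy data: author names and book counts are invented
toy_authors <- tibble(
  Author = c("Author A", "Author B", "Author C"),
  Books  = c(2, 5, 1)
)

# reorder() returns a factor whose levels are sorted by the second argument.
# Wrapping Books in desc() sorts the levels from most to fewest books.
by_books_desc <- reorder(toy_authors$Author, desc(toy_authors$Books))
levels(by_books_desc)
## [1] "Author B" "Author A" "Author C"

# Without desc(), the levels run from fewest to most books
by_books_asc <- reorder(toy_authors$Author, toy_authors$Books)
levels(by_books_asc)
## [1] "Author C" "Author A" "Author B"
```

ggplot2 draws discrete axes in factor-level order, so the first level becomes the leftmost bar.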
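Restricting the chart to authors with more than one book, and flipping the axes so the names read horizontally, gives a more compact version:

authors %>%
  mutate(Author = fct_reorder(Author, desc(Author))) %>%
  filter(Books > 1) %>%
  ggplot(aes(reorder(Author, Books), Books)) +
  geom_col() +
  coord_flip() +
  xlab("Author")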
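If you'd like to try the per-author summary without the original CSV, the sketch below runs the same group_by() and summarise() pattern on a small invented table. The toy_reads data, including every title, author, and value in it, is a hypothetical stand-in for reads2019 and only borrows a few of its column names.

```r
library(dplyr)

# Hypothetical stand-in for reads2019: all titles, authors, and values are invented
toy_reads <- tibble(
  Title    = c("Book 1", "Book 2", "Book 3", "Book 4"),
  Author   = c("Author A", "Author A", "Author B", "Author A"),
  Pages    = c(300, 250, 400, 150),
  MyRating = c(4, 5, 3, 3)
)

# One row per author: number of books, total pages, and mean rating
toy_summary <- toy_reads %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating))

# Keep only authors with more than one book, as in the filtered chart
toy_multi <- toy_summary %>%
  filter(Books > 1)

toy_multi
```

With this toy data only Author A survives the filter, with 3 books, 700 pages, and an average rating of 4.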