# Statistics Sunday: My 2019 Reading

First published on **Deeply Trivial**, and kindly contributed to R-bloggers.


I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year.

    library(tidyverse)
    ## -- Attaching packages ------------------------------------------- tidyverse 1.3.0 --
    ## ggplot2 3.2.1     purrr   0.3.3
    ## tibble  2.1.3     dplyr   0.8.3
    ## tidyr   1.0.0     stringr 1.4.0
    ## readr   1.3.1     forcats 0.4.0
    ## -- Conflicts ---------------------------------------------- tidyverse_conflicts() --
    ## x dplyr::filter() masks stats::filter()
    ## x dplyr::lag()    masks stats::lag()

    reads2019 <- read_csv("~/Downloads/Blogging A to Z/SaraReads2019_allchanges.csv",
                          col_names = TRUE)
    ## Parsed with column specification:
    ## cols(
    ##   Title = col_character(),
    ##   Pages = col_double(),
    ##   date_started = col_character(),
    ##   date_read = col_character(),
    ##   Book.ID = col_double(),
    ##   Author = col_character(),
    ##   AdditionalAuthors = col_character(),
    ##   AverageRating = col_double(),
    ##   OriginalPublicationYear = col_double(),
    ##   read_time = col_double(),
    ##   MyRating = col_double(),
    ##   Gender = col_double(),
    ##   Fiction = col_double(),
    ##   Childrens = col_double(),
    ##   Fantasy = col_double(),
    ##   SciFi = col_double(),
    ##   Mystery = col_double(),
    ##   SelfHelp = col_double()
    ## )

As you recall, I read 87 books last year, by 42 different authors.

    reads2019 %>%
      summarise(Books = n(),
                Authors = n_distinct(Author))
    ## # A tibble: 1 x 2
    ##   Books Authors
    ##   <int>   <int>
    ## 1    87      42

Using summarise, we can get some basic information about each author.

    authors <- reads2019 %>%
      group_by(Author) %>%
      summarise(Books = n(),
                Pages = sum(Pages),
                AvgRating = mean(MyRating),
                Oldest = min(OriginalPublicationYear),
                Newest = max(OriginalPublicationYear),
                AvgRT = mean(read_time),
                Gender = first(Gender),
                Fiction = sum(Fiction),
                Childrens = sum(Childrens),
                Fantasy = sum(Fantasy),
                Sci = sum(SciFi),
                Mystery = sum(Mystery))
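To see what this kind of per-author summary produces without the original CSV, here is a minimal sketch on a made-up tibble. The `toy_reads` rows and author names are invented for illustration and are not part of the 2019 reading data; only three of the summary columns are reproduced.

```r
library(dplyr)

# Made-up rows for illustration only; these are NOT from the 2019 reading data
toy_reads <- tibble(
  Author   = c("Author A", "Author A", "Author B"),
  Pages    = c(300, 250, 410),
  MyRating = c(4, 5, 3)
)

# One row per author: number of books, total pages, and mean rating
toy_authors <- toy_reads %>%
  group_by(Author) %>%
  summarise(Books = n(),
            Pages = sum(Pages),
            AvgRating = mean(MyRating))

toy_authors
```

Each `summarise()` argument collapses a group to a single value, so the result has one row for "Author A" (2 books, 550 pages, average rating 4.5) and one for "Author B".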

Let's plot number of books by each author, with the bars arranged by number of books.

    authors %>%
      ggplot(aes(reorder(Author, desc(Books)), Books)) +
      geom_col() +
      theme(axis.text.x = element_text(angle = 90)) +
      xlab("Author")
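The bar order in this plot comes from `reorder()`, which turns `Author` into a factor whose levels are sorted by the second argument; wrapping that argument in `desc()` reverses the sort. A small sketch, using an invented `toy_authors` tibble rather than the real data, shows the effect on the factor levels:

```r
library(dplyr)

# Hypothetical authors and counts, for illustration only
toy_authors <- tibble(
  Author = c("Author A", "Author B", "Author C"),
  Books  = c(2, 5, 1)
)

# Levels sorted by Books, smallest first
asc_levels <- levels(reorder(toy_authors$Author, toy_authors$Books))
asc_levels
# "Author C" "Author A" "Author B"

# desc() negates the ordering, so the author with the most books comes first
desc_levels <- levels(reorder(toy_authors$Author, desc(toy_authors$Books)))
desc_levels
# "Author B" "Author A" "Author C"
```

ggplot2 draws a discrete axis in level order, which is why the tallest bar ends up on the left in the plot above.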

I could simplify this chart quite a bit by only showing authors with 2 or more books in the set, and also by flipping the axes so the author names can be read along the side.

    authors %>%
      mutate(Author = fct_reorder(Author, desc(Author))) %>%
      filter(Books > 1) %>%
      ggplot(aes(reorder(Author, Books), Books)) +
      geom_col() +
      coord_flip() +
      xlab("Author")
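The two reordering steps in this pipeline can look redundant, since `reorder(Author, Books)` inside `aes()` has the final say on bar order. One reading, offered here as an assumption rather than the author's stated intent, is that the earlier `fct_reorder()` call only matters for authors tied on `Books`. The sketch below uses invented data to show each step's effect on the factor levels:

```r
library(dplyr)
library(forcats)

# Hypothetical authors, two of them tied on Books; not from the real data
toy_authors <- tibble(
  Author = c("Author B", "Author A", "Author C"),
  Books  = c(2, 2, 3)
)

# Step 1: the mutate() call puts the levels in reverse alphabetical order
step1 <- fct_reorder(toy_authors$Author, desc(toy_authors$Author))
levels(step1)
# "Author C" "Author B" "Author A"

# Step 2: reorder() inside aes() then sorts those levels by Books;
# print the result to see where the tied authors land
step2 <- reorder(step1, toy_authors$Books)
levels(step2)
```

Because `coord_flip()` draws the first level at the bottom of the chart, the author with the most books ends up at the top, and the order from step 1 is the natural candidate for breaking ties among the rest.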