[This article was first published on Deeply Trivial, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’ve spent the month of April blogging my way through the tidyverse, while using my reading dataset from 2019 as the example. Today, I thought I’d bring many of those analyses and data manipulation techniques together to do a post about my reading habits for the year.
Based on this data, I read the most books by L. Frank Baum (which makes sense, because I made a goal to reread all 14 Oz series books), followed by Terry Pratchett (which makes sense, because I love him). The code above is slightly more complex, because when I use coord_flip(), the author names were displayed in reverse alphabetical order. Using the factor reorder code plus the reorder in ggplot allowed me to display the chart in order by number of books then by author alphabetical order.
We can also plot average rating by author, which can tell me a little more about how much I like particular authors. Let's plot those for any author who contributed at least 2 books to my dataset.
I only read 2 books by Ann Patchett, but I rated both of her books as 5, giving her the highest average rating. If I look at one of the authors who contributed more than 2 books, John Scalzi (tied for 3rd most read in 2019) has the highest rating, followed by Terry Pratchett (2nd most read). Obviously, though, I really like any of the authors I read at least 2 books from, because they all have fairly high average ratings. Stephen King is the only one with an average below 4, and that's only because I read Cujo, which I hated (more on that later on in this post).
We can also look at how genre affected ratings. Using the genre labels I generated before, let's plot average rating.
Based on this plot, my favorite genres appear to be fantasy, sci-fi, and especially books with elements of both. No surprises here.
Let's dig into ratings on individual books. In my filter post, I identified the 25 books I liked the most (i.e., gave them a 5-star rating). What about the books I disliked? The lowest rating I gave was a 2, but it's safe to say I hated those books. And I also probably didn't like the books I rated as 3.
lowratings <- reads2019 %>%
filter(MyRating <= 3) %>%
mutate(Rating = case_when(MyRating == 2 ~ "Hated",
MyRating == 3 ~ "Disliked")) %>%
arrange(desc(MyRating), Author) %>%
select(Title, Author, Rating)
## Attaching package: 'expss'
## The following objects are masked from 'package:stringr':
## fixed, regex
## The following objects are masked from 'package:dplyr':
## between, compute, contains, first, last, na_if, recode, vars
## The following objects are masked from 'package:purrr':
## keep, modify, modify_if, transpose
## The following objects are masked from 'package:tidyr':
## contains, nest
## The following object is masked from 'package:ggplot2':
as.etable(lowratings, rownames_as_row_labels = FALSE)
The Scarecrow of Oz (Oz, #9)
Baum, L. Frank
The Tin Woodman of Oz (Oz, #12)
Baum, L. Frank
The 5 Love Languages: The Secret to Love That Lasts
Boundaries: When to Say Yes, How to Say No to Take Control of Your Life
Collins, David Jay
When We Were Orphans
Bird Box (Bird Box, #1)
Oz in Perspective: Magic and Myth in the L. Frank Baum Books
Just Evil (Evil Secrets Trilogy, #1)
I'm a little surprised at some of this, because several books I rated as 3 I liked and only a few I legitimately didn't like. The 2 books I rated as 2 I really did hate, and probably should have rated as 1 instead. So based on my new understanding of how I've been using (misusing) those ratings, I'd probably update 3 ratings.
There! Now I have a much more accurate representation of the books I actually disliked/hated, and know how I should be rating books going forward to better reflect how I think of the categories. Of the two I hated, Just Evil... was an e-book I won in a Goodreads giveaway that I read on my phone when I didn't have a physical book with me: convoluted storyline, problematic romantic relationships, and a main character who talked about how much her dog was her baby, and yet the dog was forgotten half the time (even left alone for long periods of time while she was off having her problematic relationship) except when the dog's reaction or protection became important to the storyline. The other, Cujo, I reviewed here; while I'm glad I read it, I have no desire to ever read it again.
Let's look again at my top books, but this time, classify them by long genre descriptions from above. I can get that information into my full reading dataset with a join, using the genre flags. Then I can plot the results from that dataset without having to summarize first.
This chart helps me to better understand my average rating by genre chart above. Only 1 book with elements of both fantasy and sci-fi was rated as a 5, and the average rating is 4.5, meaning there's only 1 other book in that category that had to be rated as a 4. It might be a good idea to either filter my genre rating table to categories with more than 1 book, or add the counts as labels to that plot. Let's try the latter.