The Bachelorette Ep. 1 – Every has its Thorn – Data Analysis in R

[This article was first published on Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We all love a good love story. In The Bachelor TV Franchise, millions gather to see who will be the lucky winner of the contestant’s heart. But what about those who don’t make it? What do they have in common? Please check out our dashboard (built with shiny) at https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/.

Night 1 of The Bachelorette found 23 men continuing on to night 2, while 8 men were sent packing. Based on our highly scientific rating system here at Stoltzman Consulting, we found that Clare has a preferred type. 

Let’s dig into the data.

Below, we see that Clare prefers men whose characteristics are Bro-y and Vanit-y while she is less interested in men who are there with characteristics of the “Right Reason-y” and “Kind Heart-y”. The “Drama-y” level does not appear to impact her decision making.

Screen Shot 2020-10-16 at 10.16.28 AM.png

Clare also appears to prefer men who have a stronger social media presence. Those who have not been cut are shown in the “tall and slender” side versus those who were eliminated are depicted in the “short and stout” side of the chart below. Dale Moss had to be excluded from the chart due to the fact that his follower count was way higher than everyone else’s (over 180K) – he is still an active participant.

Screen Shot 2020-10-16 at 11.29.42 AM.png

We are looking forward to seeing where this season goes. Apparently, it is the most dramatic season yet! Remember to checkout our live analytics dashboard at https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/ to build models to predict the winner and see current Twitter and Instagram trends.

Screen Shot 2020-10-16 at 11.37.10 AM.png

Screen Shot 2020-10-16 at 11.38.12 AM.png

Screen Shot 2020-10-16 at 11.38.43 AM.png

Screen Shot 2020-10-16 at 11.39.12 AM.png

For those interested in the code (data available upon request, just visit the contact us section of this site).

library(tidyverse)

GLOBAL_DATA = get_database_data()
saveRDS(GLOBAL_DATA, 'GLOBAL_DATA.rds')


# Contestant circular bar chart
all_contestant_data = GLOBAL_DATA$contestant_data_raw

suitor_data = all_contestant_data %>% 
  filter(!instagram %in% c('tayshiaaa', 'chrisbharrison', 'clarecrawley')) %>%
  mutate(status = as.factor(
    case_when(
      end_episode > latest_episode ~ 'Active',
      TRUE ~ 'Eliminated'
    )
  )) %>%
  select(status, everything())
  
coord_plot_data = suitor_data %>%
  select(status:`Right Reason-y`) %>%
  group_by(status) %>%
  pivot_longer(`Vanit-y`:`Right Reason-y`, names_to = "characteristic") %>%
  group_by(characteristic, status) %>%
  summarize(avg = mean(value), .groups = 'drop')

coord_plot_data %>%
  ggplot(aes(x = characteristic)) + 
  geom_col(aes( y = avg, col = status, fill = status), position = 'dodge') + 
  geom_text(aes( y = avg, label = characteristic), data = coord_plot_data %>% filter(status == 'Active'), size = 4, position = position_stack(vjust = 1.4)) + 
  coord_polar() + 
  ggtitle('Preferred Suitor Characteristics') + 
  theme_minimal() + 
  scale_color_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + 
  scale_fill_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + 
  theme(legend.position = c(0.5, 0.95), legend.direction = "horizontal", legend.title = element_blank(), axis.title = element_blank(), 
        axis.text.y = element_blank(), axis.text.x = element_blank(), plot.title = element_text( hjust = 0.5, vjust = -1))


# Contestant instagram

insta_follower_data = GLOBAL_DATA$insta_followers_w_losers %>%
  drop_na() %>%
  mutate(relative_air_date = 
           case_when(
             datetime <= '2020-10-12' ~ 'Pre Air',
             TRUE ~ 'Aired')
         ) %>%
  left_join(
    suitor_data,
    by = 'name'
  ) %>%
  drop_na()

insta_follower_data %>%
  filter(relative_air_date == 'Pre Air', follower_count < 1e5) %>%
  ggplot(aes(x = status, y = follower_count, col = status, fill = status)) + 
  geom_violin(alpha = 0.) + 
  ggtitle('Pre-Air Instagram Followers', subtitle = "*Excludes Dale Moss") + 
  theme_minimal() + 
  scale_color_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + 
  scale_fill_manual("legend", values = c("Active" = "pink", "Eliminated" = "darkgrey")) + 
  scale_y_continuous(label = scales::comma) +
  theme(legend.position = 'top', legend.direction = "horizontal", legend.title = element_blank(), axis.title = element_blank(), 
        plot.title = element_text( hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))  

  

To leave a comment for the author, please follow the link and comment on their blog: Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)