The Bachelorette Ep. 2 – Petal to the Metal – Data and Drama in R

[This article was first published on Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Week two brought us some new Clare drama. We decided to “strip” the data down to its essentials and try to avoid “dodging” any tough questions.

In case you missed last week’s recap, you can find it here.

Since only one man was eliminated this week (for his failure to adequately compliment Clare), this week’s blog post focuses on the suitors’ Instagram accounts. On average, the men have roughly doubled their growth in instagram followers since the show began. They are well on their way to promoting FabFitFun boxes!

The Instagram star, Dale Moss, out of South Dakota has an insane number of followers. Dale began the show with about 180k followers and currently has 267k followers. It really makes one wonder if he’s here for the right reasons! 

Due to the fact that his current follower count is so large, we will be excluding his data from the rest of the analysis so that graphs show signs of life for the other contestants.

Rplot1.png

Now that the star from South Dakota has been removed, we can take a look at the increase in followers over time. As one would expect, every contestant gained followers after the show originally aired on October 13, 2020. However, some increased more than others. Initially, Kenny Braasch started out with the most followers but only gained a weak 4% (1.5K followers) since the original air date. On the other hand, Blake Moynes, moved from the bottom 25th percentile to second place due to an increase in followers of almost 800% (30.6K).

Rplot2.png

At this point, we can start to see the post-show outcome. In other words, we see the future “influencers” and the “losers”. In the scatterplot, we see that most people had less than 10K followers before the show aired. Which, to be fair, is nothing to sneeze at. The separation between “influencers” and “influencees“ can be seen along the y-axis as those who are gaining continue to rise, most notably, Blake, Ivan, Bennett, and Tyler (recently eliminated).

Rplot3.png

Earlier this week, we added a feature to our Bachelorette analytics dashboard (https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/) that utilizes some fancy algorithms to determine the emotions of the faces in each post the contestant made public on Instagram. One example from the dashboard describe Ben Smith’s pics to be mostly happy and calm, with a side of sadness and fear!

Rplot4.png

We’ll be doing some analysis after next week’s show, hope to see you then. The code for the plots is below and the data is available upon request by using our contact page.

library(dplyr)
library(tidyr)
library(ggplot2)
library(lubridate)

GLOBAL_DATA = get_database_data() # rds available upon request (contact page)


GLOBAL_DATA$insta_followers_w_losers %>%
  filter(!suitor %in% c('chrisbharrison', 'tayshiaaa', 'clarecrawley', 'DaleMoss13')) %>%
  group_by(suitor) %>%
  mutate(min_date = min(datetime),
         max_date = max(datetime)) %>%
  filter(datetime == min_date | datetime == max_date) %>%
  arrange(datetime) %>%
  mutate(follower_change = follower_count - lag(follower_count),
         follower_change_pct = follower_change / lag(follower_count)) %>%
  drop_na() %>%
  left_join(GLOBAL_DATA$contestant_data_raw %>% select(name, end_episode), by = 'name') %>%
  mutate(status = if_else(end_episode <= GLOBAL_DATA$latest_episode, 'Eliminated', 'Active')) %>%
  ggplot(aes(x = follower_count, y = follower_change_pct, col = status)) + 
  #geom_abline(slope = 1, intercept = 0, linetype = 'dotted') +
  geom_point() +
  ggrepel::geom_text_repel(aes(label = name), show.legend = FALSE) + 
  #geom_smooth(method = 'lm', se = FALSE) + 
  theme_minimal() + 
  labs(x = 'Followers Before Air Date', y = 'Change in Followers', title = "Instagram Follower Gains", subtitle = "*Excludes Dale Moss") + 
  scale_color_manual("legend", values = c("Active" = "#ff6699", "Eliminated" = "#5a5a5a")) + 
  scale_fill_manual("legend", values = c("Active" = "#ff6699", "Eliminated" = "#5a5a5a")) + 
  scale_y_continuous(label = scales::percent) +
  scale_x_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))



GLOBAL_DATA$insta_followers%>%
  filter(!suitor %in% c('chrisbharrison', 'tayshiaaa', 'clarecrawley', 'DaleMoss13')) %>%
  drop_na() %>%
  ggplot(aes(x = datetime, y = follower_count, group = name)) + 
  geom_line(col = '#ff6699', fill = '#ff6699', size = 1.5) +
  gghighlight::gghighlight(name == 'Blake Moynes', label_key = name, use_group_by = TRUE) + 
  theme_minimal() + 
  labs(x = '', y = '', title = "Instagram Followers Over Time", subtitle = "*Excludes Dale Moss") + 
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))




GLOBAL_DATA$insta_followers %>%
  filter(!suitor %in% c('chrisbharrison', 'tayshiaaa', 'clarecrawley')) %>%
  drop_na() %>%
  group_by(suitor) %>%
  filter(datetime == max(datetime)) %>%
  ungroup() %>%
  ggplot(aes(x = reorder(name, follower_count), y = follower_count)) +
  geom_col(col = '#ff6699', fill = '#ff6699') + 
  gghighlight::gghighlight(name == 'Dale Moss', label_key = name, use_group_by = TRUE) + 
  coord_flip() +
  theme_minimal() +
  labs(x = '', y = '', title = "Current Instagram Followers") + 
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5))

To leave a comment for the author, please follow the link and comment on their blog: Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)