The Bachelorette Eps. 4 & 5 – Influencers in the Garden – Data and Drama in R

[This article was first published on Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The level of drama in The Bachelorette was so high that I had to wait 2 episodes to gather my thoughts. We have now seen a shift from Clare to Tayshia and Instagram followers have jumped on board the Tayshia train. However, they have not come close to abandoning Clare. It seems hard to believe that people could follow both of them simultaneously because one person can only handle so much drama.

If we look at the follower gap between the two Bachelorettes, it has decreased substantially but is likely to expand again after Clare fades into the background and Tayshia becomes the star of the show.

blog1.png

I asked myself, what were the top hashtags used with #TheBachelorette on Twitter each day the show aired? The answer: SPAM.

blog3.png

Or so I thought…

It seemed odd that people would be tweeting about the US presidential election instead of The Bachelorette. So, I did some digging. I recalled seeing vote count charts that looked very similar to the patterns we saw between the number of Instagram followers of Tayshia and Clare. If we look at vote share in Nevada between Trump and Biden, we notice a striking similarity between follower count of Tayshia and Clare.

Here’s the chart from the NY Times:

Screen Shot 2020-11-12 at 8.11.05 PM.png

It has an uncanny resemblance to the plot of the bachelorette Instagram followers:

plot_combo.png

We can see that Tayshia started with over 57% of the total followers but that gap narrowed down: 53% (T) to 47% (C). If this chart seems ridiculous, it’s because it is.

Looking forward to seeing the addition of the new guys to the predictive model.

Side note: Chris Harrison is way cooler than Dale Moss.

blog2.png

As always, please feel free to play with the data yourself at https://stoltzmaniac.shinyapps.io/TheBacheloretteApp/ where you can take advantage of some fancy algorithms to determine the emotions of the faces in each post the contestant made public on Instagram.

We’ll be doing some analysis after next week’s show, hope to see you then. The code for the plots is below and the data is available upon request by using our contact page.

library(dplyr)
library(lubridate)
library(ggplot2)
library(ggridges)

GLOBAL_DATA = get_database_data()


GLOBAL_DATA$insta_followers %>%
  filter(suitor %in% c('clarecrawley', 'tayshiaaa'),
         datetime <= '2020-11-12') %>%
  group_by(name, datetime) %>%
  summarize(follower_count = mean(follower_count), .groups = 'drop') %>%
  ggplot(aes(x = datetime, y = follower_count, col = name)) + 
  geom_line() + 
  geom_line(col = '#ff6699', fill = '#ff6699', size = 1.5) +
  gghighlight::gghighlight(name == 'Tayshia Adams', label_key = name, use_group_by = TRUE) + 
  theme_minimal() +
  labs(x = '', y = '', title = "Bachelorette Instagram Followers Over Time", subtitle = "Tayshia vs. Clare") + 
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))


dat = GLOBAL_DATA$insta_followers %>%
  filter(suitor %in% c('clarecrawley', 'tayshiaaa'),
         datetime <= '2020-11-12') %>%
  group_by(name, datetime) %>%
  summarize(follower_count = mean(follower_count), .groups = 'drop') %>%
  pivot_wider(id_cols = datetime, names_from = name, values_from = follower_count) %>%
  fill(`Tayshia Adams`) %>%
  mutate(margin = `Tayshia Adams` - `Clare Crawley`,
         total = `Tayshia Adams` + `Clare Crawley`,
         pct_clare = `Clare Crawley` / total,
         pct_tayshia = `Tayshia Adams` / total) 


p1 = dat %>%
  select(datetime, `Clare Crawley` = pct_clare, `Tayshia Adams` = pct_tayshia) %>%
  pivot_longer(cols = c(`Clare Crawley`, `Tayshia Adams`), "Bachelorette", values_to = 'pct_of_total_followers') %>%
  ggplot(aes(x = datetime, y = pct_of_total_followers, col = Bachelorette)) +
  geom_line(size = 1.5) +
  theme_minimal() + 
  labs(x = '', y = '', title = 'Percentage of Total Follower Count') +
  scale_y_continuous(label = scales::percent) + 
  scale_color_manual(values = c('red', 'blue')) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))


p2 = dat %>%
  ggplot(aes(x = datetime, y = margin)) +
  geom_line(size = 1.5, color = '#ff6699') + 
  theme_minimal() + 
  labs(x = '', y = '', title = 'Follower Count Difference', subtitle = 'Difference = Tayshia - Clare') +
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))


  gridExtra::grid.arrange(p1, p2)
  
  ggplot(aes(x = datetime, y = follower_count, col = name)) + 
  geom_line() + 
  geom_line(col = '#ff6699', fill = '#ff6699', size = 1.5) +
  gghighlight::gghighlight(name == 'Tayshia Adams', label_key = name, use_group_by = TRUE) + 
  theme_minimal() +
  labs(x = '', y = '', title = "Bachelorette Instagram Followers Over Time", subtitle = "Tayshia vs. Clare") + 
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))


GLOBAL_DATA$insta_followers_w_losers %>%
  filter(suitor %in% c('chrisbharrison', 'DaleMoss13'),
         datetime <= '2020-11-12') %>%
  group_by(name, datetime) %>%
  summarize(follower_count = mean(follower_count), .groups = 'drop') %>%
  ggplot(aes(x = datetime, y = follower_count, col = name)) + 
  geom_line() + 
  geom_line(col = '#ff6699', fill = '#ff6699', size = 1.5) +
  gghighlight::gghighlight(name == 'Chris Harrison', label_key = name, use_group_by = TRUE) + 
  theme_minimal() +
  labs(x = '', y = '', title = "Bachelorette Instagram Followers Over Time", subtitle = "Chris Harrison vs. Dale Moss") + 
  scale_y_continuous(label = scales::comma) + 
  theme(legend.position = 'top', legend.direction = "horizontal", 
        legend.title = element_blank(), 
        plot.title = element_text( hjust = 0.5, vjust = -1),
        plot.subtitle = element_text( hjust = 0.5, vjust = -1))

To leave a comment for the author, please follow the link and comment on their blog: Stoltzman Consulting Data Analytics Blog - Stoltzman Consulting.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)