[This article was first published on R-posts.com, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Everyday, around 9 pm, I get fresh portion of the Netflix Top movies / TV shows. I’ve been doing this for more than two months and decided to show the first results answering following questions:

How many movies / TV shows make the Top?
What movie was the longest #1 on Netflix?
What movie was the longest in Tops?
For how many days movies / TV shows stay in Tops and as #1?
To have a try to plot all this up and down zigzags…

I took 60 days span (almost all harvest so far) and Top5 overall, not Top 10, in each category to talk about really the most popular and trendy.

So, lets count how many movies made the Top5, I mean it is definitely less than 5 *60…
library(tidyverse)
library (stats)
library (gt)
fjune <- read_csv("R/movies/fjune.csv")
#wrangle raw data - reverse (fresh date first), take top 5, take last 60 days
fjune_dt % rev () %>% slice (1:5) %>% select (1:60)

#gather it together and count the number of unique titles
fjune_dt_gathered <- gather (fjune_dt)
colnames (fjune_dt_gathered) <- c("date", "title")
unique_fjune_gathered <- unique (fjune_dt_gathered$title) str (unique_fjune_gathered) #chr [1:123] "Unknown Origins" "The Debt Collector 2" "Lucifer" "Casino Royale" OK, it is 123 out of 300. Now, it is good to have distribution in each #1 #2 #3 etc. t_fjune_dt <- as.data.frame (t(fjune_dt),stringsAsFactors = FALSE) # list of unique titles in each #1 #2 #3etc. for each day unique_fjune <- sapply (t_fjune_dt, unique) # number of unique titles in each #1 #2 #3 etc. n_unique_fjune <- sapply (unique_fjune, length) n_unique_fjune <- setNames (n_unique_fjune, c("Top1", "Top2", "Top3", "Top4", "Top5")) n_unique_fjune Top1 Top2 Top3 Top4 Top5 32 45 45 52 49 What movie was the longest in Tops / #1? # Top5 table_fjune_dt5 <- sort (table (fjune_dt_gathered$title), decreasing = T)
# Top1
table_fjune_dt1 <- sort (table (t_fjune_dt$V1), decreasing = T) # plotting the results bb5 <- barplot (table_fjune_dt5 [1:5], ylim=c(0,22), main = "Days in Top5", las = 1, col = 'light blue') text(bb5,table_fjune_dt5 [1:5] +1.2,labels=as.character(table_fjune_dt5 [1:5])) bb1 <- barplot (table_fjune_dt1 [1:5], main = "Days as Top1", las = 1, ylim=c(0,6), col = 'light green') text(bb1,table_fjune_dt1 [1:5] +0.5, labels=as.character(table_fjune_dt1 [1:5])) Lets weight and rank (1st = 5, 2nd = 4, 3rd = 3 etc.) Top 5 movies / TV shows during last 60 days. i<- (5:1) fjune_dt_gathered5 <- cbind (fjune_dt_gathered, i) fjune_dt_weighted % group_by(title) %>% summarise(sum = sum (i)) %>% arrange (desc(sum)) top_n (fjune_dt_weighted, 10) %>% gt () As we see the longer movies stays in Top the better rank they have with simple weighting meaning “lonely stars”, which got the most #1, draw less top attention through the time span. Average days in top av_fjune_dt5 <- round (nrow (fjune_dt_gathered) / length (unique_fjune_gathered),1) # in Top5 av_fjune_dt1 <- round (nrow (t_fjune_dt) / length (unique_fjune$V1),1) #as Top1

cat("Average days in top5: ", av_fjune_dt5, "\n")
cat("Average days as top1: ", av_fjune_dt1)
#Average days in top5: 2.4
#Average days as top1: 1.9
Average days in top distribution
as5 <- as.data.frame (table_fjune_dt5, stringsAsFActrors=FALSE)

as1 <- as.data.frame (table_fjune_dt1, stringsAsFActrors=FALSE)
par (mfcol = c(1,2))
boxplot (as5$Freq, ylim=c(0,7), main = "Days in Top5") boxplot (as1$Freq, ylim=c(0,7), main = "Days as Top1")

And now, let`s try to plot Top5 movies / TV shows daily changes.
I played with different number of days and picked 4 – with longer span I had less readable plots since every new day brings new comers into the Top with its own line, position in the legend, and, ideally its own color.

fjune_dt_gathered55 % slice (1:20) %>% rev ()
ggplot(fjune_dt_gathered55)+ aes (x=date, y=i, group = title) + geom_line (aes (color = title), size = 2) +
geom_point (aes (color = title), size = 5)+ theme_classic()+
theme(legend.position="right")+ylab("top")+ ylim ("5", "4", "3", "2", "1")+
geom_line(aes(colour = title), arrow = arrow(angle = 15, ends = "last", type = "closed"))+scale_color_brewer(palette="Paired")

Most popular on Netflix. Daily Tops for last 60 days was first posted on September 2, 2020 at 2:36 pm.
©2020 "R-posts.com". Use of this feed is for personal non-commercial use only. If you are not reading this article in your feed reader, then the site is guilty of copyright infringement. Please contact me at tal.galili@gmail.com

To leave a comment for the author, please follow the link and comment on their blog: R-posts.com.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts.(You will not see this message again.)

Click here to close (This popup will not appear again)