A great way to hone your skills as a data scientist is to pick a topic you're passionate about, find some data related to it, and analyze the heck out of it. Jim Vallandingham is clearly passionate about old Kung Fu movies, particularly those from the Shaw Brothers Studio, and has used R to analyze data on the studio's oeuvre: 260 films over 22 years.
The complete R code behind the analysis is included in the post (and you can also find it as an R Markdown document here). Some interesting notes include:
- Using the tidyjson package to parse the table scraped from the list of Shaw Brothers Martial Arts Films on Letterboxd.
- Using the ggplot2 package extensively to visualize directors, actors, film length, Letterboxd watches and likes, and other data about the films.
- Using the tidytext package to find the common words used in film titles.
- Using the igraph package to create a network diagram showing actors who appear in films together, shown below.
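To make the co-appearance idea concrete, here is a minimal sketch of how the edges of such an actor network could be derived from cast lists. It is written in plain Python rather than R, it is not the code from Jim's post, and the film and actor names are hypothetical placeholders. Each pair of actors who share a film becomes an edge, weighted by the number of films they share.

```python
from collections import Counter
from itertools import combinations

# Hypothetical cast lists; these names are placeholders, not data from the post.
casts = {
    "Film A": ["Actor 1", "Actor 2", "Actor 3"],
    "Film B": ["Actor 1", "Actor 2"],
    "Film C": ["Actor 2", "Actor 4"],
}


def co_appearances(casts):
    """Count how many films each pair of actors appears in together."""
    edges = Counter()
    for actors in casts.values():
        # Sort and de-duplicate so each pair has one canonical ordering.
        for pair in combinations(sorted(set(actors)), 2):
            edges[pair] += 1
    return edges


edges = co_appearances(casts)
for (a, b), weight in edges.most_common():
    print(f"{a} -- {b}: {weight}")
```

A weighted edge list like this is the kind of input a graph library can turn into a network diagram, with edge weight mapped to line thickness, for example.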
Jim Vallandingham: A Data Driven Exploration of Kung Fu Films