Simple Interactive Visualization of ’16-’17 NBA Stats With Shiny

[This article was first published on R – NYC Data Science Academy Blog, and kindly contributed to R-bloggers.]

Purpose

Why I Chose This Project

Fantasy sports is a multi-billion-dollar industry. Most players, myself included, play simply because it’s an effective way to keep in touch with long-distance friends, and the friendly competitive banter is always good fun. We don’t have enough hours in the day to stay current with all the updates and the latest news on SportsCenter, so I thought it would be helpful and interesting to set up a Shiny app that compiles in-depth stats on NBA players.

Questions to Answer

I set out to determine which players offered the most value in multiple contexts. I wanted to discover mid-tier players in the ’16-’17 season who stood out in certain statistical categories, so I could keep them on my radar for next season.

Process

Where and How I Extracted the Data

To acquire the data, I first searched the web for the most efficient way to extract the stats I needed. Numerous websites display NBA stats; however, most of them don’t make it simple to download the data as a .csv or .txt file. Eventually, I discovered http://www.dougstats.com/16-17.html, which provides the stats as a simple, raw, downloadable .txt file. The .txt files still needed a fair amount of data munging before they were tidy enough to use in the Shiny app.

I also extracted more advanced stats from https://rotogrinders.com/pages/nba-advanced-player-stats-guards-181885, which includes tables for the guard, forward, and center positions in the NBA. Since the advanced stats were already in tables, the most efficient way to extract the data was simply to use the pandas read_html function. Unfortunately, I didn’t realize this at first and used BeautifulSoup to scrape the page instead. It all worked out in the end, though. Because some of the tables read with pandas were missing their column names, I used the BeautifulSoup code to extract the names and combined them with the pandas DataFrame. I then exported the result as a .csv file.

View the code on Gist.
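The combining step described above can be sketched roughly as follows. This is not the author's code (that lives in the Gist): it is a standard-library stand-in for the pandas and BeautifulSoup calls, so it runs without extra packages. The table markup, the column names, and the "Player A" values are placeholders invented for illustration, as are the helper names table_to_records and records_to_csv.

```python
import csv
import io
from html.parser import HTMLParser

# Placeholder markup; the real page's structure may differ.
SAMPLE_HTML = """
<table>
<tr><th>Player</th><th>USG%</th></tr>
<tr><td>Player A</td><td>25.0</td></tr>
<tr><td>Player B</td><td>18.5</td></tr>
</table>
"""


class TableParser(HTMLParser):
    """Collect header names (<th>) and data rows (<td>) separately."""

    def __init__(self):
        super().__init__()
        self.header = []   # column names scraped from <th> cells
        self.rows = []     # one list of cell texts per data row
        self._cell = None  # "th" or "td" while inside a cell
        self._buf = []
        self._row = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = tag
            self._buf = []

    def handle_data(self, data):
        if self._cell:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell == tag:
            text = "".join(self._buf).strip()
            (self.header if tag == "th" else self._row).append(text)
            self._cell = None
        elif tag == "tr" and self._row:
            self.rows.append(self._row)
            self._row = []


def table_to_records(html):
    """Attach the scraped column names to the header-less data rows."""
    parser = TableParser()
    parser.feed(html)
    return [dict(zip(parser.header, row)) for row in parser.rows]


def records_to_csv(records):
    """Serialize the combined records as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(records_to_csv(table_to_records(SAMPLE_HTML)))
```

With pandas installed, pandas.read_html does the table parsing in one call and returns a list of DataFrames, so only the column-naming step would remain.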

How I Analyzed the Data

Before I set up my Shiny App, I wanted to answer the following questions using dplyr:

#1. What is the correlation between minutes and primary stats?

Unsurprisingly, there is a strong correlation between minutes played and primary statistics such as points, assists, and rebounds.

You can find the remainder of this answer in my GitHub repo.
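For readers who want to see what such a correlation measures, here is a minimal sketch of Pearson's r in plain Python. The analysis itself was done with dplyr in R, so this is only an illustration of the statistic, not the author's code; the function name pearson_r is hypothetical, and the two short lists in the usage lines are placeholder numbers, not real player data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and the two deviation norms.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    minutes = [12.0, 20.0, 28.0, 35.0]  # placeholder values
    points = [4.0, 9.0, 13.0, 22.0]     # placeholder values
    print(round(pearson_r(minutes, points), 3))
```

A value near 1 means the stat rises almost in lockstep with minutes, a value near 0 means no linear relationship, and a value near -1 means it falls as minutes rise.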

#2. What are the mean and SD of primary stats? Per team? Per position?

You can find the remainder of this answer in my GitHub repo.
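As a rough illustration of a grouped summary like this one (the kind of thing a dplyr group-by and summarise produces), the sketch below computes the mean and sample standard deviation of one stat per group using only the Python standard library. It is not the author's R code. The function name summarize_by, the field names team, pos, and pts, and the sample rows are all placeholders.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by(rows, key, stat):
    """Return {group: (mean, sd)} of `stat`, grouping rows by `key`.

    The SD is None for groups with a single row, where the sample
    standard deviation is undefined.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[stat])
    return {
        group: (mean(values), stdev(values) if len(values) > 1 else None)
        for group, values in groups.items()
    }


if __name__ == "__main__":
    # Placeholder rows, not real player data.
    rows = [
        {"team": "AAA", "pos": "G", "pts": 10.0},
        {"team": "AAA", "pos": "F", "pts": 20.0},
        {"team": "BBB", "pos": "G", "pts": 5.0},
    ]
    print(summarize_by(rows, "team", "pts"))
    print(summarize_by(rows, "pos", "pts"))
```

Passing a different key ("team" or "pos") answers the per-team and per-position versions of the question with the same function.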

#3. Which players have the best PERs in the league? Per team? Per position? Per position on each team?

#4. What is each team’s total PER?

#5. What are the central tendencies of primary stats? Per team? Per position?

You can find the remainder of this answer in my GitHub repo.

#6. Who were the league leaders in primary stats?

#7. What was each team’s winning percentage?

#8. What is the correlation between team PER and team win ratio?

How I Visualized the Data

I created the app to display multiple tables that can be searched easily by any keyword the user finds relevant (e.g., player name, team name, position, or stats category). The app also displays an interactive visualization of the correlation between team PER (player efficiency rating) and team winning percentage.

Results

Insights Gleaned

There doesn’t appear to be a clear correlation between team PER and team winning percentage. During my analysis, a few players appeared to offer solid value in the later rounds of a draft: they excel in one or a few important statistical categories but are overlooked because they average fewer minutes per game playing behind star players on the roster. The analysis helped me efficiently narrow down who these players are so I can follow what happens to their careers next season. Sometimes a trade to a new team can significantly increase a mid-tier player’s value if they were already showing promise in the shadow of another player on their previous team.

Improvements to Be Made

This was a small-scale project done in the offseason. I only acquired data from the ’16-’17 season. Insights from the previous season’s data can be helpful, but many variables beyond the previous year’s statistics need to be considered. In the future, I may figure out how to extract and import data more frequently during the season, since it’s helpful to glean insights quickly and efficiently while the season is under way.

You can view the project here: https://kiwibp.shinyapps.io/nbashiny/ and my code on GitHub here: https://github.com/Kiwibp/NYC-DSA-Bootcamp–R-Shiny-App. It was interesting to learn R and Shiny, but I have no intention or desire to use them again in the short term. You can view other interactive data visualizations I’ve created with Tableau here: https://public.tableau.com/profile/keenan.burke.pitts.

 
