NBA’s All-Time Scoring Leaders Bar Chart Race Using R

[This article was first published on r-bloggers – Lakers Box Score Breakdown, and kindly contributed to R-bloggers.]

Kareem Abdul-Jabbar has sat atop the NBA’s leaderboard of career regular season scoring since taking the top spot from Wilt Chamberlain in 1984. LeBron James, who currently sits at #3, is the only active player currently in the top 10, and likely needs three more healthy seasons to surpass Kareem.

Bar chart races have become a somewhat controversial data visualization, with detractors decrying them as information overload. But one thing the haters can’t deny is that these charts are attention-grabbing, even captivating. Here’s how to make one using R.

The data needed to create the bar chart race can be found in this Google Sheet. Start by loading the necessary packages and reading in the data (I am using a csv saved locally with the same data that’s in the Google Sheet referenced above).

library(dplyr)
library(ggplot2)
library(gganimate)

chart_data <- readr::read_csv("yearly_totals.csv")

The dplyr and ggplot2 packages should be familiar to most R users. The third package, gganimate, stitches together several static plots created with ggplot2 and turns them into an animated plot. Let’s start with how to create each individual static plot.
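The code in this post relies on four columns: YearEnd, Player, CareerPts, and Rank. As a rough sketch of that shape, assuming one row per player per season, the toy data below shows how a per-year rank could be derived with dplyr. The names toy_totals and toy_ranked, the players, and the point totals are all made up for illustration and are not the real data.

```r
library(dplyr)

# Hypothetical toy data: made-up players and totals, not the real CSV
toy_totals <- tibble(
  YearEnd   = c(2019, 2019, 2019, 2020, 2020, 2020),
  Player    = c("Player A", "Player B", "Player C",
                "Player A", "Player B", "Player C"),
  CareerPts = c(30000, 28000, 25000, 30000, 31000, 26500)
)

# Rank players within each year, highest career total first
toy_ranked <- toy_totals %>%
  group_by(YearEnd) %>%
  mutate(Rank = min_rank(desc(CareerPts))) %>%
  ungroup() %>%
  arrange(YearEnd, Rank)

toy_ranked
```

With a larger table, adding filter(Rank <= 10) after the mutate step would keep only the top 10 for each year.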

Creating a Static Plot

I’ll walk through a few intermediate steps before showing the more polished version of the chart to demonstrate how ggplot allows you to build plots iteratively. We can start by filtering for just one year of data and plotting the top 10 scorers. That can be accomplished using the code below:

chart_data %>%
  filter(YearEnd == 2020) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
    geom_tile(aes(y = CareerPts / 2, height = CareerPts), 
              width = 0.9) + 
    coord_flip()

This basic plot uses geom_tile rather than geom_bar, which works better with the animation we will eventually be using. The way geom_tile works is that you specify the center of the tile (i.e. the midpoint of the rectangle, which is the height divided by two, hence CareerPts / 2) as well as the width and height. The call to coord_flip gives us horizontal bars rather than vertical bars (also note the x and y-axes are now flipped). The reason for specifying -Rank as the x aesthetic mapping is so that we get the top-ranking player at the top of the chart rather than the bottom.
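To check the tile geometry described above, here is a minimal sketch that builds a single bar from made-up values and inspects the rectangle ggplot2 computes for it. The toy_bar data frame is hypothetical. The tile should run from 0 to CareerPts along y and be 0.9 wide, centered on -Rank.

```r
library(ggplot2)

# Hypothetical single bar: a rank-1 player with 30,000 career points
toy_bar <- data.frame(Rank = 1, CareerPts = 30000, Player = "Player A")

p <- ggplot(toy_bar, aes(x = -Rank, y = CareerPts, fill = Player)) +
  geom_tile(aes(y = CareerPts / 2, height = CareerPts), width = 0.9) +
  coord_flip()

# layer_data() returns the values ggplot2 computed for the first layer
tile <- layer_data(p, 1)
tile[, c("xmin", "xmax", "ymin", "ymax")]
```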

Add Plot Labels

Next we’ll add the labels for the player names and point totals onto the bars. That can be accomplished with the following code:

chart_data %>%
  filter(YearEnd == 2020) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
    geom_tile(aes(y = CareerPts / 2, height = CareerPts), 
              width = 0.9) + 
    coord_flip() +
    # Add player labels to bars
    geom_text(aes(label = Player), col = "white", 
              hjust = "right", nudge_y = -1000) +
    # Add point totals next to bars
    geom_text(aes(label = scales::comma(CareerPts, accuracy = 1)), 
              hjust = "left", nudge_y = 1000)

The code above adds two calls to geom_text, the first of which adds the player labels in white, with the latter adding the point totals. These both inherit their x and y aesthetics from the original call to ggplot, which sets their position at the tip of the bars. The hjust argument makes the player labels right-justified and the point labels left-justified. The nudge_y argument offsets the player labels -1000 along the y-axis (remember our coordinates are flipped, so this is now a horizontal shift), and the point labels +1000. The call to scales::comma is for formatting the points labels.
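As a quick illustration of the label formatting, scales::comma can be called on its own. The numbers below are just illustrative inputs. With accuracy = 1 the values are rounded to whole numbers and given thousands separators.

```r
# Format point totals the way the second geom_text() call does
pts_labels <- scales::comma(c(38387, 9999.6), accuracy = 1)
pts_labels
# → "38,387" "10,000"
```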

Final Formatting

chart_data %>%
  filter(YearEnd == 2020) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
    geom_tile(aes(y = CareerPts / 2, height = CareerPts), 
              width = 0.9) + 
    geom_text(aes(label = Player), col = "white", 
              hjust = "right", nudge_y = -1000) +
    geom_text(aes(label = scales::comma(CareerPts, accuracy = 1)), 
              hjust = "left", nudge_y = 1000) +
    # Final formatting
    coord_flip(clip = "off", expand = FALSE) +
    ylab("Career Points") + 
    ggtitle("NBA All-Time Scoring Leaders") + 
    scale_x_discrete("") +
    scale_y_continuous(limits = c(-4000, 49000), 
                       labels = scales::comma) +
    theme_minimal() +   
    theme(plot.title = element_text(hjust = 0.5, size = 20),
          legend.position = "none",
          panel.grid.minor = element_line(linetype = "dashed"), 
          panel.grid.major = element_line(linetype = "dashed"))

For the final formatting steps, we add the clip = "off" argument to coord_flip, which prevents the point labels from getting cut off as in the previous chart. The expand = FALSE argument prevents the chart from expanding beyond the specified x and y-limits. A title is added along with axis labels, with the x-axis (vertical) being set to blank with scale_x_discrete. The y-axis limits are set using scale_y_continuous and labels are given some nicer formatting using scales::comma. The final touches are added with theme_minimal, which removes the gray chart background, and additional theme elements to center the plot title, remove the legend, and use dashed gridlines.
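Since the same layers get reused in the sections that follow, one option is to collect them in a list, because adding a list to a ggplot object adds each element in turn. This is only a sketch of that idea, not how the original code was organized. The bar_layers name and the toy data are hypothetical, and only a subset of the formatting is included to keep it short.

```r
library(ggplot2)

# Hypothetical helper: shared layers collected in a list
bar_layers <- list(
  geom_tile(aes(y = CareerPts / 2, height = CareerPts), width = 0.9),
  geom_text(aes(label = Player), col = "white",
            hjust = "right", nudge_y = -1000),
  geom_text(aes(label = scales::comma(CareerPts, accuracy = 1)),
            hjust = "left", nudge_y = 1000),
  coord_flip(clip = "off", expand = FALSE),
  theme_minimal(),
  theme(legend.position = "none")
)

# Toy data with made-up players and totals
toy <- data.frame(
  Rank      = 1:3,
  Player    = c("Player A", "Player B", "Player C"),
  CareerPts = c(30000, 28000, 25000)
)

# The list is added to the base plot like any single layer
p <- ggplot(toy, aes(x = -Rank, y = CareerPts, fill = Player)) + bar_layers
p
```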

Create Multiple Plots

Now that we have one polished plot created, we need to reproduce that across several years. You can create a visual of this across a few years using facet_wrap.

chart_data %>%
  filter(YearEnd >= 2018) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
...
 + facet_wrap(~YearEnd) 

Updating the filter(YearEnd == 2020) in the previous code to YearEnd >= 2018 and adding + facet_wrap(~YearEnd) to the end of that same code produces the following:

You can see that the only difference since 2018 is LeBron James moving from #7 in 2018 to #4 in 2019 and #3 in 2020. These plots are the building blocks for the animation. Once these are all set up, it’s time to bring in the gganimate functions.

Add Animation

Now we want to stitch together the plots created in the previous section and animate them using gganimate. We replace the facet_wrap function with transition_time(YearEnd). Let’s also update the filter to go back to 2010 to see how this works across a short but meaningful period of time.

chart_data %>%
  filter(YearEnd >= 2010) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
...
 + transition_time(YearEnd) +
  labs(subtitle = "Top 10 Scorers as of {round(frame_time, 0)}") + 
  theme(plot.subtitle = element_text(hjust = 0.5, size = 12))

The resulting animation should show Kobe Bryant, Dirk Nowitzki, and LeBron James moving up the rankings. A subtitle is also added, which references frame_time, a handy variable that gganimate makes available in plot labels. Try removing the round function to see the fractional values gganimate steps through as it moves between years.
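To see why the rounding matters, note that most frames fall between whole years. gganimate computes its frame times internally. The base R snippet below only imitates evenly spaced frames, assuming 100 frames (the default nframes in animate) spread over 2010 to 2020, so treat it as an approximation rather than gganimate's actual logic.

```r
# Imitate evenly spaced frame times between the first and last year
# (an approximation for illustration, not gganimate's internal code)
frame_times <- seq(2010, 2020, length.out = 100)

head(frame_times, 4)            # fractional years
head(round(frame_times, 0), 4)  # what the subtitle shows after rounding
```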

Putting it all together

If everything has worked up to this point, the final steps are to use the full data set, and set some animation parameters so that you can save it in a nice format.

anim <- chart_data %>%
  # Comment out the filter
  # filter(YearEnd >= 2010) %>%
  ggplot(aes(x = -Rank, y = CareerPts, fill = Player)) + 
...
 + transition_time(YearEnd) +
  labs(subtitle = "Top 10 Scorers as of {round(frame_time, 0)}") + 
  theme(plot.subtitle = element_text(hjust = 0.5, size = 12))

animate(anim, renderer = gifski_renderer(),
        end_pause = 50, 
        nframes = 5*(2020-1950), fps = 10,
        width = 1080, height = 720, res = 150)

anim_save("NBA_Leading_Scorers.gif")

The animate function allows you to specify the details about the animation. The default renderer is the gifski_renderer, but you can also choose others like av_renderer or ffmpeg_renderer if you wanted to save a video instead of a gif. The end_pause parameter lets you have a nice pause at the end of the animation so that the gif doesn’t cycle back to the beginning right away. You set the number of frames and frames per second with nframes and fps respectively (you may need to tweak these arguments depending on how fast or slow you want the animation). The width, height, and res arguments let you specify device dimensions and resolution, which will determine the size and resolution of the gif in this case. Finally, the call to anim_save is how you save the animation to a file.
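The timing values in the animate call can be worked out by hand. The sketch below uses the numbers from that call. It assumes the end pause is counted within nframes, which is my understanding of how gganimate handles it but is worth verifying against the package documentation. The variable names are only for illustration.

```r
years_covered   <- 2020 - 1950   # span of years used in the animate() call
frames_per_year <- 5
nframes         <- frames_per_year * years_covered
fps             <- 10

nframes        # total frames requested: 350
nframes / fps  # approximate length of the gif in seconds: 35
50 / fps       # seconds the final frame is held by end_pause = 50: 5
```

Raising frames_per_year gives smoother motion and a longer gif at the same fps, while raising fps alone speeds the animation up.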

One footnote: I also used a mapping of team colors to make the color scheme a little more meaningful, which I left out of this walkthrough (that’s why the colors are different in the gif at the beginning of this post). When all’s said and done, you should have something like this:

Data for these charts came from basketball-reference.com. Hopefully this is the first of many posts for R-bloggers.
