Mapping NBA Shot Locations

June 28, 2019
By

(This article was first published on schochastics, and kindly contributed to R-bloggers)

I recently came across the article “How Mapping Shots In The NBA Changed It Forever” and although I am not a big basketball fan, I was impressed by the visualizations. I actually bought the book “Sprawlball” by Kirk Goldsberry afterwards, where this was taken from. I can only recommend it, even if you are not a huge basketball fan since it contains a lot of inspirational dataviz material.

In this post, I am going to try to replicate the first three visualizations of the linked article, which turned out to be harder than expected! Additionally, I will use the rayshader package to turn one of the figures into a
nice 3D plot.

library(tidyverse)
library(showtext)
library(hexbin)

Data

When I decided to do this replication, I was pretty sure that it is doomed to failure since I will
not get the right data (precise shot locations). Turns out, stats.nba.com has an incredible rich (but undocumented)
API. I will not go into detail on how I got all the relevant data, but if you want to get it yourself,
check out toddwschneider’s R package ballr, specifically this function.

I took several hours, but at the end I managed to put together a dataset of all shots in the NBA between
2006 and 2019. In the following, we will only need 2013-2018 though.

Scatterplot of shot locations

The first figure in the article is the easiest to reproduce, since it is a simple
scatterplot. The only thing that took a bit of time was to find a suitable font. In the end, I settled
with Rokkitt from Google, which can be loaded via the showtext package as follows.

font_add_google("Rokkitt", "rokkitt")
showtext_auto()

A second issue was to get the court in the background. The ballr package luckily has a plot_court function
which I tweaked slightly for my purposes.

col_scheme <- c("firebrick4","darkorange2","gold1",
                "chartreuse4","cadetblue2","darkorchid4")

p <- plot_court()+
  geom_point(data = scatter_shots,
             aes(loc_x,loc_y,col = shot_distance),size = 0.1)+
  scale_color_gradientn(colours = col_scheme,breaks = c(0,30),limits = c(0,30),
                        name = "SHOT DISTANCE IN FEET")+
  guides(colour = guide_colourbar(barwidth = 11, barheight = 0.8,ticks = F,
                                  title.position = "top",title.hjust = 0.5))+
  labs(title = "NBA SHOT LOCATIONS\n2014-15")+
  theme(plot.title = element_text(hjust=0.5,size = 32,colour="grey45",
                                  family="rokkitt",face = "bold"),
        legend.position = "bottom",legend.justification = c(0.88,0),
        legend.text = element_text(size=14,colour = "grey45",family = "rokkitt"),
        legend.title = element_text(size=13,colour = "grey45",family = "rokkitt"),
        legend.box.margin = margin(c(-50,0,3,0)),
        plot.margin = margin(3,3,3,3, unit = "pt"))

ggsave("shots2014-15.png",plot = p ,width = 10, height = 8, dpi = 100)

Looks pretty close to the original. Although it was quite straightforward to reproduce this figure, I still learned two things: How to tweak a colourbar legend with guide_colourbar and how to use the legend.justification argument of theme.

Field goal percentages

Reproducing the second figure of the article is way more involved than the first (at least it was for me).
The figure has a lot of very smooth areas and I would assume that it is more of an illustration than an accurate
depiction of the data (which is of course totally fine in this context!). I tried out several things, and
in the end I settled for a combination of “hexbin aggregation” and interpolating the data on a regular grid.

The first step was to aggregate shots in roughly the same location into cells and calculate the field goal percentages
for each cell. I did so using the hexbin function of the package with the same name.

tbl <- vroom::vroom(c("shots2013-14.csv","shots2014-15.csv","shots2015-16.csv",
                      "shots2016-17.csv","shots2017-18.csv")) %>% 
  select(loc_x,loc_y,shot_made_numeric,shot_value,shot_distance)

bns <- 20 
hb <- hexbin(x = tbl$loc_x,y = tbl$loc_y,xbins = bns,IDs = TRUE)

shot_perc <- tbl %>% 
  mutate(hex = [email protected]) %>% 
  group_by(hex) %>% 
  dplyr::summarise(perc = sum(shot_made_numeric)/n())
shot_perc <- left_join(shot_perc,tibble(hex = [email protected],x = [email protected],y = [email protected]),by = "hex")

I played around with the number of bins and found that 20 seemed to be a good trade-off.
Afterwards, I created a regular grid with 1000 points per dimension and interpolated the
shot percentages with the interp function from the akima package.

npts <- 1000
xyz_interp <- akima::interp(x = shot_perc$x,y = shot_perc$y,
                            z = shot_perc$perc,nx = npts,ny = npts,linear = T)

shot_perc_plot <- tibble(x = rep(xyz_interp[[1]],npts),
                         y = rep(xyz_interp[[2]],each=npts),
                         z = c(xyz_interp[[3]])) %>% 
  dplyr::filter(!is.na(z) & y >= -2) %>% 
  mutate(z=if_else(y < 0 & z <= 0.35,0.36,z)) %>% 
  mutate(z=if_else(y < 0 & z >= 0.45 & x > 5,0.45,z)) %>% 
  mutate(zfill=case_when(z <= 0.35~0.35,
                     z <= 0.4~0.40,
                     z <= 0.45~0.45,
                     z <= 0.50~0.50,
                     z <= 0.55~0.55,
                     z <= 0.60~0.6,
                     TRUE~0.65)) %>% 
  mutate(zfill = as.factor(zfill)) %>% 
  dplyr::filter(sqrt(x^2+(y-hoop_center_y)^2)<=27)  

shot_perc_plot now contains (interpolated) field goal percentages on a regular grid
for the whole court. Visualizing this is now relatively straightforward with a combination
of geom_tile, geom_contour and some annotation sugar.

p <- ggplot(shot_perc_plot)+
  geom_tile(aes(x,y,fill=zfill))+
  geom_contour(aes(x,y,z=z),
               binwidth = 0.03,col="white",size=0.2,linetype="dashed")+
  geom_path(data = court_points,
            aes(x = x, y = y-5, group = desc),
            color = "black",size=0.8
  ) +
  geom_hline(yintercept = -5,size=0.8)+
  annotate("text",x = 0,y = 15,size = 12,
           label = "40-45%",colour = "#8b5122",family = "rokkitt")+
  annotate("text",x = -15,y = 24,size = 12,
           label = "30-35%",colour = "#f6eac6",family = "rokkitt")+
  annotate("text",x = -23,y = 5,size = 12,
           label = "35-40%",colour = "white",family = "rokkitt",angle = 90)+
  annotate("text",x = 23,y = 3,size = 12,
           label = "35-40%",colour = "white",family = "rokkitt",angle = 270)+
  annotate("text",x = 11,y = 2,size = 12,
           label = "35-40%",colour = "white",family = "rokkitt")+
  scale_fill_manual(values = c("#8b5122", "#d8b36b", "#f6eac6", 
                               "#FFFFFF", "#c9e8e5", "#5ab4ac", "#0f665e"),
                    name="FIELD GOAL PERCENTAGE BY AREA",
                    labels=c("30%","35%","40%","45%","50%","55%","60%"))+
  guides(fill = guide_legend(nrow = 1,title.position = "top",title.hjust = 0.5,
                             label.position = "bottom"))+
  coord_fixed(ylim = c(-5,30),xlim=c(-22,22))+
  labs(title="FIELD GOAL PERCENTAGES\n2013-14 TO 2017-18")+
  theme_minimal(base_size = 22) +
  theme(plot.title = element_text(hjust = 0.5,size = 48,colour = "grey45",
                                  family = "rokkitt",face = "bold",lineheight = 0.6),
        legend.position = "bottom",legend.justification = c(0.98,0),
        legend.text = element_text(size = 20,colour = "grey45",
                                   family = "rokkitt",angle = 45,hjust = 0.5),
        legend.title = element_text(size = 22,colour = "grey45",family = "rokkitt"),
        legend.box.margin = margin(c(-50,0,3,0)),
        plot.margin = margin(3,3,3,3, unit = "pt"),
        plot.background = element_rect(fill = "white", color = "white"),
        panel.background = element_rect(fill = "white", color = "white"),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank())

ggsave("fgp2013-18.png",plot = p ,width = 10, height = 8, dpi = 200)

The result is not as smooth as the original but the overall picture is very close to it.

Points per shot

The third figure in the article is pretty much the same as the second. Just with different data.
So we can use the same aggregation/interpolation strategy as before.

bns <- 20 
hb <- hexbin(x = tbl$loc_x,y = tbl$loc_y,xbins = bns,IDs = TRUE)
shot_pts <- tbl %>% 
  mutate(hex = [email protected]) %>% 
  group_by(hex) %>% 
  dplyr::summarise(pts = sum(shot_made_numeric*shot_value)/n())

shot_pts <- left_join(shot_pts,tibble(hex = [email protected],x = [email protected],y = [email protected]),by="hex")

npts <- 1000
xyz_interp <- akima::interp(x = shot_pts$x,y = shot_pts$y,
                            z = shot_pts$pts,nx = npts,ny = npts,linear = T)

shot_pts_plot <- tibble(x = rep(xyz_interp[[1]],npts),
                         y = rep(xyz_interp[[2]],each=npts),
                         z = c(xyz_interp[[3]])) %>% 
  dplyr::filter(!is.na(z) & y>= -2) %>% 
  mutate(z=if_else(y<1 & z>=0.85 & (between(x,5,20) | between(x,-20,-5)),0.85,z)) %>%
  mutate(zfill=case_when(z<=0.85~0.85,
                         z<=0.90~0.90,
                         z<=0.95~0.95,
                         z<=1.00~1.00,
                         z<=1.05~1.05,
                         z<=1.10~1.1,
                         z<=1.2~1.2,
                         TRUE~1.25)) %>% 
  mutate(zfill=as.factor(zfill)) %>% 
  dplyr::filter(sqrt(x^2+(y-hoop_center_y)^2)<=27)

If you look closely at the third figure, you will realize that there is a strong cut between the
two and three point area. So technically, we should have treated the two areas separately and combine the
results in the end. However, I did not manage to do this in a satisfying way, so I stuck with treating both
areas together, which will of course change the figure slightly.

p <- ggplot(shot_pts_plot)+
  geom_tile(aes(x,y,fill=zfill))+
  geom_contour(aes(x,y,z=z),binwidth = 0.05,col="white",size=0.2,linetype="dashed")+
  geom_path(data = court_points,
            aes(x = x, y = y-5, group = desc),
            color = "black",size=0.8
  ) +
  geom_hline(yintercept = -5,size=0.8)+
  annotate("text",x=-14,y=11,size=12,
           label="UNDER 0.85",colour="#e8d6e8",family="rokkitt")+
  annotate("text",x=14,y=11,size=12,
           label="UNDER 0.85",colour="#e8d6e8",family="rokkitt")+
  annotate("text",x=0,y=15,size=12,
           label="0.85-0.9",colour="#e8d6e8",family="rokkitt")+
  annotate("text",x=-27,y=31,size=9,
           label="BETWEEN 2013-14 AND\n2017-18, THE AVERAGE NBA\nSHOT YIELDED 1.02\nPOINTS",
           hjust=0,vjust=1,family="rokkitt",lineheight = 0.6)+
  scale_fill_manual(values=c("#772d83","#9871ac","#c2a7cf","#e8d6e8",
                             "#daecd3","#a8d6a2","#59ae64","#10632f"),
                    name="AVERAGE POINTS PER FIELD GOAL ATTEMPT",
                    labels=c("0.85","0.90","0.95","1.00","1.05","1.10","1.20",""))+
  guides(fill = guide_legend(nrow = 1,title.position = "top",title.hjust = 0.5,
                             label.position = "bottom"))+
  coord_fixed(ylim = c(-5,30),xlim=c(-22,22),clip="off")+
  labs(title="POINTS PER SHOT\n2013-14 TO 2017-18")+
  theme_minimal(base_size = 22) +
  theme(plot.title = element_text(hjust=0.5,size = 48,colour="grey45",
                                  family="rokkitt",face = "bold",lineheight = 0.5),
        legend.position = "bottom",legend.justification = c(0.98,0),
        legend.text = element_text(size=20,colour="grey45",
                                   family="rokkitt",angle = 45,hjust=0.5),
        legend.title = element_text(size=22,colour="grey45",family="rokkitt"),
        legend.box.margin=margin(c(-50,0,3,0)),
        plot.margin = margin(3,3,3,3, unit = "pt"),
        plot.background = element_rect(fill = "white", color = "white"),
        panel.background = element_rect(fill = "white", color = "white"),
        panel.grid = element_blank(),
        panel.border = element_blank(),
        axis.text = element_blank(),
        axis.title = element_blank(),
        axis.ticks = element_blank())

ggsave("pps2013-18.png",plot = p ,width = 10, height = 8, dpi = 200)

Again, the result is not as smooth as the original. Especially the boundary of the two and
three point area looks different. Overall though, I think the figure captures the essential
result of the original.

3D plot of points per shot

Recently, I learned about the awesome rayshader package by
tylermorganwall. Especially the relatively
new feature of turning any ggplot object into a 3D plot is fascinating! The below code
turns the points per shot figure into a 3D object and produces a short video that navigates the
plot.

plot_gg(p, width = 5, height = 5, multicore = TRUE, scale = 250, 
        zoom = 0.7, theta = 10, phi = 30, windowsize = c(800, 800))

phivec <- 20 + 70 * 1/(1 + exp(seq(-5, 10, length.out = 180)))
phivecfull <- c(phivec, rev(phivec))
thetavec <- 90 * sin(seq(0,359,length.out = 360) * pi/180) #+270
zoomvec <- 0.5 + 0.5 * 1/(1 + exp(seq(-5, 10, length.out = 180)))
zoomvecfull <- c(zoomvec, rev(zoomvec))

for(i in 1:360) {
  render_camera(theta = thetavec[i],phi = phivecfull[i],zoom = zoomvecfull[i])
  render_snapshot(paste0("anim/frame", i, ".png"))
}

#Run this command in the command line using ffmpeg to stitch together a video:
#ffmpeg -framerate 60 -i frame%d.png -vcodec libx264 raymovie.mp4

#And run this command to convert the video to post to the web:
#ffmpeg -i raymovie.mp4 -pix_fmt yuv420p -profile:v baseline -level 3 -vf scale=-2:-2 rayweb.mp4

This is the result:

To leave a comment for the author, please follow the link and comment on their blog: schochastics.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)