Basketball Data Part II – Length of Career by Position

June 2, 2014

(This article was first published on Analyst At Large » R, and kindly contributed to R-bloggers)

In the previous post, I showed how easy it is to use R to scrape HTML tables from websites (I used the XML package to scrape some basic basketball data).  In this post, I’ll explore the idea that NBA career length might vary by position.  Before reviewing this data, I assumed that centers (and big men in general) would have the shortest NBA careers.  My theory was that these guys were just too big to stay healthy long enough to string together a career.  Let’s see what the data says:

[Figure: boxplot of career length (years) by position]

The median career length appears to be about 2 years for centers, guards, and forwards alike.  Beyond the median, centers and guards tend to have longer careers than forwards in general.  If we look at C-F and G-F, we can see that these players tend to have noticeably longer careers than single-position players.  I don’t know a lot about basketball, so it’s difficult for me to speculate why these players have longer careers.  Maybe they’re so athletic that they can easily play either position, and more athletic players tend to have longer careers?  Maybe these players have been in the league so long that they get moved around and thus earn the “C-F” or “G-F” designation?  Any theories from people who know more about basketball?
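
For anyone who wants the numbers behind a boxplot like this one, here is a minimal sketch of how the per-position medians could be pulled out.  The `toy` data frame and its values are made up purely for illustration (they are not the scraped results); with the real data, the `LEN` and `Position` columns built in the code further down would take their place.

```r
# Toy data only: these values are invented for illustration,
# not taken from the scraped basketball table.
toy <- data.frame(
  Position = c("C", "C", "G", "G", "F", "F", "C-F", "C-F"),
  LEN      = c(1, 3, 2, 4, 0, 2, 5, 9)
)

# Median career length within each position group
med <- tapply(toy$LEN, toy$Position, median)
print(med)

# Tukey's five-number summary per group (min, hinges, median, max),
# which is what the box and whiskers are based on
stats <- tapply(toy$LEN, toy$Position, fivenum)
print(stats[["C-F"]])
```

`tapply()` is used here only because it keeps the sketch short; `aggregate()` would work just as well.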

I also looked briefly at retirement age:

[Figure: boxplot of retirement age by position]

We can see a similar trend here, with centers and guards retiring later than forwards (and C-F/G-F players retiring later than all single-position players).  More than 75% of forwards retire from the NBA before they turn 30.  I’m 29 now.  Good thing I’m not a forward…
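
A claim like “more than 75% retire before 30” can be checked directly rather than read off the plot.  The sketch below shows one way to do that, again on invented numbers: the `toy` data frame is an assumption for illustration, and `RetireAge` stands in for the column of the same name computed in the code below (last season minus birth year, so only an approximate age).

```r
# Toy data only: invented retirement ages, not scraped results.
toy <- data.frame(
  Position  = c("F", "F", "F", "F", "G", "G", "G", "G"),
  RetireAge = c(24, 26, 28, 33, 27, 31, 34, 36)
)

# Share of players in each position whose retirement age is under 30
# (the mean of a TRUE/FALSE vector is the proportion of TRUEs)
share_under_30 <- tapply(toy$RetireAge < 30, toy$Position, mean)
print(share_under_30)

# Upper quartile of retirement age per position; a value below 30
# corresponds to roughly three quarters of that group retiring before 30
q3 <- tapply(toy$RetireAge, toy$Position, quantile, probs = 0.75)
print(q3)
```

The proportion is the more direct check; the upper quartile is included because it is close to the top edge of the box in a boxplot.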

Here is the code:

###### Settings
library(XML)
setwd("C:/Blog/Basketball")
 
###### URLs
url<-paste0("http://www.basketball-reference.com/players/",letters,"/")
len<-length(url)
 
###### Reading data
tbl<-readHTMLTable(url[1])[[1]]
 
for (i in 2:len)
	{tbl<-rbind(tbl,readHTMLTable(url[i])[[1]])}
 
###### Formatting data
colnames(tbl)<-c("Name","StartYear","EndYear","Position","Height","Weight","BirthDate","College")
tbl$BirthDate<-as.Date(tbl$BirthDate,format="%B %d, %Y")
 
tbl$StartYear<-as.numeric(as.character(tbl$StartYear))
tbl$EndYear<-as.numeric(as.character(tbl$EndYear))
 
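# Recode on plain text so this works whether or not Position was read in as a factor
tbl$Position<-as.character(tbl$Position)
tbl$Position[tbl$Position=="F-C"]<-"C-F"
tbl$Position[tbl$Position=="F-G"]<-"G-F"
tbl$Position<-factor(tbl$Position,levels=c("C","G","F","C-F","G-F"))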
 
###### Career Length
tbl$LEN<-tbl$EndYear-tbl$StartYear
 
table(tbl$Position)
boxplot(tbl$LEN~tbl$Position,col="light blue",ylab="Years",xlab="Position",
	main="Length of Career by Position")
 
###### Age at Retirement
# Approximate age: last season minus birth year
tbl$RetireAge<-tbl$EndYear-as.numeric(format(tbl$BirthDate,"%Y"))
 
boxplot(tbl$RetireAge~tbl$Position,col="light blue",ylab="Retirement Age",xlab="Position",
	main="Retirement Age by Position")
 
###### Removing Currently Active Players
# Active players' careers are still in progress, so plot retired players only
retired<-tbl[tbl$EndYear<2014,]
 
boxplot(retired$LEN~retired$Position,col="light blue",ylab="Years",xlab="Position",
	main="Length of Career by Position")
 
boxplot(retired$RetireAge~retired$Position,col="light blue",ylab="Retirement Age",xlab="Position",
	main="Retirement Age by Position")
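
The reading step in the code above grows the table with one rbind call per page.  An alternative, shown here only as a sketch, is to collect the pages in a list and bind them once at the end.  So that it runs offline, `fake_read()` is a hypothetical stand-in for `readHTMLTable(url[i])[[1]]` and returns a small made-up table; it is not part of the XML package.

```r
# Hypothetical stand-in for readHTMLTable(url[i])[[1]]: returns a small
# invented table so the sketch runs without touching the website.
fake_read <- function(letter) {
  data.frame(
    Name      = paste0(toupper(letter), c(" Player One", " Player Two")),
    StartYear = c("1990", "2005"),
    EndYear   = c("1995", "2014"),
    stringsAsFactors = FALSE
  )
}

pages <- lapply(letters[1:3], fake_read)  # one data frame per page
tbl   <- do.call(rbind, pages)            # bind all pages in a single step

# Scraped columns arrive as text, so convert before doing arithmetic
tbl$StartYear <- as.numeric(tbl$StartYear)
tbl$EndYear   <- as.numeric(tbl$EndYear)
tbl$LEN       <- tbl$EndYear - tbl$StartYear

print(nrow(tbl))
```

With the real pages, `fake_read` would be swapped for a function that calls `readHTMLTable()` on each URL; the rest of the pattern stays the same.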
