Clustering NHL Skaters

February 6, 2011

(This article was first published on Brock's Data Adventure » R, and kindly contributed to R-bloggers)

I have been sitting on this post for some time and wanted to get it out there. The goal is simply to show how easy it is to pull live data from the web into R, massage it, and perform some analytics on it. I am not sure how useful this analysis is in practice, but the larger point is to show how powerful R is for quick analysis.

I admit that I am a somewhat sloppy coder, but hopefully my comments will help you out, especially if you are new to R and are interested in things like:

  • How to sample data (both rows and columns)
  • Recode values
  • Re-order factors
  • Reduce the data using principal components
  • Cluster the data using these components
  • Basic plotting, and how you can control everything you want on the plot
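
Here is a minimal base-R sketch of those steps. It is not the author's code: a simulated skater table stands in for the scraped data, and the team codes, the column names (`gp`, `goals`, `assists`, `points`, `plus_minus`, `pim`, `shots`), the 200-row sample, and the choice to cluster on the first two components are all illustrative assumptions. It relies only on `prcomp()`, `kmeans()`, `boxplot()`, and `biplot()` from R's built-in stats and graphics packages.

```r
# Sketch only: simulated data stands in for the scraped NHL skater table.
set.seed(42)
n <- 300
teams <- c("NJD", "TOR", "PHI", "DET", "VAN", "BOS")  # hypothetical subset
skaters <- data.frame(
  team       = sample(teams, n, replace = TRUE),
  pos        = sample(c("C", "L", "R", "D"), n, replace = TRUE),
  gp         = sample(20:55, n, replace = TRUE),
  goals      = rpois(n, 8),
  assists    = rpois(n, 12),
  plus_minus = round(rnorm(n, 0, 8)),
  pim        = rpois(n, 20),
  shots      = rpois(n, 80),
  stringsAsFactors = FALSE
)
skaters$points <- skaters$goals + skaters$assists

# Sample rows (a random subset of skaters) and columns (a subset of stats)
rows <- sample(nrow(skaters), 200)
cols <- c("team", "pos", "gp", "goals", "assists", "points",
          "plus_minus", "pim", "shots")
dat <- skaters[rows, cols]

# Recode values: collapse left and right wing into a single "W"
dat$pos[dat$pos %in% c("L", "R")] <- "W"

# Re-order factor levels so plots list positions as C, W, D
dat$pos <- factor(dat$pos, levels = c("C", "W", "D"))

# Reduce the numeric stats with PCA; scale. = TRUE because units differ.
# points is left out because it is just goals + assists.
num <- dat[, c("gp", "goals", "assists", "plus_minus", "pim", "shots")]
pca <- prcomp(num, center = TRUE, scale. = TRUE)
summary(pca)  # proportion of variance explained by each component

# Cluster skaters on the first two component scores
km <- kmeans(pca$x[, 1:2], centers = 4, nstart = 25)

# Relabel clusters 1-4 by mean points, so 1 = lowest and 4 = highest
means  <- tapply(dat$points, km$cluster, mean)
new_id <- match(1:4, order(means))
dat$cluster <- factor(new_id[km$cluster], levels = 1:4)

# Basic plotting: points by cluster, then the PCA biplot
boxplot(points ~ cluster, data = dat, col = "lightblue",
        xlab = "Cluster (1 = lowest)", ylab = "Points",
        main = "Points by cluster")
biplot(pca, cex = 0.6)
```

Scaling matters because the stats sit on very different ranges (penalty minutes versus plus/minus); without `scale. = TRUE`, the high-variance columns would dominate the first components. `nstart = 25` reruns k-means from 25 random starts and keeps the fit with the lowest total within-cluster sum of squares, since a single start can settle in a poor local optimum.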

The code can be found here.  The plots below show you some of the output.

As mentioned above, this wasn’t meant to be an in-depth review of team performance or skater ability, but I think you can see where this analysis could go. The team distribution plot shows how each team's skaters are spread across the clusters, with reference lines that would break up the teams into 4 equal-size groups.
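
One way to draw a chart like the team distribution plot in base R is a 100% stacked bar per team, read against dashed lines at 25%, 50%, and 75%. This sketch assumes that reading of the reference lines, and it simulates the team and cluster assignments (the team codes and counts are hypothetical) so the block runs on its own; in practice the cluster labels would come from the k-means step.

```r
# Hypothetical input: one row per skater with a team code and a cluster
# label (1 = lowest-scoring cluster, 4 = highest). Simulated for illustration.
set.seed(1)
teams <- c("NJD", "TOR", "PHI", "DET", "VAN", "BOS")
skaters <- data.frame(
  team    = sample(teams, 240, replace = TRUE),
  cluster = factor(sample(1:4, 240, replace = TRUE), levels = 1:4),
  stringsAsFactors = FALSE
)

# Share of each team's skaters in each cluster (each column sums to 1)
tab  <- table(skaters$cluster, skaters$team)
prop <- prop.table(tab, margin = 2)

# Order teams by their share of skaters in the top cluster
prop <- prop[, order(prop["4", ], decreasing = TRUE)]

# Stacked 100% bars (cluster 1 at the bottom), plus 25/50/75% reference lines
fill_cols <- c("grey85", "grey65", "steelblue", "navy")
barplot(prop, col = fill_cols, border = NA, las = 2, ylim = c(0, 1),
        ylab = "Share of team's skaters",
        main = "Team distribution by cluster",
        legend.text = paste("Cluster", 1:4),
        args.legend = list(x = "topright", bg = "white", cex = 0.8))
abline(h = c(0.25, 0.5, 0.75), lty = 2, col = "red")
```

Because cluster 1 is stacked at the bottom, a team whose two lowest segments reach past the 50% line has more than half of its skaters in the bottom two clusters, and a top segment that extends below the 75% line means more than a quarter of its skaters are in the top cluster.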

If you follow the NHL, take a look at New Jersey or Toronto. These two teams are not having the best seasons, and according to this plot, more than half of each team is made up of skaters who fall into the two lowest-performing clusters. By contrast, look at Philadelphia, one of the better teams in the league: more than 25% of its skaters were clustered into the top-performing group.
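
The comparisons above boil down to two shares per team: the fraction of its skaters in the bottom two clusters and the fraction in the top cluster. The helper below (`cluster_shares()`, a hypothetical name, not from the original code) computes both from a vector of team codes and a vector of cluster labels numbered 1 (lowest) to 4 (highest). The eight-skater input is toy data for checking the arithmetic, not real NHL numbers.

```r
# Hypothetical helper: per-team share of skaters in clusters 1-2 and in
# cluster 4, assuming clusters are numbered 1 (lowest) to 4 (highest).
cluster_shares <- function(team, cluster) {
  cluster <- as.integer(as.character(cluster))  # works for factors or numbers
  bottom  <- tapply(cluster <= 2, team, mean)   # mean of a logical = share
  top     <- tapply(cluster == 4, team, mean)
  data.frame(team       = names(bottom),
             bottom_two = as.numeric(bottom),
             top        = as.numeric(top[names(bottom)]),
             row.names  = NULL, stringsAsFactors = FALSE)
}

# Toy input: four skaters per team
team    <- c("NJD", "NJD", "NJD", "NJD", "PHI", "PHI", "PHI", "PHI")
cluster <- c(1, 2, 2, 3, 4, 4, 3, 1)

shares <- cluster_shares(team, cluster)
shares$mostly_bottom <- shares$bottom_two > 0.5  # more than half in clusters 1-2
shares$strong_top    <- shares$top > 0.25        # more than 25% in cluster 4
print(shares)
# NJD: bottom_two 0.75, top 0.00; PHI: bottom_two 0.25, top 0.50
```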

[Figure: Boxplot of points by cluster (4 clusters)]
[Figure: PCA biplot]
[Figure: Team distribution]
