Clustering NHL Skaters

February 6, 2011

(This article was first published on Brock's Data Adventure » R, and kindly contributed to R-bloggers)

I have been sitting on this post for some time now and wanted to get it out there. The goal is simply to show how easy it is to pull live data from the web into R, massage it, and perform some analytics on it. I am not sure how useful this analysis really is in practice, but the larger point is to show how powerful R is for quick, exploratory analysis.

I admit that I am a somewhat sloppy coder, but I hope my comments help, especially if you are new to R and interested in things like:

  • Sampling data (both rows and columns)
  • Recoding values
  • Re-ordering factor levels
  • Reducing the data with principal components
  • Clustering the data on those components
  • Basic plotting, and how to control every element of a plot
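The first three items in the list can be sketched in a few lines of base R. This is not the author's linked code: the data frame `skaters`, its columns (`team`, `pos`, `gp`, `g`, `a`, `pm`, `pim`, `pts`) and all values are hypothetical, simulated stand-ins for a scraped stats table.

```r
# Hypothetical stand-in for a scraped skater table; all values are simulated.
set.seed(2011)
n <- 200
skaters <- data.frame(
  team = sample(c("NJD", "TOR", "PHI", "BOS"), n, replace = TRUE),
  pos  = sample(c("C", "L", "R", "D"), n, replace = TRUE),
  gp   = sample(20:55, n, replace = TRUE),
  g    = rpois(n, 8),
  a    = rpois(n, 12),
  pm   = round(rnorm(n, 0, 8)),
  pim  = rpois(n, 20),
  stringsAsFactors = FALSE
)
skaters$pts <- skaters$g + skaters$a

# Sample rows: keep a random half of the skaters.
rows <- sample(nrow(skaters), size = nrow(skaters) / 2)
# Sample columns: keep only the columns needed downstream.
cols <- c("team", "pos", "gp", "g", "a", "pts", "pm")
sub  <- skaters[rows, cols]

# Recode values with a named lookup vector: C/L/R become "F", D stays "D".
pos_map  <- c(C = "F", L = "F", R = "F", D = "D")
sub$pos2 <- factor(unname(pos_map[sub$pos]), levels = c("F", "D"))

# Re-order factor levels: teams sorted by median points, not alphabetically.
sub$team <- reorder(factor(sub$team), sub$pts, FUN = median)
levels(sub$team)
```

A named lookup vector keeps the recoding rules in one visible place, and `reorder()` is handy because boxplots and bar charts draw factor levels in level order.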

The code can be found here, and the plots below show some of the output.

As mentioned above, this wasn’t meant to be an in-depth review of team performance or skater ability, but I think you can see where this analysis could go. The team distribution plot shows how each team’s skaters are spread across the clusters, with reference lines that divide each team into 4 equal-size groups (25% of its skaters each).

If you follow the NHL, take a look at New Jersey or Toronto. Neither team is having a good season, and the plot shows that more than half of each roster is made up of skaters in the two lowest-performing clusters. By contrast, Philadelphia, one of the better teams in the league, has more than 25% of its skaters in the top-performing cluster.
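The team distribution plot, and shares like "more than half in the bottom two clusters", come down to a row-normalized cross-tabulation of team by cluster. A minimal base-R sketch, assuming a hypothetical data frame `roster` with one row per skater and a `cluster` factor already ordered so that 1 is the lowest-performing cluster and 4 the top one. The assignments are simulated, not the post's results.

```r
# Hypothetical per-skater cluster labels (simulated; 1 = lowest, 4 = top).
set.seed(42)
weak   <- c(0.35, 0.30, 0.20, 0.15)
strong <- c(0.15, 0.20, 0.30, 0.35)
roster <- data.frame(
  team    = rep(c("NJD", "TOR", "PHI"), each = 20),
  cluster = factor(c(sample(1:4, 20, replace = TRUE, prob = weak),
                     sample(1:4, 20, replace = TRUE, prob = weak),
                     sample(1:4, 20, replace = TRUE, prob = strong)),
                   levels = 1:4)
)

# Row-normalized cross-tab: share of each team's skaters in each cluster.
share <- prop.table(table(roster$team, roster$cluster), margin = 1)

# Bottom-two-cluster share and top-cluster share for each team.
low2 <- share[, "1"] + share[, "2"]
top  <- share[, "4"]
round(cbind(low2, top), 2)

# Horizontal stacked bars, one per team, with dashed lines at 25/50/75%.
barplot(t(share), horiz = TRUE, las = 1, xlab = "Share of team's skaters",
        legend.text = paste("Cluster", 1:4))
abline(v = c(0.25, 0.50, 0.75), lty = 2)
```

Because each row of `share` sums to 1, a team whose bottom two segments cross the 50% line has more than half of its skaters in the two weakest clusters.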

[Figures: points by cluster (4-cluster boxplot); dendrogram; PCA biplot; team distribution]
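The reduce-then-cluster pipeline behind figures like those listed above can be sketched in base R. This is one plausible approach, not necessarily the author's exact method: it standardizes a hypothetical table of simulated per-skater stats, keeps the first 3 principal components from `prcomp()` (the number of components is an assumption), clusters the component scores with Ward hierarchical clustering (`hclust(method = "ward.D2")`), and cuts the tree into 4 groups. Since the simulated stats are random, the resulting clusters carry no hockey meaning; only the mechanics are illustrated.

```r
# Hypothetical per-skater stats (simulated stand-in for the scraped table).
set.seed(7)
n <- 150
skater_stats <- data.frame(
  g   = rpois(n, 10),
  a   = rpois(n, 15),
  pm  = round(rnorm(n, 0, 8)),
  pim = rpois(n, 25),
  sog = rpois(n, 90),
  toi = round(rnorm(n, 16, 3), 1)
)
pts <- skater_stats$g + skater_stats$a

# PCA on standardized stats; keeping 3 components is an assumption.
pca <- prcomp(skater_stats, center = TRUE, scale. = TRUE)
summary(pca)$importance["Cumulative Proportion", ]
scores <- pca$x[, 1:3]

# Ward hierarchical clustering of the component scores, cut into 4 groups.
hc <- hclust(dist(scores), method = "ward.D2")
cl <- cutree(hc, k = 4)

# Relabel so cluster 1 has the lowest and cluster 4 the highest median points.
ord     <- order(tapply(pts, cl, median))
cluster <- factor(match(cl, ord), levels = 1:4)

# Points by cluster (boxplot), dendrogram with the 4-cluster cut, PCA biplot.
bp <- boxplot(pts ~ cluster, xlab = "Cluster", ylab = "Points")
plot(hc, labels = FALSE, main = "Ward clustering on first 3 PCs")
rect.hclust(hc, k = 4)
biplot(pca, cex = 0.6)
```

Relabeling the clusters by median points is what makes statements like "the lower 2 performing clusters" well defined, since `cutree()` numbers clusters by tree order rather than by performance.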
Filed under: Fantasy Hockey, R. Tagged: Cluster Analysis, NHL, PCA, R, webscraping
