Cluster NHL Teams Based on 2012/13 Regular Season Performance

June 12, 2013

(This article was first published on Data Twirling » R, and kindly contributed to R-bloggers)

Since tonight kicks off Game 1 of the Stanley Cup Finals, I thought it would be fun to do a very quick and dirty cluster analysis of the league based on regular season performance.

Tonight, the Chicago Blackhawks square off against my hometown team, the Boston Bruins. Even though it was a lockout-shortened season, the Blackhawks started off by playing 24 consecutive games without a regulation loss. Given this incredible start, I was eager to see how statistically similar the Bruins were to their opponent and to the other teams they faced in the playoffs.

The process is as follows:

  • Crawl the 2012-13 regular season data for each team
  • Normalize the statistics and create a distance matrix
  • Use hierarchical clustering to group the teams

Of course, all of this will be completed in my language of choice, R.
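As a rough illustration of the normalize, distance, and cluster steps listed above, here is a minimal sketch. It is not the code used for this post: the team labels, column names, and numbers are synthetic placeholders generated at random, and the scraping step is skipped entirely.

```r
# Illustrative sketch only: synthetic numbers, NOT real NHL statistics.
# Column names and team labels are hypothetical placeholders.
set.seed(42)
teams <- paste("Team", LETTERS[1:8])
stats <- data.frame(
  goals_for     = round(rnorm(8, mean = 130, sd = 15)),
  goals_against = round(rnorm(8, mean = 130, sd = 15)),
  pp_pct        = round(runif(8, min = 12, max = 25), 1),
  pk_pct        = round(runif(8, min = 75, max = 90), 1),
  row.names     = teams
)

# Normalize: center each column to mean 0 and scale to standard deviation 1,
# so that no single statistic dominates the distance calculation
stats_scaled <- scale(stats)

# Distance matrix: Euclidean distance between every pair of teams
team_dist <- dist(stats_scaled, method = "euclidean")

# Hierarchical clustering: average linkage is one of several options
fit <- hclust(team_dist, method = "average")

# Draw the dendrogram
plot(fit, main = "Average linkage (synthetic data)", xlab = "", sub = "")
```

Scaling first matters because the raw columns sit on very different ranges, and an unscaled Euclidean distance would be driven mostly by the columns with the largest spread.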


The image above shows three dendrograms, each built with a different linkage method.
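For readers who want to produce a similar side-by-side comparison, the sketch below draws three dendrograms from one distance matrix. Only average linkage is named in this post; single and complete linkage are assumptions chosen for illustration, and the data are random placeholders rather than team statistics.

```r
# Illustrative sketch only: random placeholder data, not NHL statistics.
set.seed(1)
x <- matrix(
  rnorm(10 * 4), nrow = 10,
  dimnames = list(paste("Team", LETTERS[1:10]), paste0("stat", 1:4))
)
d <- dist(scale(x))

# Assumed set of linkage methods; only "average" is mentioned in the post
methods <- c("single", "complete", "average")
fits <- lapply(methods, function(m) hclust(d, method = m))
names(fits) <- methods

# Plot the three trees next to each other, then restore the old settings
op <- par(mfrow = c(1, 3))
for (m in methods) {
  plot(fits[[m]], main = paste(m, "linkage"), xlab = "", sub = "")
}
par(op)
```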

I will let you draw your own conclusions, but I find it interesting that:

  • Chicago and Pittsburgh (the team Boston defeated to reach the Stanley Cup Finals) are basically isolated in 2 of the trees
  • Using average linkage, Chicago and Pittsburgh stand apart from the pack, but so does Boston from the group of other playoff teams
  • By and large, the techniques were able to isolate the majority of teams that did not make the playoffs
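One way to check an observation like the last one numerically, rather than by eye, is to cut the tree into a fixed number of groups and cross-tabulate them against playoff status. This is a sketch under assumptions: the data are random, the `made_playoffs` flag is hypothetical, and the choice of three groups is arbitrary.

```r
# Illustrative sketch only: random placeholder data and a made-up flag.
set.seed(7)
n <- 12
x <- matrix(
  rnorm(n * 4), nrow = n,
  dimnames = list(paste("Team", LETTERS[1:n]), paste0("stat", 1:4))
)
made_playoffs <- rep(c(TRUE, FALSE), length.out = n)  # hypothetical flag

fit <- hclust(dist(scale(x)), method = "average")

# Cut the dendrogram into k groups (k = 3 is an arbitrary choice here)
groups <- cutree(fit, k = 3)

# Count how many playoff and non-playoff teams land in each cluster
xtab <- table(cluster = groups, playoffs = made_playoffs)
print(xtab)
```

With real data, a cluster made up mostly of non-playoff teams would show up as a row with most of its count in the FALSE column.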

Just in case you are trying to learn R, here is the code.
