Nothing like a little Sunday morning data hacking before a big game! I have been wanting to play with the NHL play-by-play event files for some time now. The JSON datasets provide a wealth of information about each event in the game including the location, as defined by the fields xcoord and ycoord.
I am pretty excited about today’s rematch between the Boston Bruins and the Detroit Red Wings. It’s a national game, and the Bruins got destroyed on Friday, so it should make for an interesting contest. I have never used Tableau Public before, but I do use the Professional version at work. It is great, and in my humble opinion, it is the best BI tool for small to medium-sized businesses that I have seen. The public version is geared towards bloggers and is somewhat limited in features, but still very robust. I am going to attempt to update the event dataset during each intermission and update this post with the published workbook. Needless to say, this is doomed to fail, but I am going to give it a shot.
One of Tableau’s strengths is that it is fairly straightforward to use, especially if you have crunched data in Excel, SPSS, etc. The link below outlines how to use a background image. Don’t have the image on your computer? Not a problem, as Tableau can grab an image from the web, just as I did.
From the event files, it seems that a reasonable setting for the image is x (-100, 100) and y (40, -40). Yes, the positive value comes first for y, as I believe the y-coordinates are inverted relative to what you see displayed on the gamecenter page for a given game. I could be wrong, though.
NOTE: I am not certain of my settings for the image, because when you summarize the min/max of the coordinates, the extremes aren’t exactly ±100, but it seems good enough to my eye. Not to mention, most of the hits should take place along the boards, and even with this setting, some take place “outside” the boards on my image.
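The min/max check mentioned in the note can be sketched in a few lines. This is Python rather than R, purely as an illustration, and the sample points are made up; only the xcoord/ycoord field names and the 100/40 bounds come from this post. For what it's worth, a regulation NHL rink is 200 ft by 85 ft, so if the coordinates are in feet (an assumption on my part), y-values a little beyond ±40 could still be on the ice, which might explain some of the events that land "outside" the boards.

```python
# Hypothetical sample events; real values would come from the parsed event file.
events = [
    {"xcoord": -98, "ycoord": 12},
    {"xcoord": 45, "ycoord": -41},
    {"xcoord": 99, "ycoord": 40},
    {"xcoord": -3, "ycoord": 42},
]

X_LIMIT = 100  # half-length of the image's x axis, from the settings above
Y_LIMIT = 40   # half-width of the image's y axis, from the settings above


def coord_summary(rows):
    """Return the (min, max) of each coordinate across all rows."""
    xs = [r["xcoord"] for r in rows]
    ys = [r["ycoord"] for r in rows]
    return {"x": (min(xs), max(xs)), "y": (min(ys), max(ys))}


def outside_image(rows, x_limit=X_LIMIT, y_limit=Y_LIMIT):
    """Return the rows whose coordinates fall beyond the image bounds."""
    return [
        r for r in rows
        if abs(r["xcoord"]) > x_limit or abs(r["ycoord"]) > y_limit
    ]


summary = coord_summary(events)
stragglers = outside_image(events)
print(summary)
print(len(stragglers), "of", len(events), "events fall outside the image")
```

If a large share of events fall outside the bounds, that would be a hint that the image settings need widening rather than that the data are wrong.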
I am using my tool of choice, R, to grab and parse the dataset into a CSV file for Tableau to read. The code to grab the data can be found here.
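For readers who don't use R, here is a rough Python sketch of the same JSON-to-CSV idea. It is not the code linked above. The sample JSON, its "plays" key, and the period and type fields are all made up for illustration; the real event files may be nested differently, and only xcoord and ycoord are field names taken from this post.

```python
import csv
import io
import json

# Made-up sample; the real file's nesting and field names may differ.
raw = """
{"plays": [
  {"period": 1, "type": "Shot", "xcoord": -62, "ycoord": 14},
  {"period": 1, "type": "Hit",  "xcoord": 97,  "ycoord": -38},
  {"period": 2, "type": "Goal", "xcoord": 81,  "ycoord": 3}
]}
"""

FIELDS = ["period", "type", "xcoord", "ycoord"]


def events_to_csv(json_text, fields=FIELDS):
    """Flatten the list of play dicts into CSV text with a header row."""
    plays = json.loads(json_text)["plays"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for play in plays:
        # Missing keys become empty cells rather than raising an error.
        writer.writerow({f: play.get(f, "") for f in fields})
    return buffer.getvalue()


csv_text = events_to_csv(raw)
print(csv_text)
```

Writing the text to a file instead of printing it gives Tableau something it can read directly as a text data source.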
The image below is an example of 100 randomly selected games from this season. I plotted the shot and goal events on top of an image of an NHL rink. Pretty cool, huh? You can set the washout of the image so you can focus on the plotted data. I am sure that you can do this easily in R as well, but since I can barely debug my own code, any help on that front will be more than appreciated.
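Outside of Tableau, the core of overlaying events on a rink image is just a coordinate mapping. Here is a minimal sketch in Python (the same arithmetic carries over to R). The function name and the image size are hypothetical, and it assumes the first value of each pair in the settings above is the left or bottom edge of the image, so y = -40 ends up at the top.

```python
def rink_to_pixel(x, y, width, height,
                  x_range=(-100, 100), y_range=(40, -40)):
    """Map rink coordinates to (column, row) pixels on an image.

    x_range is (left edge, right edge); y_range is (bottom edge, top edge),
    so the default puts y = -40 at the top of the image, mirroring the
    inverted setting described in this post.
    """
    x_left, x_right = x_range
    y_bottom, y_top = y_range
    col = (x - x_left) / (x_right - x_left) * width
    # Image rows count downward from the top edge.
    row = (y - y_top) / (y_bottom - y_top) * height
    return col, row


# Hypothetical 1000 x 400 pixel rink image.
center = rink_to_pixel(0, 0, 1000, 400)
top_left = rink_to_pixel(-100, -40, 1000, 400)
print(center, top_left)
```

If the y-coordinates turn out not to be inverted after all, passing y_range=(-40, 40) flips the image mapping without touching the data.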
Since I can’t get this up and running and it’s close to game time, the link for the 100 random games is below. It looks far better when they host it.
UPDATE: Here is the link to the event data for the entire game. You should be able to filter the data by period. For the slideshow, it appears as if you have to manually page through each play.