Using maps and ggplot2 to visualize college hockey championships

March 13, 2013

(This article was first published on Decisions and R, and kindly contributed to R-bloggers)


Short:
I plot the frequency of college hockey championships by state using the maps and ggplot2 packages.

Note: this example is based heavily on the example provided at
http://www.dataincolour.com/2011/07/maps-with-ggplot2/

data reference:
http://en.wikipedia.org/wiki/NCAA_Men%27s_Ice_Hockey_Championship

Question of interest
As a good Minnesotan, I've believed for quite some time that the colder, Northern states enjoy a competitive advantage when it comes to college hockey. Does this advantage exist? How strong is it?

I first downloaded data from Wikipedia on past winners of hockey championships, and saved the short list as a CSV file.

After saving the file, here's how the data look in R:

# Visualizing College Hockey Champions by State

# Author: Mark T Patterson Date: March 13, 2013


# Libraries:
library(ggplot2)
library(maps)

# Clearing the workspace and changing the working directory:
rm(list = ls()) # Clearing the workspace
setwd("C:/Users/Mark/Desktop/Blog/Data")

# Loading Data:


# Loading state championships data:
dat.state = read.csv("HockeyChampsByState.csv", header = TRUE)
dat.state$state = tolower(dat.state$state)
head(dat.state)
##           state titles
## 1      michigan     19
## 2 massachusetts     11
## 3      colorado      9
## 4  north dakota      7
## 5     minnesota      6
## 6     wisconsin      6
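
For readers who want to reproduce the loading step without the original file, here is a minimal sketch of the layout that read.csv() expects. It uses only the six rows printed above. The column names match the output, but the capitalization of the state names in the raw file is an assumption, and the real file contains more states.

```r
# Rows copied from the head(dat.state) output; the full file has more states.
# Assumption: state names are capitalized in the raw file, before tolower().
example.rows <- data.frame(
  state  = c("Michigan", "Massachusetts", "Colorado",
             "North Dakota", "Minnesota", "Wisconsin"),
  titles = c(19, 11, 9, 7, 6, 6)
)

# Write to a temporary file so nothing in the working directory is touched.
tmp <- tempfile(fileext = ".csv")
write.csv(example.rows, tmp, row.names = FALSE)

# Same two steps as in the post: read the file, then lower-case the names.
dat.example <- read.csv(tmp, header = TRUE)
dat.example$state <- tolower(dat.example$state)
str(dat.example)
```

Lower-casing matters because the map data used later stores state names in lower case.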

Now that we've loaded the information about hockey championships by state, we just need to load the mapping data. map_data("state") is a ggplot2 function call that converts the state outlines from the maps package into a data frame. Here, we'll use the region column, which lists state names, to match our state championship data.

# Creating mapping dataframe:
us.state = map_data("state")
head(us.state)
##     long   lat group order  region subregion
## 1 -87.46 30.39     1     1 alabama      <NA>
## 2 -87.48 30.37     1     2 alabama      <NA>
## 3 -87.53 30.37     1     3 alabama      <NA>
## 4 -87.53 30.33     1     4 alabama      <NA>
## 5 -87.57 30.33     1     5 alabama      <NA>
## 6 -87.59 30.33     1     6 alabama      <NA>
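
Because the merge relies on exact string matches between state names and the region column, it can help to check for mismatches first. The sketch below is one way to do that with setdiff(). To stay self-contained it uses base R's state.name vector, lower-cased, as a stand-in for unique(us.state$region); that substitution is an assumption, and the names champ.states and map.regions are made up for illustration.

```r
# Six state names taken from the head(dat.state) output.
champ.states <- c("michigan", "massachusetts", "colorado",
                  "north dakota", "minnesota", "wisconsin")

# Stand-in for unique(us.state$region): base R's state.name, lower-cased.
# Assumption: the map data uses the same lower-case spellings.
map.regions <- tolower(state.name)

# Names that would fail to match in a merge on region/state.
unmatched <- setdiff(champ.states, map.regions)
unmatched

# A leftover capital letter or an abbreviation shows up immediately:
bad.names <- setdiff(c("Michigan", "n. dakota"), map.regions)
bad.names
```

An empty result for unmatched means every championship state has a counterpart to merge with.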

# Merging the two datasets:

dat.champs = merge(us.state, dat.state, by.x = "region", by.y = "state",
                   all = TRUE)

dat.champs <- dat.champs[order(dat.champs$order), ]
# mapping requires the same order of observations that appear in us.state

head(dat.champs)
##    region   long   lat group order subregion titles
## 1 alabama -87.46 30.39     1     1      <NA>     NA
## 2 alabama -87.48 30.37     1     2      <NA>     NA
## 3 alabama -87.53 30.37     1     3      <NA>     NA
## 4 alabama -87.53 30.33     1     4      <NA>     NA
## 5 alabama -87.57 30.33     1     5      <NA>     NA
## 6 alabama -87.59 30.33     1     6      <NA>     NA
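
To see why the re-ordering step is needed, here is a small sketch on made-up data. merge() sorts its result on the key column by default, which scrambles the vertex sequence that polygon drawing depends on. The data frames toy.map and toy.titles are hypothetical stand-ins for us.state and dat.state.

```r
# Hypothetical toy polygon table: two regions, three vertices each,
# with 'order' recording the drawing sequence (as in us.state).
toy.map <- data.frame(
  region = c("b", "b", "b", "a", "a", "a"),
  long   = c(0, 1, 1, 2, 3, 3),
  lat    = c(0, 0, 1, 0, 0, 1),
  order  = 1:6
)
toy.titles <- data.frame(state = "a", titles = 3)

merged <- merge(toy.map, toy.titles, by.x = "region", by.y = "state",
                all = TRUE)
merged$order     # no longer 1:6, because merge() rearranged the rows

# Sorting on 'order' restores the original vertex sequence.
restored <- merged[order(merged$order), ]
restored$order   # back to 1:6
restored$titles  # NA for region "b", which has no titles
```

The NA values are expected: with all = TRUE, regions that have no row in the titles table are kept and simply have no title count.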

With the dat.champs frame created, we're ready to plot.

# Plotting

# opts() has been removed from ggplot2; labs(title = ...) and theme() replace it.
ggplot(dat.champs, aes(x = long, y = lat, group = group, fill = titles)) +
  geom_polygon() +
  theme_bw() +
  labs(x = "", y = "", fill = "", title = "College Hockey Championships By State") +
  scale_fill_gradient(low = "#EEEEEE", high = "darkgreen") +
  theme(legend.position = "bottom", legend.direction = "horizontal")


Having plotted the data, it's easy to see how championships cluster around the Great Lakes region. With the exception of Colorado, only colder, Northern states have won titles.

Ways to improve this analysis
While we observe that college champions are clustered in the upper Midwest and the Northeast, several variables could explain the distribution. We might consider examining 1) state temperature (we might expect that colder temperatures lead to better performance, since teams in colder states get to practice more), 2) distance from the Great Lakes (this might be a proxy for the availability of ice), and 3) distance from Canadian hockey cities (it's possible that hockey culture follows from Canadian or European immigration).
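
As a rough first pass at the temperature idea, latitude can serve as a crude proxy. The sketch below compares the mean latitude of the six states listed in the head(dat.state) output with that of all other states, using the state.name and state.center datasets that ship with base R. Treating latitude as a proxy for cold is an assumption, and because only six rows of the championship data are shown in this post, other title-winning states are missing from the "listed" group.

```r
# The six states shown in the head(dat.state) output.
champ.states <- c("michigan", "massachusetts", "colorado",
                  "north dakota", "minnesota", "wisconsin")

# state.name and state.center ship with base R; state.center$y is latitude.
# Note: base R places Alaska and Hawaii at artificial positions.
lat.dat <- data.frame(state = tolower(state.name), lat = state.center$y)

# Flag states that appear in the (partial) list of champions.
lat.dat$listed <- lat.dat$state %in% champ.states

# Mean center latitude for listed and unlisted states.
tapply(lat.dat$lat, lat.dat$listed, mean)
```

With the full championship table, the same comparison could be extended to a regression of titles on latitude or on average winter temperature.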

Beyond examining these possible factors, it'd be interesting to try other color schemes. I've adopted the same color scheme presented at http://www.dataincolour.com/2011/07/maps-with-ggplot2/ , but it would be good to have some familiarity with other schemes.
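
Two alternatives that ship with current versions of ggplot2 are the ColorBrewer scales and the viridis scales. The sketch below swaps them in on a made-up two-polygon data set (toy is hypothetical and stands in for dat.champs); the same scale calls can be added to the state map in place of scale_fill_gradient().

```r
library(ggplot2)

# Hypothetical toy data: two unit squares standing in for two states.
toy <- data.frame(
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1),
  group  = rep(c(1, 2), each = 4),
  titles = rep(c(19, 6), each = 4)
)

base.map <- ggplot(toy, aes(x = long, y = lat, group = group, fill = titles)) +
  geom_polygon(colour = "white") +
  theme_bw()

# Sequential ColorBrewer palette; direction = 1 gives darker blues
# to larger values.
map.brewer <- base.map + scale_fill_distiller(palette = "Blues", direction = 1)

# Viridis scale; na.value sets the fill for states with no titles.
map.viridis <- base.map + scale_fill_viridis_c(na.value = "grey90")

# Print either object to draw it, for example: print(map.brewer)
```

Setting na.value explicitly is worth considering for the real map, since most states have no titles and would otherwise be drawn in the default grey.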
