ggplot2: Quick Heatmap Plotting

January 25, 2010
By

(This article was first published on Learning R, and kindly contributed to R-bloggers)

A post on FlowingData blog demonstrated how to quickly make a heatmap below using R base graphics.

This post shows how to achieve a very similar result using ggplot2.

nba_heatmap_revised.png



Data Import

FlowingData used last season’s NBA basketball statistics provided by databasebasketball.com, and the csv-file with the data can be downloaded directly from its website.

> nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv")

The players are ordered by points scored, and the Name variable converted to a factor that ensures proper sorting of the plot.

> nba$Name <- with(nba, reorder(Name, PTS))

Whilst FlowingData uses heatmap function in the stats-package that requires the plotted values to be in matrix format, ggplot2 operates with dataframes. For ease of processing, the dataframe is converted from wide format to a long format.

The game statistics have very different ranges, so to make them comparable all the individual statistics are rescaled.

> library(ggplot2)
> nba.m <- melt(nba)
> nba.m <- ddply(nba.m, .(variable), transform,
+     rescale = rescale(value))

Plotting

There is no specific heatmap plotting function in ggplot2, but combining geom_tile with a smooth gradient fill does the job very well.

> (p <- ggplot(nba.m, aes(variable, Name)) + geom_tile(aes(fill = rescale),
+     colour = "white") + scale_fill_gradient(low = "white",
+     high = "steelblue"))
basketball_heatmap-008.png

A few finishing touches to the formatting, and the heatmap plot is ready for presentation.

> base_size <- 9
> p + theme_grey(base_size = base_size) + labs(x = "",
+     y = "") + scale_x_discrete(expand = c(0, 0)) +
+     scale_y_discrete(expand = c(0, 0)) + opts(legend.position = "none",
+     axis.ticks = theme_blank(), axis.text.x = theme_text(size = base_size *
+         0.8, angle = 330, hjust = 0, colour = "grey50"))
basketball_heatmap-010.png

Rescaling Update

In preparing the data for the above plot all the variables were rescaled so that they were between 0 and 1.

Jim rightly pointed out in the comments (and I did not initally get it) that the heatmap-function uses a different scaling method and therefore the plots are not identical. Below is an updated version of the heatmap which looks much more similar to the original.

> nba.s <- ddply(nba.m, .(variable), transform,
+     rescale = scale(value))
> last_plot() %+% nba.s
basketball_heatmap-013.png

To leave a comment for the author, please follow the link and comment on his blog: Learning R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.