Drawing heatmaps in R

Posted on June 24, 2011 by Christopher Bare in Uncategorized | 0 Comments

[This article was first published on Digithead's Lab Notebook, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A while back, while reading chapter 4 of Using R for Introductory Statistics, I fooled around with the mtcars dataset giving mechanical and performance properties of cars from the early 70’s. Let’s plot this data as a hierarchically clustered heatmap.

# scale data to mean=0, sd=1 and convert to matrix
mtscaled <- as.matrix(scale(mtcars))

# create heatmap and don't reorder columns
heatmap(mtscaled, Colv=F, scale='none')

By default, heatmap clusters by both rows and columns. It then reorders the resulting dendrograms according to mean. Setting Colv to false tells it not to reorder the columns, which will come in handy later. Let's also turn off the default scaling across rows. We've already scaled across columns, which is the sensible thing to do in this case.

If our columns are already in some special order, say as a time-series or by increasing dosage, we might want to cluster only rows. We could do that by setting the Colv argument to NA. One thing that clustering the columns tells us in this case is that some information is highly correlated, bordering on redundant. For example, displacement, horsepower and number of cylinders are quit similar. And the idea that to get more power (hp) and go faster (qsec) we need to burn more gas (mpg) is pretty well supported.

Separating clusters

If we'd like to separate out the clusters, I'm not sure of the best approach. One way is to use hclust and cutree, which allows you to specify k, the number of clusters you want. Don't forget that hclust requires a distance matrix as input.

# cluster rows
hc.rows <- hclust(dist(mtscaled))
plot(hc.rows)

# transpose the matrix and cluster columns
hc.cols <- hclust(dist(t(mtscaled)))

# draw heatmap for first cluster
heatmap(mtscaled[cutree(hc.rows,k=2)==1,], Colv=as.dendrogram(hc.cols), scale='none')

# draw heatmap for second cluster
heatmap(mtscaled[cutree(hc.rows,k=2)==2,], Colv=as.dendrogram(hc.cols), scale='none')

That works, but, I'd probably advise creating one heatmap and cutting it up in Illustrator, if need be. I have a nagging feeling that the color scale will end up being slightly different between the two clusters, since the range of values in each submatrix is different. Speaking of colors, if you don't like the default heat colors, try creating a new palette with color ramp.

palette <- colorRampPalette(c('#f0f3ff','#0033BB'))(256)
heatmap(mtscaled, Colv=F, scale='none', col=palette)

Confusing things

Another way to separate the clusters is to get the dendrograms out of heatmap and work with those. But Cutree applies to objects of class hclust, returned by hclust, and returns a map assigning each row in the original data to a cluster. Cutree takes either a height to cut at (h) or the desired number of clusters (k), which is nice.

Cut applies to dendrograms, which can be returned by heatmap if the keep.dendro option is set. Cut takes only h, not k, and returns a list with members upper and lower. Lower is a list of subtrees below the cut point.

Doing graphics with R starts easy, but gets arcane quickly. There's also a heatmap.2 function in the gplot package that adds color keys among other sparsely documented features.

This all needs some serious straightening out, but the basics are easy enough. Here are a couple more resources to make your heatmaps extra-hot:

Using R to draw a Heatmap from Microarray Data
Flowing Data's tutorial How to Make a Heatmap

...more on R.

To leave a comment for the author, please follow the link and comment on their blog: Digithead's Lab Notebook.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Drawing heatmaps in R

Separating clusters

Confusing things

Related

Separating clusters

Confusing things

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)