R: heatmaps with gplots

April 8, 2010

(This article was first published on compBiomeBlog, and kindly contributed to R-bloggers)

I use heatmaps quite a lot for visualizing data: microarrays, of course, but also DNA motif enrichment, base composition and other things. I particularly like the heatmap.2 function of the gplots package. It has a couple of defaults that are a little ugly, but they are easy to change. Here is a quick example.

First, let's make some example microarray data.

exampleData <- matrix(log2(rexp(1000)/rexp(1000)),nrow=200)

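This just draws two sets of 1000 random numbers from an exponential distribution and takes the log2 of their ratio, so the values look a bit like microarray fold changes. The result is a 200 x 5 matrix (200 rows, 5 columns), but really this could be any matrix of numbers.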

Next, I will plot only the most variable rows (genes, or whatever the rows are). This step is optional, but it reduces the size of the plot and makes it easier to read, and normally I only care about the things that are different.

evar <- apply(exampleData,1,var)

mostVariable <- exampleData[evar>quantile(evar,0.75),]

This calculates the variance of each row in the matrix, then makes a new matrix of only those rows whose variance is above the 75th percentile, i.e. the 25% most variable rows.
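As a quick check of the filtering step, the sketch below regenerates the example data and confirms that a quarter of the 200 rows survive. The set.seed() call and the helper variable keep are additions for reproducibility and readability; they are not part of the original code.

```r
# Fixed seed (an addition, not in the original) so results are reproducible
set.seed(42)

# 1000 log2 ratios of exponential draws, reshaped to 200 rows x 5 columns
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)

# Row-wise variance, then keep rows above the 75th percentile
evar <- apply(exampleData, 1, var)
keep <- evar > quantile(evar, 0.75)  # hypothetical helper: logical row mask
mostVariable <- exampleData[keep, ]

dim(mostVariable)  # 50 rows, 5 columns: the top quarter of the rows
# Every kept row is more variable than every dropped row
min(evar[keep]) > max(evar[!keep])
```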


Next we load the gplots package (install it first if you do not already have it). We then simply pass the mostVariable matrix to the heatmap.2 function. The trace="none" option turns off one of the defaults, a trace line drawn down each column, which I find distracting. The col=greenred(10) option uses another gplots function, greenred, which generates a 10-color scheme running from green through black to red. You could use any color scheme here, such as col=rainbow(10) or a scheme from RColorBrewer.
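The following is a minimal sketch of the call described above. It assumes gplots is installed and rebuilds mostVariable so the block runs on its own. The set.seed() call and the name hm are additions used only for illustration.

```r
# install.packages("gplots")  # run once if gplots is not installed
library(gplots)

# Rebuild the example matrix (seed added for reproducibility)
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)
evar <- apply(exampleData, 1, var)
mostVariable <- exampleData[evar > quantile(evar, 0.75), ]

# trace = "none" turns off the per-column trace lines;
# greenred(10) is a 10-step green -> black -> red palette
hm <- heatmap.2(mostVariable, trace = "none", col = greenred(10))

# Another palette is just a different col argument, e.g.
# heatmap.2(mostVariable, trace = "none", col = rainbow(10))
```

heatmap.2 invisibly returns a list, captured here as hm. Its rowInd element gives the row order the clustering produced.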

That is about it really for basic heatmaps. 

For more advanced heatmaps, you can do other things, such as adding color strips to the rows or columns to show groupings.
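heatmap.2 supports this through its RowSideColors and ColSideColors arguments, which take one color per row or per column of the matrix. In this sketch, the column grouping is purely hypothetical: it treats the first two arrays as "control" and the last three as "treated". The row strip simply marks whether each row's mean is above or below zero. The names colGroups and rowGroups are illustrative.

```r
library(gplots)

# Rebuild the example matrix (seed added for reproducibility)
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)
evar <- apply(exampleData, 1, var)
mostVariable <- exampleData[evar > quantile(evar, 0.75), ]

# Hypothetical column groups: first 2 arrays "control", last 3 "treated"
colGroups <- c(rep("grey40", 2), rep("orange", 3))

# Illustrative row groups: red if the row mean is positive, green otherwise
rowGroups <- ifelse(rowMeans(mostVariable) > 0, "red", "green")

heatmap.2(mostVariable, trace = "none", col = greenred(10),
          ColSideColors = colGroups,  # one color per column
          RowSideColors = rowGroups)  # one color per row
```

heatmap.2 reorders the strips along with the clustered rows and columns, so each color stays attached to its own row or column.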


Another useful trick is to skip the default clustering of heatmap.2 and use your own ordering instead. For example:

ord <- order(rowSums(abs(mostVariable)), decreasing = TRUE)


Here we are generating the ordering of the rows ourselves, in this case by the sum of the absolute values of each row, largest first. Then we turn off the clustering of the rows and the row dendrogram.
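The sketch below assumes the matrix is reordered with ord before plotting. Rowv = FALSE switches off row clustering so the rows keep the order given. Setting dendrogram = "column" keeps the dendrogram argument consistent with that by drawing only the column tree.

```r
library(gplots)

# Rebuild the example matrix (seed added for reproducibility)
set.seed(42)
exampleData <- matrix(log2(rexp(1000) / rexp(1000)), nrow = 200)
evar <- apply(exampleData, 1, var)
mostVariable <- exampleData[evar > quantile(evar, 0.75), ]

# Order rows by the sum of absolute values, largest first
ord <- order(rowSums(abs(mostVariable)), decreasing = TRUE)

# Rowv = FALSE: no row clustering, rows stay in the order supplied;
# dendrogram = "column": draw only the column dendrogram
heatmap.2(mostVariable[ord, ], Rowv = FALSE, dendrogram = "column",
          trace = "none", col = greenred(10))
```

Columns are still clustered here. Setting Colv = FALSE and dendrogram = "none" as well would turn off clustering in both directions.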

There are lots of other options too, but that is enough for today.
