Interactive Heat Maps for R

May 23, 2016

(This article was first published on R – Modern Data, and kindly contributed to R-bloggers)

In every statistical analysis, the first thing one should do is try and visualise the data before any modeling. In microarray studies, a common visualisation is a heatmap of gene expression data.

In this post I simulate some gene expression data and visualise it using the heatmaply package in R by Tal Galili. This package extends the plotly engine to heatmaps, allowing you to inspect certain values of the data matrix by hovering the mouse over a cell. You can also zoom into a region of the heatmap by drawing a rectangle over an area of your choice

The following function simulates data from a multivariate normal distribution, and allows exposure dependent correlations between the data. You will need the mvrnorm function from the MASS library to run this function:

simulateExprData <- function(n, n0, p, rho0, rho1){ 
  # n: total number of subjects 
  # n0: number of subjects with exposure 0 
  # n1: number of subjects with exposure 1 
  # p: number of genes 
  # rho0: rho between Z_i and Z_j for subjects with exposure 0 
  # rho1: rho between Z_i and Z_j for subjects with exposure 1
  # Simulate gene expression values according to exposure 0 or 1, 
  # according to a centered multivariate normal distribution with 
  # covariance between Z_i and Z_j being rho^|i-j| 
  n1 <- n - n0 
  times <- 1:p
  H <- abs(outer(times, times, "-")) 
  V0 <- rho0^H 
  V1 <- rho1^H 

  # rows are people, columns are genes 
  genes0 <- MASS::mvrnorm(n = n0, mu = rep(0,p), Sigma = V0) 
  genes1 <- MASS::mvrnorm(n = n1, mu = rep(0,p), Sigma = V1) 
  genes <- rbind(genes0,genes1) 

genes <- simulateExprData(n = 100, n0 = 50, p = 100, 
                          rho0 = 0.01, rho1 = 0.95)

Next we need to install and load the heatmaply package:


The syntax is extremely simple. Lets plot a few different interactive heatmaps of the data. We first plot the genes and specify how many groups we want for the rows (subjects) and columns (genes) of the data. In this case we specify 2 groups for both the rows and columns:

heatmaply(genes, k_row = 2, k_col = 2)

We see that the clustering algorithm is not able to properly cluster the subjects because we would have expected to see two equal size groups in the dendrogram on the y-axis. Let’s instead plot the correlation matrix of the genes:

heatmaply(cor(genes), k_row = 2, k_col = 2)

We now see two distinct groups in the dendrogram, as we would expect because that’s the way we generated the data

To leave a comment for the author, please follow the link and comment on their blog: R – Modern Data. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)