
The recent release of R 3.2.2 came with a small (but highly valuable) improvement to the stats:::labels.dendrogram function. For a dendrogram with (say) 1000 labels, the new function is about 70 times faster than the version from R 3.2.1. This makes it even faster than the Rcpp version of labels.dendrogram from the dendextendRcpp package.

Here is some R code to demonstrate this speed improvement:

 # If you are missing any of these, they should be installed:
 install.packages("dendextend")
 install.packages("dendextendRcpp")
 install.packages("microbenchmark")

 # Getting labels from dendextendRcpp
 labelsRcpp [...] %>% dist %>% hclust %>% as.dendrogram
 labels(dend)

And here are the results:

 > microbenchmark(labels_3.2.1(dend), labels_3.2.2(dend), labelsRcpp(dend))
 Unit: milliseconds
                expr        min         lq     median         uq       max neval
  labels_3.2.1(dend) 186.522968 189.395378 195.684164 208.328365 321.98368   100
  labels_3.2.2(dend)   2.604766   2.826776   2.891728   3.006792  21.24127   100
    labelsRcpp(dend)   3.825401   3.946904   3.999817   4.179552  11.22088   100
 >
 > microbenchmark(labels_3.2.2(dend), order.dendrogram(dend))
 Unit: microseconds
                    expr      min        lq   median        uq      max neval
      labels_3.2.2(dend) 2520.218 2596.0880 2678.677 2885.2890 9572.460   100
  order.dendrogram(dend)  665.191  712.2235  954.951  996.1055 2268.812   100

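As we can see, the new labels function (in R 3.2.2) is about 70 times faster than the older version (from R 3.2.1): the median time drops from roughly 196 milliseconds to roughly 2.9 milliseconds. It is also somewhat faster than the Rcpp version (median of about 4.0 milliseconds). If you only need something like the number of labels, calling length on the result of order.dendrogram is still about 3 times faster than calling it on the result of labels.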
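To make the last point concrete, here is a minimal sketch of the two ways to count leaves. It uses only base R; the random input, the seed, and the variable names are assumptions for illustration, not the data from the benchmark above, and system.time gives only a rough comparison.

```r
# Assumption: random data stands in for the input used in the post.
set.seed(42)
x <- rnorm(1000)
dend <- as.dendrogram(hclust(dist(x)))

# Two ways to get the number of leaves in the tree
n_from_labels <- length(labels(dend))
n_from_order  <- length(order.dendrogram(dend))
stopifnot(n_from_labels == n_from_order)

# Rough timing of each approach over repeated calls
reps <- 50
t_labels <- system.time(
  for (i in seq_len(reps)) length(labels(dend))
)[["elapsed"]]
t_order <- system.time(
  for (i in seq_len(reps)) length(order.dendrogram(dend))
)[["elapsed"]]

print(c(labels = t_labels, order.dendrogram = t_order))
```

The exact timings will depend on your machine and R version, so treat any ratio you see as indicative only.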

This improvement is expected to speed up various functions in the dendextend R package (a package for visualizing, adjusting, and comparing dendrograms, which relies heavily on labels.dendrogram). We expect even larger speedups for larger trees.
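If you would like to see how labels scales with tree size on your own installation, the following rough sketch times it for a few sizes. The sizes, repetition count, and helper name time_labels are hypothetical choices, and the input is random data; it only measures whichever version of R you are running, so it does not by itself compare 3.2.1 with 3.2.2.

```r
# Hypothetical helper: time repeated calls to labels() on a tree with n leaves.
time_labels <- function(n, reps = 20) {
  dend <- as.dendrogram(hclust(dist(rnorm(n))))
  system.time(for (i in seq_len(reps)) labels(dend))[["elapsed"]]
}

set.seed(1)
sizes <- c(100, 500, 1000)  # assumed tree sizes, chosen for illustration
timings_df <- data.frame(
  n_labels    = sizes,
  elapsed_sec = vapply(sizes, time_labels, numeric(1))
)
print(timings_df)
```

Running the same script under two R versions and comparing the elapsed_sec columns would be one way to check the expectation about larger trees.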