# lmds: Landmark Multi-Dimensional Scaling

This article was first published on **R | Robrecht Cannoodt** and kindly contributed to R-bloggers.

Multi-dimensional scaling (MDS) (Kruskal 1964) is a dimensionality reduction method used for visualising and denoising high-dimensional data. However, since MDS requires calculating the distances between all pairs of data points, it does not scale well to datasets with a large number of samples.

We released lmds v0.1.0, an implementation of Landmark MDS (LMDS) (de Silva and Tenenbaum 2004). Landmark MDS only calculates the distances between a set of landmarks and all other data points, thereby sacrificing determinism for scalability.

## Regular MDS

A single-cell transcriptomics dataset (available in dyno as `fibroblast_reprogramming_treutlein`) is used to demonstrate (L)MDS.
It contains 392 profiles, each measuring the abundance levels of 2000 different molecules within an individual cell.
Note that while the dataset is thus only a 392×2000 matrix, LMDS is designed to scale to much larger datasets, as demonstrated in the Execution time section.

Simply looking at the raw expression values as a heatmap reveals little to no information:

```r
library(tidyverse)
set.seed(1)

dataset <- dyno::fibroblast_reprogramming_treutlein
cell_info <- data.frame(grouping = dataset$grouping)

pheatmap::pheatmap(
  t(as.matrix(dataset$expression)),
  show_colnames = FALSE,
  show_rownames = FALSE,
  annotation_col = cell_info
)
```

Applying MDS quickly reveals the underlying bifurcating topology of the dataset (from MEF to myocytes and neurons).

```r
# compute distance matrix
dist <- dynutils::calculate_distance(dataset$expression, method = "pearson")
dim(dist)
## [1] 392 392

# compute MDS
dimred_mds <- cmdscale(dist)

# plot points
qplot(dimred_mds[,1], dimred_mds[,2], colour = dataset$grouping) +
  theme_bw() +
  labs(x = "Comp 1", y = "Comp 2", colour = "Group")
```

Regular MDS, however, requires computing all pairwise distances between data points. This dataset only contains 392 data points, but for larger datasets it becomes increasingly infeasible to apply MDS.
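To get a feel for how quickly the number of pairwise distances grows, here is a small back-of-the-envelope calculation in base R. The helper `n_pairwise()` and the example dataset sizes are made up for illustration. The memory estimate assumes the distances are stored as a `dist` object, which keeps one 8-byte double per unordered pair.

```r
# Number of distances classical MDS needs for n points: one per unordered pair
# (`n_pairwise` is a hypothetical helper, not part of any package)
n_pairwise <- function(n) n * (n - 1) / 2

n <- c(392, 1e4, 1e5, 1e6)
data.frame(
  n_points = n,
  n_distances = n_pairwise(n),
  # approximate size of a `dist` object, at 8 bytes per stored distance
  size_gib = n_pairwise(n) * 8 / 1024^3
)
```

For the 392 profiles used here this amounts to 76636 distances, which is no problem at all. The quadratic growth only starts to hurt once the number of data points gets large.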

## Landmark MDS

Landmark MDS (LMDS) (de Silva and Tenenbaum 2004) is an extension of MDS which scales much better with respect to the number of data points in the dataset. A short while ago, we published an R package on CRAN implementing this algorithm, lmds v0.1.0.

Landmark MDS only computes the distance matrix between a set of landmarks and all other data points. MDS is then only performed on the landmarks, and all other data points are projected into the landmark space.

```r
library(lmds)

# compute distances between random landmarks and all data points
dist_landmarks <- select_landmarks(
  dataset$expression,
  distance_method = "pearson",
  num_landmarks = 150
)
dim(dist_landmarks)
## [1] 150 392

# perform LMDS
dimred_lmds <- cmdscale_landmarks(dist_landmarks)

# plot points
qplot(dimred_lmds[,1], dimred_lmds[,2], colour = dataset$grouping) +
  theme_bw() +
  labs(x = "Comp 1", y = "Comp 2", colour = "Group")
```

In practice, these two steps are usually applied together as follows:

```r
dimred_lmds2 <- lmds(
  dataset$expression,
  distance_method = "pearson",
  num_landmarks = 150
)
```
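To make the "MDS on the landmarks, then project everything else" idea more concrete, here is a minimal base-R sketch that follows the description in de Silva and Tenenbaum (2004). It is not the code used inside the lmds package. The function `lmds_sketch()`, the toy data, and the use of Euclidean rather than Pearson distance are all assumptions made for the sake of a self-contained example.

```r
set.seed(1)

# Toy data: 200 points in 3 dimensions (a stand-in for a real dataset)
x <- matrix(rnorm(200 * 3), ncol = 3)

# `lmds_sketch` is a hypothetical helper, written for illustration only
lmds_sketch <- function(x, num_landmarks = 20, ndim = 2) {
  # 1. pick landmarks at random
  ix <- sample(nrow(x), num_landmarks)
  lm <- x[ix, , drop = FALSE]

  # 2. squared Euclidean distances from each landmark to every point (k x n);
  #    only k * n distances are computed, never the full n x n matrix
  d2 <- outer(rowSums(lm^2), rowSums(x^2), "+") - 2 * tcrossprod(lm, x)
  d2 <- pmax(d2, 0)  # guard against tiny negative values from rounding

  # 3. classical MDS on the landmarks: double-centre the k x k squared
  #    distances and take the leading eigenvectors
  d2_lm <- d2[, ix, drop = FALSE]
  k <- num_landmarks
  centring <- diag(k) - matrix(1 / k, k, k)
  b <- -0.5 * centring %*% d2_lm %*% centring
  e <- eigen(b, symmetric = TRUE)
  values <- e$values[seq_len(ndim)]
  vectors <- e$vectors[, seq_len(ndim), drop = FALSE]

  # 4. project every point into the landmark space:
  #    coordinates = -1/2 * L# %*% (delta - delta_mu)
  pseudo_inv <- t(vectors) / sqrt(values)  # ndim x k, row i scaled by 1 / sqrt(values[i])
  delta_mu <- rowMeans(d2_lm)              # mean squared distance from each landmark to all landmarks
  t(-0.5 * pseudo_inv %*% (d2 - delta_mu)) # n x ndim
}

dimred <- lmds_sketch(x, num_landmarks = 20, ndim = 3)

# The toy data are exactly 3-dimensional, so a 3-dimensional embedding
# should preserve the Euclidean distances up to rounding error
max(abs(as.vector(dist(dimred)) - as.vector(dist(x))))
```

With real data and fewer output dimensions the embedding is only an approximation, and its quality depends on which landmarks happen to be sampled.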

## Execution time

In the figure below, the execution times of MDS and LMDS are compared by increasing the size of a random dataset until the execution of either algorithm exceeds 10 seconds.
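A comparison along these lines could be set up roughly as follows. This is only a sketch of the general approach, not the benchmark code behind the figure. The helper `time_until()`, the 10-column random data, the doubling schedule, and the much smaller time budget are all assumptions chosen to keep the example quick to run. Only the classical MDS side is timed here, using base R.

```r
set.seed(1)

# Double the size of a random dataset until `fun` takes longer than `budget`
# seconds (or `n_max` rows is reached). `time_until` is a hypothetical helper.
time_until <- function(fun, budget = 0.5, n_start = 100, n_max = 3200) {
  n <- n_start
  out <- data.frame(n = numeric(0), seconds = numeric(0))
  repeat {
    x <- matrix(rnorm(n * 10), ncol = 10)
    secs <- system.time(fun(x))[["elapsed"]]
    out <- rbind(out, data.frame(n = n, seconds = secs))
    if (secs > budget || n >= n_max) break
    n <- n * 2
  }
  out
}

# Classical MDS: full pairwise distance matrix followed by cmdscale()
timings_mds <- time_until(function(x) cmdscale(dist(x), k = 2))
timings_mds
```

The same helper could be handed a wrapper around `lmds()` to obtain the LMDS timings, after which the two data frames can be plotted against each other.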

## Conclusion

LMDS is a heuristic for MDS which scales linearly with respect to the number of points
in the dataset. Go ahead and check out our implementation of this algorithm,
available on CRAN.
If you encounter any issues, feel free to let me know by creating an
issue on GitHub.

## References

de Silva, Vin, and Joshua B Tenenbaum. 2004. “Sparse Multidimensional Scaling Using Landmark Points.” *Technical Report, Stanford University*, 41.

Kruskal, J. B. 1964. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis.” *Psychometrika* 29 (1): 1–27. https://doi.org/10.1007/BF02289565.
