# lmds: Landmark Multi-Dimensional Scaling

**R | Robrecht Cannoodt**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Multi-dimensional scaling (MDS) (Kruskal 1964) is a dimensionality reduction

method used for visualising and denoising high-dimensional data. However, since MDS requires

calculating the distances between all pairs of data points, it does not scale well to datasets

with a large number of samples.

We released lmds v0.1.0, an implementation of

Landmark MDS (LMDS) (de Silva and Tenenbaum 2004). Landmark MDS only calculates the distances between a set of landmarks

and all other data points, thereby sacrificing determinism for scalability.

## Regular MDS

A single-cell transcriptomics dataset (**???**) is used to demonstrate (L)MDS,

containing 392 profiles which measure the abundance levels of 2000 differentmolecules within individual cells.

Note that while the dataset is thus only a 392×2000 matrix, LMDS is designed to scale to much higher dimensionality, as demonstrated in the last section.

Simply looking at the raw expression values as a heatmap reveals little to no information:

```
library(tidyverse)
set.seed(1)
dataset <- dyno::fibroblast_reprogramming_treutlein
cell_info <- data.frame(grouping = dataset$grouping)
pheatmap::pheatmap(
t(as.matrix(dataset$expression)),
show_colnames = FALSE,
show_rownames = FALSE,
annotation_col = cell_info
)
```

Applying MDS quickly reveals the underlying bifurcating topology of the dataset

(from MEF to myocytes and neurons).

```
# compute distance matrix
dist <- dynutils::calculate_distance(dataset$expression, method = "pearson")
dim(dist)
## [1] 392 392
# compute MDS
dimred_mds <- cmdscale(dist)
# plot points
qplot(dimred_mds[,1], dimred_mds[,2], colour = dataset$grouping) +
theme_bw() +
labs(x = "Comp 1", y = "Comp 2", colour = "Group")
```

Regular MDS, however, requires computing all pairwise distances between data points.

This dataset only contains 392 data points, but for datasets

it is increasingly infeasible to apply MDS.

## Landmark MDS

Landmark MDS (LMDS) (de Silva and Tenenbaum 2004) is an extension of MDS which scales much

better with respect to the number of data points in the dataset. A short while ago,

we published an R package on CRAN implementing this algorithm,

lmds v0.1.0.

Landmark MDS only computes the distance matrix between a set of landmarks and all other data points.

MDS is then only performed on the landmarks, and all other datapoints are projected into

the landmark space.

```
library(lmds)
# compute distances between random landmarks and all data points
dist_landmarks <- select_landmarks(
dataset$expression,
distance_method = "pearson",
num_landmarks = 150
)
dim(dist_landmarks)
## [1] 150 392
# perform LMDS
dimred_lmds <- cmdscale_landmarks(dist_landmarks)
# plot points
qplot(dimred_lmds[,1], dimred_lmds[,2], colour = dataset$grouping) +
theme_bw() +
labs(x = "Comp 1", y = "Comp 2", colour = "Group")
```

Most frequently, these two steps can be applied together as follows:

```
dimred_lmds2 <- lmds(
dataset$expression,
distance_method = "pearson",
num_landmarks = 150
)
```

## Execution time

In the figure below, the execution times of MDS and LMDS are compared by increasing

the size of a random dataset until the execution of either algorithms exceeds 10 seconds.

## Conclusion

LMDS is a heuristic for MDS which scales linearly with respect to the number of points

in the dataset. Go ahead and check out our implementation for this algorithm,

available on `CRAN`

.

If you encounter any issues, feel free to let me know by creating an

issue post on Github.

## References

de Silva, Vin, and Joshua B Tenenbaum. 2004. “Sparse Multidimensional Scaling Using Landmark Points.” *Technical Report, Stanford University*, 41.

Kruskal, J. B. 1964. “Multidimensional Scaling by Optimizing Goodness of Fit to a Nonmetric Hypothesis.” *Psychometrika* 29 (1): 1–27. https://doi.org/10.1007/BF02289565.

**leave a comment**for the author, please follow the link and comment on their blog:

**R | Robrecht Cannoodt**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.