(This article was first published on **R – Modern Data**, and kindly contributed to R-bloggers)

### The Problem

When clustering data after a principal component analysis (PCA), it is often useful to check visually how well the data points separate in the 2-D space spanned by the first two principal component scores. A scatterplot makes this straightforward, but the plot quickly becomes cluttered once the points are annotated with text labels, as shown in the following figure:

### Solution using `ggrepel`

The `ggrepel` package by Kamil Slowikowski *implements functions to repel overlapping text labels away from each other and away from the data points that they label*. It is an easy-to-use package that works well in this example, as shown in the following figure:

### Solution using `plotly`

An alternative solution is to use interactive plots, which work from the `R` console, in the `RStudio` viewer pane, in `R Markdown` documents and in `Shiny` apps. Annotations can be viewed by hovering the mouse pointer over a point, or by dragging a rectangle around the relevant area to zoom in. Interactive plots built with `plotly` keep the plotting area uncluttered, can carry extra annotation information, and produce web-based visualizations directly from `R`. Once uploaded to a `plotly` account, `plotly` graphs (and the data behind them) can be viewed and modified in a web browser.

The resulting plot is clean and free of cluttering text annotations. While the `ggrepel` package provides a nice solution in this example, the `plotly` solution becomes even more useful as the number of data points grows.

### The Code

#### Principal Component Analysis and Hierarchical Clustering

```r
# cor = TRUE indicates that PCA is performed on
# standardized data (mean = 0, variance = 1)
pcaCars <- princomp(mtcars, cor = TRUE)

# view objects stored in pcaCars
names(pcaCars)

# proportion of variance explained
summary(pcaCars)

# scree plot
plot(pcaCars, type = "l")

# cluster cars
carsHC <- hclust(dist(pcaCars$scores), method = "ward.D2")

# dendrogram
plot(carsHC)

# cut the dendrogram into 3 clusters
carsClusters <- cutree(carsHC, k = 3)

# add cluster to data frame of scores
carsDf <- data.frame(pcaCars$scores, "cluster" = factor(carsClusters))
carsDf <- transform(carsDf, cluster_name = paste("Cluster", carsClusters))
```
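For readers curious about what `princomp()` stores, here is a minimal base-R sketch. It recomputes `pcaCars` so it runs on its own; the other object names are illustrative. It shows that the scores are the standardized data projected onto the loadings, and how much of the total variance the PC1/PC2 plane used in the figures retains. One detail worth knowing: `princomp()` standardizes with divisor n rather than n - 1, so the sketch reuses the centers and scales stored in the fitted object instead of calling `scale()` with its defaults.

```r
pcaCars <- princomp(mtcars, cor = TRUE)

# Standardize with the centers and scales princomp() used
# (its scales use divisor n, whereas sd() uses n - 1)
zCars <- scale(mtcars, center = pcaCars$center, scale = pcaCars$scale)

# Scores = standardized data projected onto the loadings
manualScores <- zCars %*% unclass(pcaCars$loadings)
all.equal(unname(manualScores), unname(pcaCars$scores))

# Eigenvalues of the correlation matrix are the squared component sdevs;
# with cor = TRUE they sum to the number of variables (11 for mtcars)
eigenvalues <- pcaCars$sdev^2
propVar <- eigenvalues / sum(eigenvalues)

# Share of the total variance visible in a PC1 vs PC2 scatterplot
sum(propVar[1:2])
```

Because the scores are an orthogonal rotation of the standardized data, Euclidean distances between rows of `pcaCars$scores` (using all components) equal those between rows of the standardized data. The `hclust()` step therefore clusters the standardized cars, not just their 2-D projection.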

#### First figure using `ggplot2`

```r
library(ggplot2)

# base scatterplot of the first two principal component scores
p1 <- ggplot(carsDf, aes(x = Comp.1, y = Comp.2)) +
  theme_classic() +
  geom_hline(yintercept = 0, color = "gray70") +
  geom_vline(xintercept = 0, color = "gray70") +
  geom_point(aes(color = cluster), alpha = 0.55, size = 3) +
  xlab("PC1") +
  ylab("PC2") +
  xlim(-5, 6) +
  ggtitle("PCA Clusters from Hierarchical Clustering of Cars Data")

# add car names just above each point (the cluttered version)
p1 + geom_text(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))
```
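Visual inspection can be complemented by a numeric summary of how well the three clusters separate. One common choice, not used in the original analysis, is the average silhouette width from the `cluster` package (a recommended package that ships with most R installations). Widths close to 1 indicate well-separated clusters, and widths near 0 indicate overlapping ones. A minimal sketch, recomputing the objects so it runs on its own:

```r
library(cluster)

# Recreate the PCA scores and the three Ward clusters
pcaCars      <- princomp(mtcars, cor = TRUE)
scoreDist    <- dist(pcaCars$scores)
carsClusters <- cutree(hclust(scoreDist, method = "ward.D2"), k = 3)

# Silhouette width of each car, computed on the same distances
sil <- silhouette(carsClusters, scoreDist)

# Average silhouette width per cluster, then overall
aggregate(sil[, "sil_width"], by = list(cluster = sil[, "cluster"]), FUN = mean)
mean(sil[, "sil_width"])
```

Because the clustering uses all 11 component scores, cars that overlap in the PC1/PC2 plane can still be well separated in the full space, which the silhouette widths take into account and the 2-D scatterplot does not.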

#### Second figure using `ggplot2` with `ggrepel`

```r
library(ggplot2)
library(ggrepel)

# same base plot p1, with labels repelled away from each other
p1 + geom_text_repel(aes(y = Comp.2 + 0.25, label = rownames(carsDf)))
```
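To make the idea of label repulsion concrete, here is a deliberately simplified sketch. Over several passes, it pushes apart any two label positions that are closer than a minimum distance. It is not ggrepel's algorithm, which also accounts for text-box sizes and for the data points themselves. The function `repelLabels` and its parameters are hypothetical.

```r
# Hypothetical, simplified label repulsion (not ggrepel's implementation):
# repeatedly push apart pairs of label positions closer than minDist.
repelLabels <- function(x, y, minDist = 0.5, maxIter = 500) {
  n <- length(x)
  if (n < 2) return(data.frame(x = x, y = y))
  for (iter in seq_len(maxIter)) {
    moved <- FALSE
    for (i in 1:(n - 1)) {
      for (j in (i + 1):n) {
        dx <- x[i] - x[j]
        dy <- y[i] - y[j]
        d  <- sqrt(dx^2 + dy^2)
        if (d < minDist - 1e-9) {
          # unit direction from label j to label i (vertical if they coincide)
          if (d == 0) { ux <- 0; uy <- 1 } else { ux <- dx / d; uy <- dy / d }
          # move each label by half of the missing distance, in opposite directions
          shift <- (minDist - d) / 2
          x[i] <- x[i] + shift * ux; y[i] <- y[i] + shift * uy
          x[j] <- x[j] - shift * ux; y[j] <- y[j] - shift * uy
          moved <- TRUE
        }
      }
    }
    if (!moved) break  # every pair is at least minDist apart
  }
  data.frame(x = x, y = y)
}

# Usage: three labels stacked on the same spot end up spread out vertically
repelLabels(c(0, 0, 0), c(0, 0, 0))
```

Labels that are already far apart are left untouched, so only the crowded regions of a plot change.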

#### Interactive plot using `plotly`

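```r
library(plotly)

# interactive scatterplot: car names appear on hover instead of as text labels
# (the ~ formula syntax refers to columns of carsDf, as required by plotly >= 4.0)
p <- plot_ly(carsDf, x = ~Comp.1, y = ~Comp.2,
             text = ~rownames(carsDf),
             type = "scatter", mode = "markers",
             color = ~cluster_name, marker = list(size = 11))
p <- layout(p, title = "PCA Clusters from Hierarchical Clustering of Cars Data",
            xaxis = list(title = "PC 1"),
            yaxis = list(title = "PC 2"))
p
```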
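One way to include extra annotation information in the hover labels is to build the text yourself. The following is a minimal sketch that recomputes the same PCA and Ward clustering so it runs on its own. Adding `mpg` and `hp` to the label is an arbitrary, illustrative choice.

```r
# Same PCA and Ward clustering as above
pcaCars  <- princomp(mtcars, cor = TRUE)
clusters <- cutree(hclust(dist(pcaCars$scores), method = "ward.D2"), k = 3)

# One hover label per car; "<br>" renders as a line break in plotly tooltips
hoverText <- paste0(rownames(mtcars),
                    "<br>Cluster ", clusters,
                    "<br>mpg: ", mtcars$mpg,
                    "<br>hp: ", mtcars$hp)

hoverText[1]
```

The resulting vector could, for example, be added as a column of `carsDf` and passed to `plot_ly()` as `text = ~hoverText` together with `hoverinfo = "text"`, so that only this label is shown on hover.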
