Creating a Scree Plot in Base R

[This article was first published on Steve's Data Tips and Tricks, and kindly contributed to R-bloggers.]

Introduction

A scree plot is a line plot that shows the eigenvalues or variance explained by each principal component (PC) in a Principal Component Analysis (PCA). It is a useful tool for determining the number of PCs to retain in a PCA model.

In this blog post, we will show you how to create a scree plot in base R. We will use the iris dataset as an example.


Step 1: Load the dataset and prepare the data

# Drop the non-numerical column
df <- iris[, -5]
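
Dropping column 5 works for iris because Species is its only non-numeric column. For other data frames, one possible approach (a sketch, not part of the original post) is to select columns by type rather than by position. The name `df_numeric` is introduced here only for illustration.

```r
# Keep only the numeric columns, whatever their position
numeric_cols <- vapply(iris, is.numeric, logical(1))
df_numeric <- iris[, numeric_cols]

# Compare with the positional approach used above
all.equal(df_numeric, iris[, -5])

# Inspect the structure of the result
str(df_numeric)
```

Selecting by type means the same lines can be reused on a dataset where the non-numeric columns sit elsewhere.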

Step 2: Perform Principal Component Analysis

# Perform PCA on the iris dataset
# scale. = TRUE standardizes each variable to unit variance before the analysis
pca <- prcomp(df, scale. = TRUE)
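
When the variables are standardized, PCA is effectively run on the correlation matrix, so the squared standard deviations returned by `prcomp()` should match the eigenvalues of `cor(df)`. The following is a small sanity check under that assumption. It uses `scale.`, the full name of the `prcomp()` argument, and the name `cor_eigen` is introduced here for illustration.

```r
df <- iris[, -5]
pca <- prcomp(df, scale. = TRUE)

# Eigenvalues of the correlation matrix of the original variables
cor_eigen <- eigen(cor(df))$values

# Compare with the squared standard deviations from prcomp()
all.equal(pca$sdev^2, cor_eigen)

# Standard deviation and proportion of variance for each component
summary(pca)
```

Because each standardized variable has variance 1, the eigenvalues sum to the number of variables (4 for iris).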

Step 3: Create the scree plot

# Extract the eigenvalues from the PCA object
eigenvalues <- pca$sdev^2

# Create a scree plot
plot(eigenvalues, type = "b",
     xlab = "Principal Component",
     ylab = "Eigenvalue")

# Add a vertical line at x = 2 to mark the elbow
abline(v = 2, col = "red")

# Proportion of variance explained by each component
plot(eigenvalues / sum(eigenvalues), type = "b",
     xlab = "Principal Component",
     ylab = "Proportion of Variance Explained")
abline(v = 2, col = "red")


Interpretation

The scree plot shows that the first two principal components account for most of the variance in the data: about 73% and 23% respectively, or roughly 96% combined. The third and fourth principal components together explain only about 4%.

Based on the scree plot, we can conclude that the first two principal components are sufficient for capturing the most important information in the data.

Here are the eigenvalues and the proportion of variance explained by each component:

eigenvalues
[1] 2.91849782 0.91403047 0.14675688 0.02071484
eigenvalues/sum(eigenvalues)
[1] 0.729624454 0.228507618 0.036689219 0.005178709
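
The proportions above can also be accumulated to see how much variance the first k components explain together. This is a minimal sketch. The names `prop_var` and `cum_var` are introduced here, and the 0.95 threshold is an illustrative cutoff, not one taken from the original post.

```r
eigenvalues <- prcomp(iris[, -5], scale. = TRUE)$sdev^2
prop_var <- eigenvalues / sum(eigenvalues)

# Running total of the proportion of variance explained
cum_var <- cumsum(prop_var)
round(cum_var, 3)

# Smallest number of components whose cumulative share reaches the cutoff
# (0.95 is an assumed, illustrative threshold)
n_keep <- which(cum_var >= 0.95)[1]
n_keep
```

For iris, the first two components together account for about 0.958 of the variance, which is consistent with the elbow marked at component 2.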

Try it yourself

Try creating a scree plot for another dataset of your choice. You can use the same steps outlined above.
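
As one possible starting point, here is the same workflow applied to USArrests, a dataset that ships with R and contains only numeric columns. The choice of dataset and the names `pca_arrests` and `eig_arrests` are assumptions made for this sketch. It also shows `screeplot()` from the stats package, which draws a comparable plot directly from a `prcomp` object.

```r
# USArrests has four numeric columns, so no columns need to be dropped
pca_arrests <- prcomp(USArrests, scale. = TRUE)
eig_arrests <- pca_arrests$sdev^2

# Scree plot built by hand, as in the steps above
plot(eig_arrests, type = "b",
     xlab = "Principal Component",
     ylab = "Eigenvalue",
     main = "Scree plot: USArrests")

# Built-in alternative from the stats package
screeplot(pca_arrests, type = "lines", main = "screeplot(): USArrests")
```

Both plots show the component variances, so either can be used to look for the elbow.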

Here are some additional tips for creating scree plots:
