[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers].

Principal Component Analysis (PCA) is a great tool for data analysis projects for a lot of reasons. If you have never heard of PCA, in simple words it applies a linear transformation to your features based on their covariance or correlation. I will add a few links below if you want to know more about it. Some of the applications of PCA are dimensionality reduction, feature analysis, data compression, anomaly detection, clustering and many more. The first time I learnt about PCA, I found it confusing and hard to understand. But as I started to read about its applications in research papers, I got curious and tried them all out. Now I use it as a pre-processing step in most of my projects.
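To make the "linear transformation using covariance" idea concrete, here is a minimal sketch on synthetic data (not the image used later in this post). It assumes nothing beyond base R: it centres the features, eigen-decomposes their covariance matrix, projects the data onto the eigenvectors, and checks that `prcomp` gives the same answer up to the sign of each component.

```r
# Minimal sketch: PCA by hand on synthetic data (an assumption for illustration)
set.seed(42)
X <- matrix(rnorm(200 * 3), ncol = 3)
X[, 2] <- X[, 1] + 0.1 * X[, 2]   # make two features strongly correlated

# 1. centre each feature, 2. eigen-decompose the covariance matrix
Xc  <- scale(X, center = TRUE, scale = FALSE)
eig <- eigen(cov(Xc))

# 3. the linear transformation: project the centred data onto the eigenvectors
scores_manual <- Xc %*% eig$vectors

# prcomp() computes the same quantities via SVD; component signs may differ
pca <- prcomp(X)
sdev_match   <- all.equal(sqrt(eig$values), pca$sdev)
scores_match <- all.equal(abs(unname(scores_manual)), abs(unname(pca$x)))
```

Calling `prcomp(X, scale. = TRUE)` instead would base the transformation on the correlation matrix rather than the covariance matrix.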

I recently added this topic to my data science curriculum, as PCA has become relevant in data science today. The first time I taught it to my students, 90% of the class had a blank look on their faces. Honestly, it mirrored my own first reaction. Then I leaned towards demonstrative teaching rather than using slides and talking for an hour, which made it a lot easier to understand. I thought I would share this example on my blog to help those in need.

For this example we will use a grayscale image. I will also try to keep the R code in this example as minimal as possible.

### Step 1: Image processing

Load the imager library, load the image, and convert the image to a row x column matrix.

Next, we will visualize our image using the image function. A post on Stack Overflow helped me use the image function the right way.
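Before the actual code, here is a small sketch of the two matrix tricks used in this step, run on a tiny 2 x 3 "image" so the result can be checked by eye. The toy data frame is an assumption: it only mimics the long x, y, value layout that imager produces, with x varying fastest.

```r
# Toy long-format data: x varies fastest, as in the imager data frame
toy_df <- expand.grid(x = 1:3, y = 1:2)
toy_df$value <- 1:6

# byrow = TRUE puts each y (image row) on its own matrix row
m <- matrix(toy_df$value, nrow = 2, ncol = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

# image() draws matrix rows along the x axis and starts at the bottom,
# so reverse each column (vertical flip) and then transpose
r <- t(apply(m, 2, rev))
#      [,1] [,2]
# [1,]    4    1
# [2,]    5    2
# [3,]    6    3
```

Passing `r` to `image` then draws the value 1 in the top-left corner and 6 in the bottom-right corner, which matches the layout of `m`.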

library(imager)

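# load the image and look at the image properties
# ("path/to/image.jpg" is a placeholder; point it at your own file)
image = load.image("path/to/image.jpg")
image
# Image. Width: 282 pix Height: 220 pix Depth: 1 Colour channels: 3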

# convert image data to data frame and look at the first rows
image_df = as.data.frame(image)
head(image_df)

# x y cc     value
# 1 1 1  1 0.9372549
# 2 2 1  1 0.9254902
# 3 3 1  1 0.9254902
# 4 4 1  1 0.9294118
# 5 5 1  1 0.9372549
# 6 6 1  1 0.9372549

# convert the image into a row x column grid using the matrix function
# the image has 3 colour channels; keep only the first one (cc == 1)
image_mat = matrix(image_df$value[image_df$cc == 1], nrow = 220, ncol = 282, byrow = TRUE)

# visualize the image
image(t(apply(image_mat, 2, rev)), col=grey(seq(0,1,length=256)))

### Step 2: PCA analysis

The next step is to pass the matrix to the principal component function, prcomp. Scaling is very important for PCA. Since the image I used is grayscale, all pixel values are already on the same scale, so I have not scaled the data to keep it simple (prcomp still centres each column by default). We then plot the variances of the principal components as a scree plot, which shows that the first five components account for most of the variance in the data.

# pca analysis
pca_model = prcomp(image_mat)

# plot the scree plot
plot(pca_model)

### Step 3: Reconstruction and visualization

The final step is to visualize the image reconstructed from an increasing number of components. Here, we use the first 1, 3, 5, ..., 17 components and plot the nine results on a 3 x 3 grid. To reconstruct from, say, the first component, we multiply the scores of the first PC (pca_model$x) by the transpose of its rotation vector (pca_model$rotation). This produces a matrix with the same dimensions as our image, which we then plot. To make this a little easier, I have put the reconstruction and visualization into a function, which we then loop over with lapply.

# Reconstruction and plotting
par(mfrow= c(3,3))
recon_fun = function(comp){
  # scores of the first `comp` PCs times the transpose of their rotation vectors
  # (drop = FALSE keeps the matrix shape when comp = 1)
  recon = pca_model$x[, 1:comp, drop = FALSE] %*% t(pca_model$rotation[, 1:comp, drop = FALSE])
  image(t(apply(recon, 2, rev)), col=grey(seq(0,1,length=256)), main = paste0("Principal Components = ", comp))
}

# run reconstruction using the first 1, 3, ..., 17 components
# (invisible() suppresses the list of NULLs that lapply would otherwise print)
invisible(lapply(seq(1,18, by = 2), recon_fun))
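The reconstruction step can also be checked numerically rather than only by eye. The sketch below is not part of the original workflow: it uses a synthetic matrix as a stand-in for `image_mat` so it runs without the image file, and `reconstruct` is a hypothetical helper. Unlike the plotting function above, it adds back the column means that `prcomp` removes, so the result is directly comparable to the input.

```r
# Synthetic stand-in for image_mat (an assumption for illustration)
set.seed(1)
mat <- outer(sin(seq(0, pi, length.out = 40)),
             cos(seq(0, 2 * pi, length.out = 60))) +
  matrix(rnorm(40 * 60, sd = 0.05), nrow = 40)

pca <- prcomp(mat)   # centres columns by default, no scaling

# hypothetical helper: rebuild the matrix from the first k components
reconstruct <- function(pca, k) {
  low_rank <- pca$x[, 1:k, drop = FALSE] %*% t(pca$rotation[, 1:k, drop = FALSE])
  sweep(low_rank, 2, pca$center, "+")   # add the column means back
}

# cumulative share of variance captured by the first k components
var_explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)

# root-mean-square reconstruction error for a few values of k
ks   <- c(1, 5, 10, 40)
rmse <- sapply(ks, function(k) sqrt(mean((mat - reconstruct(pca, k))^2)))
```

The error in `rmse` shrinks as `k` grows and is essentially zero once all components are used, which is the numerical counterpart of the images getting clearer.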

As the reconstructed images show, the image gets clearer as we add more components to the reconstruction. In a real-world application we could store just a few components as a representation of the image and reconstruct the image from them. We could also feed the reconstructed image to a neural network to enhance its quality. Now you know how dimensionality reduction works for images using PCA. This step-by-step demonstrative approach has definitely helped while teaching my class, and I wish I had been taught this way.
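To get a rough feel for how much there is to gain from storing only a few components, the sketch below counts the numbers that have to be kept. `stored_values` is a hypothetical helper, and the count assumes we keep the scores, the rotation vectors and the column means for the 220 x 282 image from Step 1. It ignores file formats and numeric precision.

```r
# hypothetical helper: numbers needed to rebuild an n x p matrix
# from its first k principal components
stored_values <- function(n, p, k) {
  n * k +   # scores:       pca_model$x[, 1:k]
  p * k +   # rotation:     pca_model$rotation[, 1:k]
  p         # column means: pca_model$center
}

n <- 220; p <- 282         # image dimensions from Step 1
k <- c(1, 5, 9, 17)        # a few component counts to compare
ratio <- stored_values(n, p, k) / (n * p)
```

Each entry of `ratio` is the fraction of the original 220 x 282 values that would need to be stored for that number of components.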

Below are some of the best tutorials on PCA out there.

I have written a few Jupyter notebooks on applications of PCA in anomaly detection and dimensionality reduction on my GitHub page. Feel free to check them out.