Reconstructing Principal Component Analysis Matrix

April 5, 2013

(This article was first published on mintgene » R, and kindly contributed to R-bloggers)


PCA is widely used method for finding patterns in high-dimensional data. Whether you use it to compress large matrix or to remove one of the principal components in biological datasets, you’ll end up with the task of performing series of equations from linear algebra to reconstruct the matrix of original dimensions.

To find principal components, we first need to center the input matrix, and then calculate the eigenvalues and eigenvectors of its covariance matrix. To illustrate matrix reconstruction I’ll use 32×32 faceData matrix from Jeff Leek’s coursera course on data analysis.

# get the dataset from
# you probably want to use stats::prcomp for PCA on big matrices
runPCA <- function(mat = 'Unadjusted matrix') eigen(cov(apply(mat, 2, function(i) i - mean(i))))
pca <- runPCA(faceData)

# > str(pca)
# List of 2
# $ values : num [1:32] 18915 10067 4724 2506 1366 ...
# $ vectors: num [1:32, 1:32] -0.236 -0.261 -0.192 -0.164 -0.204 ...

First thing after doing PCA is to check the contributions of each PC in explaining the variance.

varExplained <- function(eigenList) {

par(mfrow = c(1,2))

 eigenList$value / sum(eigenList$value), pch = 21, col = 'black',
 bg = '#549cc4', ylim = c(0, 1), xlab = 'Principal Component',
 ylab = 'Variance Explained'
 ) + abline(h = 0.9)

 cumsum(eigenList$value) / sum(eigenList$value), pch = 21,
 col = 'black', bg = '#549cc4', ylim = c(0, 1), xlab = 'Principal Component',
 ylab = 'Cumulative Variance Explained'
 ) + abline(h = 0.9)



From these plots you can see that faceData has ~5 PC’s that cumulatively explain ~90% of total variance. Lets use this information to reconstruct the matrix, and compare it to the original one.

afterPCA <- function(
 matAdjust = 'Centered matrix',
 meanList = 'List of column means of original (unadjusted) matrix',
 eigenList = 'List of eigenvalues and eigenvectors of adjust matrix covariance matrix',
 n = 'selected PC\'s',
 specific_select = 'If True: n == 1:n, if False: just n\'th columns') {

 if (length(n) > ncol(matAdjust)) stop('N is higher than the number of PC\'s')
 if (!specific_select & length(n) > 1) stop('Use a single number when selecting up to n\'th PC')
 if (!specific_select) n <- 1:n

 t(eigenList$vectors[,n] %*% (t(eigenList$vectors[,n]) %*% t(matAdjust))) + t(matrix(meanList, nrow = nrow(matAdjust), ncol = ncol(matAdjust)))

# ColorBrewer palette
showMatrix <- function(x, ...) image(t(x[nrow(x):1,]), xaxt = 'none', yaxt = 'none', col = rev(colorRampPalette(brewer.pal(7, 'Blues'))(100)), ...)

reconstMatrix <- afterPCA(
 matAdjust = apply(faceData, 2, function(i) i - mean(i)),
 meanList = apply(faceData, 2, mean),
 eigenList = pca,
 n = 5,
 specific_select = FALSE

par(mfrow = c(1,2), mar = c(0, 0, 1, 0), bty = 'n')
showMatrix(faceData, main = 'Original Matrix')
showMatrix(reconstMatrix, main = 'First 5 PC\'s')


As seen from eigenvalues (variances), taking only 5/32 PC’s is enough to recreate face that has almost all of the features of the original matrix.

To leave a comment for the author, please follow the link and comment on their blog: mintgene » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)