PCA or SPCA or NSPCA?

November 15, 2013

(This article was first published on Chen-ang Statistics » R, and kindly contributed to R-bloggers)

Principal component analysis (PCA) is one of the classical methods of multivariate statistics, and it is now widely used for data processing and dimension reduction. Beyond statistics, PCA has numerous applications in engineering, biology, and other fields. PCA has two main optimality properties: it guarantees minimal information loss, and the principal components (PCs) it produces are uncorrelated. These properties explain much of its success.
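As a minimal sketch of these two properties, the following uses base R's prcomp on the built-in USArrests data. The choice of dataset is an assumption made for illustration and does not come from the original example.

```r
# Illustration only: USArrests is a built-in dataset chosen for convenience.
pc <- prcomp(USArrests, scale. = TRUE)

# Property 1: the principal component scores are mutually uncorrelated,
# so their correlation matrix is the identity (up to rounding error).
score_cor <- cor(pc$x)
print(round(score_cor, 6))

# Property 2: the PC variances add up to the total variance of the scaled
# data, and they are sorted in decreasing order, so the leading PCs retain
# as much variance as any projection of the same dimension can.
explained <- pc$sdev^2 / sum(pc$sdev^2)
print(round(cumsum(explained), 3))
```

The cumulative proportions show how much variance is retained when only the first few PCs are kept.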

Although classical (traditional) PCA is theoretically sound, interpreting the result (the derived PCs) is still a headache. The loadings are typically all non-zero, so each PC is a linear combination of all the variables. Even if we keep only the first PC, it still depends on every original variable, which makes it hard to interpret. A second problem is that, based on prior intuition about the data, we often need the weights to be non-negative, and classical PCA does not guarantee this.
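Both issues are easy to see on a small example. The sketch below again assumes the built-in USArrests data (not part of the original post) and inspects the loadings returned by prcomp.

```r
# Illustration only: USArrests is a built-in dataset chosen for convenience.
pc <- prcomp(USArrests, scale. = TRUE)

# Issue 1: the first principal axis loads on every variable.
first_axis <- pc$rotation[, 1]
print(round(first_axis, 3))
n_nonzero <- sum(first_axis != 0)

# Issue 2: loadings are not sign-constrained; the second axis mixes
# positive and negative weights.
second_axis <- pc$rotation[, 2]
print(round(second_axis, 3))
has_mixed_signs <- any(second_axis > 0) && any(second_axis < 0)
```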

Because of these drawbacks, several extensions of traditional PCA have been proposed. Sparse principal component analysis (SPCA) and non-negative sparse principal component analysis (NSPCA) are probably good solutions. The obvious concern, however, is computational complexity. As a matter of fact, adding either the sparseness constraint or the non-negativity constraint turns the optimization problem into an NP-hard one. So far, researchers have not found a general algorithm that solves these problems exactly, although numerous methods have been developed that either relax some of the constraints or settle for a locally optimal solution. So before choosing a method, we should weigh the advantages and disadvantages.
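One common way to write the problem for a single component is sketched below. It is not necessarily the exact formulation used in the references, and the symbols are notation introduced here: \Sigma is the sample covariance matrix, w is the loading vector, p is the number of variables, and k is an upper bound on the number of non-zero loadings.

```latex
\max_{w \in \mathbb{R}^p} \; w^\top \Sigma w
\quad \text{subject to} \quad
\|w\|_2 = 1, \qquad
\|w\|_0 \le k, \qquad
w \ge 0 \;\; \text{(NSPCA only)}
```

Without the cardinality and non-negativity constraints this is ordinary PCA, whose solution is the leading eigenvector of \Sigma. The constraint \|w\|_0 \le k makes the feasible set combinatorial, which is where the computational difficulty comes from.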

By the way, PCA, SPCA and NSPCA can all be implemented in R. The function princomp performs a principal components analysis (for Q-mode PCA, use the function prcomp). The function nsprcomp in the package nsprcomp performs a constrained principal component analysis (both SPCA and NSPCA). If the argument nneg is TRUE, the loadings are constrained to be non-negative. It is worth mentioning that choosing appropriate argument values is very important.

The following code is a simple example from the reference manual of the package nsprcomp:

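library(nsprcomp)
library(MASS)  # provides the Boston data set
set.seed(1)
# ordinary PCA; tol omits components whose standard deviation is small relative to the first
prcomp(Boston, tol = 0.36, scale. = TRUE)
# sparse PCA: k bounds the number of non-zero loadings of each component
nsprcomp(Boston, k = c(13, 7, 5, 5), scale. = TRUE)
# non-negative sparse PCA: nneg = TRUE additionally requires loadings >= 0
nsprcomp(Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE)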
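To check that the constraints actually took effect, one can inspect the loading matrix of the fitted object. The sketch below assumes that the nsprcomp result exposes a rotation matrix in the same way prcomp does. The helper loading_cardinality is a hypothetical name introduced here and is not part of the package.

```r
# Hypothetical helper (not part of nsprcomp): number of non-zero loadings
# in each column of a rotation (loading) matrix.
loading_cardinality <- function(rotation) {
  colSums(rotation != 0)
}

# Guarded so the sketch still runs when the packages are not installed.
if (requireNamespace("nsprcomp", quietly = TRUE) &&
    requireNamespace("MASS", quietly = TRUE)) {
  set.seed(1)
  k <- c(7, 5, 2, 2)
  fit <- nsprcomp::nsprcomp(MASS::Boston, k = k, nneg = TRUE, scale. = TRUE)

  # Each axis should have at most k[i] non-zero loadings ...
  print(loading_cardinality(fit$rotation))
  # ... and, because nneg = TRUE, no negative loadings.
  print(all(fit$rotation >= 0))
}
```

Comparing the printed cardinalities with k is a quick sanity check before interpreting the components.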

 

References

http://link.springer.com/content/pdf/10.1007%2F978-3-540-89197-0_13.pdf

http://www.stanford.edu/~hastie/Papers/spc_jcgs.pdf

http://cran.r-project.org/web/packages/nsprcomp/nsprcomp.pdf

 

 
