PCA or SPCA or NSPCA?

[This article was first published on Chen-ang Statistics » R, and kindly contributed to R-bloggers.]

Principal component analysis (PCA) is one of the classical methods in multivariate statistics. It is now also widely used for data preprocessing and dimension reduction, with applications in engineering, biology, and many other fields beyond statistics. PCA has two main optimality properties: a truncated set of principal components loses as little information (variance) as possible, and the principal components are mutually uncorrelated. These properties explain much of its success.
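Both properties are easy to check numerically. The following is a minimal sketch using base R only; the built-in USArrests data set is chosen here purely for illustration and does not appear in the original example.

```r
# Classical PCA on a built-in data set (USArrests is an illustrative choice).
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Property 1: the principal component scores are mutually uncorrelated,
# so the off-diagonal entries of their correlation matrix are numerically zero.
score_cor <- cor(pca$x)
max_offdiag <- max(abs(score_cor[upper.tri(score_cor)]))
print(max_offdiag)

# Property 2: the components are ordered by variance, so keeping the first
# few retains as much of the total variance as possible.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
cum_var <- cumsum(var_explained)
print(round(cum_var, 3))
```

The cumulative proportions show how much variance would be kept if only the leading components were retained.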

Although classical PCA is theoretically sound, interpreting the result (the derived PCs) is still a headache. The loadings are typically all non-zero, so each PC is a linear combination of all the original variables. Even if we keep only the first PC, it still depends on every variable, which makes it hard to say what the component represents. A second problem is that prior knowledge about the application often suggests that the weights should be non-negative, and classical PCA does not guarantee this.
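A quick way to see both issues is to inspect the loading matrix of an ordinary PCA. This is a small sketch on the built-in USArrests data (an illustrative choice, not part of the original post); the tolerance 1e-8 used to decide what counts as zero is also an assumption.

```r
# Ordinary PCA on standardized data.
pca <- prcomp(USArrests, scale. = TRUE)
loadings <- pca$rotation

# Count the non-zero loadings in each component: with classical PCA,
# every component usually involves every variable.
nonzero_per_pc <- colSums(abs(loadings) > 1e-8)
print(nonzero_per_pc)

# Check which components mix positive and negative weights.
has_mixed_signs <- apply(loadings, 2, function(w) any(w < 0) && any(w > 0))
print(has_mixed_signs)
```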

Because of these drawbacks, several extensions of traditional PCA have been proposed. Sparse principal component analysis (SPCA) and non-negative sparse principal component analysis (NSPCA) are probably good solutions. The obvious concern, however, is computational complexity. As a matter of fact, adding either the sparseness constraint or the non-negativity constraint turns the optimization problem into an NP-hard problem. So far, researchers have not found a general algorithm that solves these problems exactly and efficiently, although numerous methods have been developed that relax some of the constraints or settle for a locally optimal solution. So before choosing a method, we should weigh the advantages and disadvantages.
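To make the constraints concrete, here is one common way to write the three problems for a single component. The notation is an assumption introduced for illustration: \Sigma is the sample covariance matrix of the p variables, w is the loading vector, and k is an upper bound on the number of non-zero loadings. The referenced papers use variants of this form (for example, penalties instead of hard constraints).

```latex
\begin{align*}
\text{PCA:}   \quad & \max_{w \in \mathbb{R}^p} \; w^\top \Sigma w \quad \text{s.t. } \|w\|_2 = 1 \\
\text{SPCA:}  \quad & \max_{w \in \mathbb{R}^p} \; w^\top \Sigma w \quad \text{s.t. } \|w\|_2 = 1,\; \|w\|_0 \le k \\
\text{NSPCA:} \quad & \max_{w \in \mathbb{R}^p} \; w^\top \Sigma w \quad \text{s.t. } \|w\|_2 = 1,\; \|w\|_0 \le k,\; w \ge 0
\end{align*}
```

The first problem is solved by the leading eigenvector of \Sigma. The cardinality bound \|w\|_0 \le k and the sign constraint w \ge 0 are what remove this closed-form solution.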

By the way, PCA, SPCA and NSPCA can all be implemented in R. The function princomp performs a principal component analysis (for Q-mode PCA, use the function prcomp instead). The function nsprcomp in the package nsprcomp performs a constrained principal component analysis and covers both SPCA and NSPCA. If the argument nneg is TRUE, the loadings are constrained to be non-negative. It is worth stressing that the arguments, such as the cardinality bound k, need to be chosen with care.
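As a small aside on the two base R functions: princomp works from an eigendecomposition of the covariance matrix and uses the divisor n, while prcomp works from a singular value decomposition of the centered data and uses the divisor n - 1. The sketch below, again on the illustrative USArrests data, shows that their standard deviations differ only by that constant factor.

```r
x <- USArrests
n <- nrow(x)

# princomp: eigendecomposition of the covariance matrix (divisor n).
pc_eig <- princomp(x)

# prcomp: SVD of the centered data matrix (divisor n - 1).
pc_svd <- prcomp(x)

# The component standard deviations differ by the factor sqrt((n - 1) / n).
ratio <- unname(pc_eig$sdev) / pc_svd$sdev
print(ratio)
print(sqrt((n - 1) / n))
```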

The following code is a simple example from the reference manual of the package nsprcomp:

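library(nsprcomp)
library(MASS)  # provides the Boston data set
set.seed(1)
# Regular PCA; tol drops components with a small standard deviation relative to the first PC
prcomp(Boston, tol = 0.36, scale. = TRUE)
# Sparse PCA; k gives the upper bound on non-zero loadings for each component
nsprcomp(Boston, k = c(13, 7, 5, 5), scale. = TRUE)
# Non-negative sparse PCA
nsprcomp(Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE)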
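To confirm that the constraints took effect, the fitted object can be inspected in the same way as a prcomp result. This is a hedged sketch that assumes the nsprcomp and MASS packages are installed; ncomp = 4 is passed explicitly here as a precaution, in case the installed version does not infer the number of components from the length of k.

```r
library(nsprcomp)
library(MASS)

set.seed(1)
k_bounds <- c(7, 5, 2, 2)
fit <- nsprcomp(Boston, ncomp = 4, k = k_bounds, nneg = TRUE, scale. = TRUE)

# Number of non-zero loadings per component; each should respect its bound in k.
cardinality <- colSums(fit$rotation != 0)
print(cardinality)

# With nneg = TRUE, no loading should be negative.
all_nonneg <- all(fit$rotation >= 0)
print(all_nonneg)
```

Comparing the printed cardinalities with k_bounds is a simple sanity check before trying to interpret the sparse components.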

 

References

http://link.springer.com/content/pdf/10.1007%2F978-3-540-89197-0_13.pdf

http://www.stanford.edu/~hastie/Papers/spc_jcgs.pdf

http://cran.r-project.org/web/packages/nsprcomp/nsprcomp.pdf

 

 
