(This article was first published on **Chen-ang Statistics » R**, and kindly contributed to R-bloggers)

Principal component analysis (PCA) is one of the classical methods of multivariate statistics, and it is now widely used for data preprocessing and dimension reduction. Beyond statistics, PCA has numerous applications in engineering, biology and other fields. PCA has two main optimality properties: the first few principal components (PCs) retain as much of the variance in the data as any equal number of linear combinations can, so the information loss is minimal, and the PCs are mutually uncorrelated. That is why PCA has become so successful.
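Both properties can be checked numerically with `prcomp` from base R. The sketch below uses the built-in `USArrests` data (an example data set chosen here, not one from the original post) and assumes standardized variables (`scale. = TRUE`). It verifies that the PC variances are non-increasing and equal the eigenvalues of the correlation matrix, and that the PC scores are uncorrelated.

```r
# PCA on the built-in USArrests data (50 states x 4 variables), standardized
pc <- prcomp(USArrests, scale. = TRUE)

# Property 1: PC variances are non-increasing, so PC1 keeps the most information
pc$sdev^2

# ... and they are exactly the eigenvalues of the correlation matrix
ev <- eigen(cor(USArrests), symmetric = TRUE)$values
all.equal(pc$sdev^2, ev)

# Share of the total variance retained by the first k PCs
cumsum(pc$sdev^2) / sum(pc$sdev^2)

# Property 2: the PC scores are mutually uncorrelated (off-diagonals are ~0)
round(cor(pc$x), 10)
```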

Although classical (traditional) PCA is theoretically sound, interpreting the results (the derived PCs) is still a headache. First, the loadings are typically all non-zero, because each PC is a linear combination of all the variables. Even if we keep only the first PC, it still depends on every one of the original variables, which makes it hard to interpret. Second, we often want the weights (loadings) to be non-negative, based on prior knowledge or intuition about the problem, and ordinary PCA does not guarantee this.
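A small illustration of both drawbacks, again assuming the built-in `USArrests` data as a stand-in example: every PC loads on all four variables, and most PCs mix positive and negative loadings. Note that the overall sign of each column is arbitrary in PCA, so only the sign pattern within a column is meaningful.

```r
pc <- prcomp(USArrests, scale. = TRUE)
L <- pc$rotation                 # 4 x 4 loading matrix, one column per PC
round(L, 3)

# Number of non-zero loadings per PC: each PC uses all 4 variables
colSums(abs(L) > 1e-8)

# TRUE only for PCs whose loadings all share one sign (up to the arbitrary
# overall sign flip); mixed-sign PCs are harder to read as simple indices
apply(L, 2, function(a) all(a >= 0) || all(a <= 0))
```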

Because of these drawbacks, several extensions of traditional PCA have been proposed. Sparse principal component analysis (SPCA) and non-negative sparse principal component analysis (NSPCA) are probably good solutions. The obvious problem, however, is computational complexity: imposing either the sparsity constraint or the non-negativity constraint turns the optimization into an NP-hard problem. So far, researchers have not found a general algorithm that solves these problems exactly, although numerous methods have been developed (which relax some of the constraints or settle for a locally optimal solution). So before choosing a method, we should weigh its advantages and disadvantages.
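To make the complexity point concrete: with a cardinality constraint of at most k non-zero loadings, the exact sparse first PC can be found by trying every support of size k and taking the largest leading eigenvalue of the corresponding sub-matrix of the correlation matrix. The sketch below is a brute-force illustration, not one of the published SPCA algorithms; it uses the `Boston` data from MASS and an arbitrary k = 5. It is feasible for 14 variables, but the number of eigenproblems, `choose(p, k)`, explodes as p grows.

```r
library(MASS)                    # for the Boston data set
A <- cor(Boston)                 # 14 x 14 correlation matrix
p <- ncol(A)
k <- 5                           # illustrative cardinality: 5 non-zero loadings

# Every candidate support of size k, one per column
supports <- combn(p, k)

# Exact search: the best k-sparse PC lives on the support whose
# sub-matrix has the largest leading eigenvalue
best_val <- -Inf
best_S <- NULL
for (j in seq_len(ncol(supports))) {
  S <- supports[, j]
  val <- eigen(A[S, S], symmetric = TRUE, only.values = TRUE)$values[1]
  if (val > best_val) {
    best_val <- val
    best_S <- S
  }
}

# Sparse loading vector: leading eigenvector on the best support, zero elsewhere
v <- setNames(numeric(p), colnames(A))
v[best_S] <- eigen(A[best_S, best_S], symmetric = TRUE)$vectors[, 1]
round(v, 3)

# Variance captured by the sparse PC versus the unconstrained first PC
c(sparse = best_val, pca = eigen(A, symmetric = TRUE, only.values = TRUE)$values[1])

# Eigenproblems solved here, and for a modest p = 100, k = 10 (about 1.7e13)
c(here = choose(p, k), p100_k10 = choose(100, 10))
```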

By the way, PCA, SPCA and NSPCA can all be implemented in R. The function princomp (package stats) performs principal component analysis; for Q-mode PCA, use the function prcomp instead. The function nsprcomp in the package nsprcomp performs constrained principal component analysis (both SPCA and NSPCA). If its argument nneg is TRUE, the loadings are constrained to be non-negative. It is worth stressing that choosing appropriate argument values, in particular the cardinality vector k that bounds the number of non-zero loadings of each component, is very important.
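The difference between the two base R functions can be seen on `USArrests` (an example data set chosen here, not from the original post). prcomp works through a singular value decomposition of the centred data and uses divisor n - 1 for variances, while princomp calls `eigen()` on a covariance matrix computed with divisor n, so the component standard deviations agree up to a factor of sqrt(n / (n - 1)). princomp also needs at least as many units as variables, which is why prcomp is the one to use for wide (Q-mode) data. The random matrix `W` below is a hypothetical wide data set.

```r
X <- as.matrix(USArrests)        # 50 units (states) x 4 variables
n <- nrow(X)

pc_svd   <- prcomp(X)            # SVD of the centred data; divisor n - 1
pc_eigen <- princomp(X)          # eigen() of the covariance matrix; divisor n

# Same components; the standard deviations differ only by the divisor
all.equal(unname(pc_svd$sdev), unname(pc_eigen$sdev) * sqrt(n / (n - 1)))

# More variables than units: prcomp still works, princomp stops with an error
set.seed(1)
W <- matrix(rnorm(4 * 6), nrow = 4)   # hypothetical data: 4 units x 6 variables
dim(prcomp(W)$rotation)
msg <- tryCatch({ princomp(W); "no error" },
                error = function(e) conditionMessage(e))
msg
```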

The following code is a simple example from the reference manual of the package nsprcomp:

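```r
library(nsprcomp)  # nsprcomp(): sparse and/or non-negative PCA
library(MASS)      # provides the Boston housing data
set.seed(1)        # for reproducible results
# Ordinary PCA; tol = 0.36 omits components whose standard deviation is
# at most 0.36 times that of the first component
prcomp(Boston, tol = 0.36, scale. = TRUE)
# Sparse PCA: at most 13, 7, 5 and 5 non-zero loadings for PCs 1 to 4
nsprcomp(Boston, k = c(13, 7, 5, 5), scale. = TRUE)
# Non-negative sparse PCA: sparse loadings that are also constrained to be >= 0
nsprcomp(Boston, k = c(7, 5, 2, 2), nneg = TRUE, scale. = TRUE)
```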
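To give a feel for what a constrained PCA solver does, here is a minimal base-R sketch of one simple local heuristic: the truncated power method (Yuan and Zhang), with an extra clipping step for non-negativity. This is an assumption-laden illustration, not necessarily the algorithm nsprcomp uses internally; `tpower_pc` is a hypothetical helper name, only the first component is computed (no deflation), and the result depends on the starting vector, so it is a locally optimal solution in the sense discussed above.

```r
library(MASS)                          # for the Boston data set
A <- cor(Boston)                       # 14 x 14 correlation matrix

# Hypothetical helper: truncated power iteration for one sparse,
# optionally non-negative, principal axis with at most k non-zero loadings
tpower_pc <- function(A, k, nneg = FALSE, iters = 500, tol = 1e-12) {
  p <- ncol(A)
  v <- rep(1 / sqrt(p), p)             # dense, positive starting vector
  for (it in seq_len(iters)) {
    w <- drop(A %*% v)                 # ordinary power-iteration step
    if (nneg) w <- pmax(w, 0)          # clip negative loadings to zero
    keep <- order(abs(w), decreasing = TRUE)[seq_len(k)]
    w[-keep] <- 0                      # keep only the k largest entries
    if (all(w == 0)) break             # degenerate case: nothing to normalise
    w <- w / sqrt(sum(w^2))            # rescale to unit length
    converged <- sum((w - v)^2) < tol
    v <- w
    if (converged) break
  }
  setNames(v, colnames(A))
}

v_sparse <- tpower_pc(A, k = 5)
v_nneg   <- tpower_pc(A, k = 5, nneg = TRUE)
round(v_nneg, 3)

# Variance captured by each axis versus the unconstrained first PC
lambda1 <- eigen(A, symmetric = TRUE, only.values = TRUE)$values[1]
c(sparse = drop(t(v_sparse) %*% A %*% v_sparse),
  nneg   = drop(t(v_nneg) %*% A %*% v_nneg),
  pca    = lambda1)
```

The clipping step simply projects each iterate onto the non-negative orthant before truncation; it keeps the sketch short but gives no optimality guarantee.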

**References**

http://link.springer.com/content/pdf/10.1007%2F978-3-540-89197-0_13.pdf

http://www.stanford.edu/~hastie/Papers/spc_jcgs.pdf

http://cran.r-project.org/web/packages/nsprcomp/nsprcomp.pdf
