Principal Component Analysis (PCA) using R
PCA stands for Principal Component Analysis. It is a multivariate technique used to reduce the dimension of a data set. More precisely, PCA is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of those variables. PCA thus transforms the original set of variables into a smaller set of linear combinations that account for most of the variance of the original set.
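To make the "linear combinations" concrete, here is a minimal sketch. It assumes the built-in USArrests data set as a stand-in (it is not the data used in this post) and standardized variables. Under those assumptions, the principal components are the eigenvectors of the correlation matrix and the eigenvalues are the variances they account for, which should agree with what prcomp() reports.

```r
# Illustrative data: USArrests ships with base R (an assumption, not the post's data)
X <- scale(USArrests)            # center and standardize each variable

# Eigen-decomposition of the correlation matrix
e <- eigen(cor(USArrests))

# Scores: each column is one linear combination of the standardized variables
scores <- X %*% e$vectors

# The same analysis through prcomp()
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues equal the variances of the components
print(round(e$values, 4))
print(round(pca$sdev^2, 4))

# Share of the total variance carried by each component
print(round(e$values / sum(e$values), 4))
```

The eigenvectors are only defined up to sign, so the columns of e$vectors and pca$rotation may differ by a factor of -1.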
Objectives of PCA
PCA has two main objectives:
- Data reduction: Although all p components are required to reproduce the total variability, much of that variability can often be accounted for by a small number, say k, of the PCs.
- Interpretation: Analysis of PCs often reveals relationships that were not previously suspected and thereby allows interpretations that would not ordinarily result.
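The data-reduction objective can be sketched with the ten eigenvalues reported in the results of this post. The 80% cut-off below is a hypothetical choice made only for illustration; the post itself does not prescribe a threshold.

```r
# Eigenvalues as printed in the results section of this post
eig <- c(4.1242133, 1.8385309, 1.2391403, 0.8194402, 0.7015528,
         0.4228828, 0.3025817, 0.2744700, 0.1552169, 0.1219710)

var_pct <- 100 * eig / sum(eig)   # percent of variance per component
cum_pct <- cumsum(var_pct)        # cumulative percent of variance

threshold <- 80                   # hypothetical cut-off, chosen for illustration
k <- which(cum_pct >= threshold)[1]

print(round(cum_pct, 2))
print(k)
```

With this cut-off, k = 4 of the p = 10 components are kept, since the first four account for about 80.2% of the variance.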
PCA is also used for the following purposes:
- PCA yields uncorrelated linear combinations of the features, which give us a different way to describe the data.
- A more realistic perspective with less complexity.
- Better visualization.
- Reduced data size.
R code of Principal Component Analysis (PCA)
## First, load the package:
library("factoextra")

## Input/insert/load your data set.
## As an example we use the data set decathlon2:
data(decathlon2)
decathlon2.active <- decathlon2[1:23, 1:10]
decathlon2.active

## Performing PCA on the standardized variables:
res.pca <- prcomp(decathlon2.active, scale = TRUE)

## Scree plot:
fviz_eig(res.pca)

## Individuals, coloured by their cos2:
fviz_pca_ind(res.pca, col.ind = "cos2",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)

## Variables, coloured by their contribution:
fviz_pca_var(res.pca, col.var = "contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE)

## Biplot of individuals and variables:
fviz_pca_biplot(res.pca, repel = TRUE,
                col.var = "#2E9FDF", col.ind = "#696969")

## Access to the PCA results:
eig.val <- get_eigenvalue(res.pca)
eig.val
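The call above standardizes the variables before the analysis. The following sketch suggests why that choice matters. It uses the built-in USArrests data as a stand-in (an assumption, since it needs no extra package) and compares prcomp() with and without scaling. The argument is spelled `scale.` in prcomp(); the shorter `scale = TRUE` used in the post reaches it through R's partial argument matching.

```r
# Illustrative data: USArrests from base R (an assumption, not the post's data)
pca_raw    <- prcomp(USArrests, scale. = FALSE)  # covariance-based PCA
pca_scaled <- prcomp(USArrests, scale. = TRUE)   # correlation-based PCA

# Variable variances differ widely, so the raw analysis is driven by units
print(round(sapply(USArrests, var), 1))

# Loadings of the first component under each choice
print(round(pca_raw$rotation[, 1], 3))
print(round(pca_scaled$rotation[, 1], 3))

# Proportion of variance carried by the first component
prop_raw    <- pca_raw$sdev[1]^2 / sum(pca_raw$sdev^2)
prop_scaled <- pca_scaled$sdev[1]^2 / sum(pca_scaled$sdev^2)
print(c(raw = prop_raw, scaled = prop_scaled))
```

Without scaling, the variable with the largest variance dominates the first component; with scaling, every variable enters with unit variance.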
Results of Principal Component Analysis
The scree plot of the data set is the one drawn by fviz_eig(res.pca) in the code above.
Eigenvalues and Factors
##Eigen values
       eigenvalue variance.percent cumulative.variance.percent
Dim.1   4.1242133        41.242133                    41.24213
Dim.2   1.8385309        18.385309                    59.62744
Dim.3   1.2391403        12.391403                    72.01885
Dim.4   0.8194402         8.194402                    80.21325
Dim.5   0.7015528         7.015528                    87.22878
Dim.6   0.4228828         4.228828                    91.45760
Dim.7   0.3025817         3.025817                    94.48342
Dim.8   0.2744700         2.744700                    97.22812
Dim.9   0.1552169         1.552169                    98.78029
Dim.10  0.1219710         1.219710                   100.00000

res.ind <- get_pca_ind(res.pca)
res.ind$coord
res.ind$contrib
res.ind$cos2

> res.ind$coord
                 Dim.1      Dim.2       Dim.3       Dim.4         Dim.5       Dim.6
SEBRLE       0.1912074 -1.5541282 -0.62836882  0.08205241  1.1426139415 -0.46389755
CLAY         0.7901217 -2.4204156  1.35688701  1.26984296 -0.8068483724  1.30420016
BERNARD     -1.3292592 -1.6118687 -0.19614996 -1.92092203  0.0823428202 -0.40062867
YURKOV      -0.8694134  0.4328779 -2.47398223  0.69723814  0.3988584116  0.10286344
ZSIVOCZKY   -0.1057450  2.0233632  1.30493117 -0.09929630 -0.1970241089  0.89554111
McMULLEN     0.1185550  0.9916237  0.84355824  1.31215266  1.5858708644  0.18657283
MARTINEAU   -2.3923532  1.2849234 -0.89816842  0.37309771 -2.2433515889 -0.45666350
HERNU       -1.8910497 -1.1784614 -0.15641037  0.89130068 -0.1267412520  0.43623496
BARRAS      -1.7744575  0.4125321  0.65817750  0.22872866 -0.2338366980  0.09026010
NOOL        -2.7770058  1.5726757  0.60724821 -1.55548081  1.4241839810  0.49716399
BOURGUIGNON -4.4137335 -1.2635770 -0.01003734  0.66675478  0.4191518468 -0.08200220
Sebrle       3.4514485 -1.2169193 -1.67816711 -0.80870696 -0.0250530746 -0.08279306
Clay         3.3162243 -1.6232908 -0.61840443 -0.31679906  0.5691645854  0.77715960
Karpov       4.0703560  0.7983510  1.01501662  0.31336354 -0.7974259553 -0.32958134
Macey        1.8484623  2.0638828 -0.97928455  0.58469073 -0.0002157834 -0.19728082
Warners      1.3873514 -0.2819083  1.99969621 -1.01959817 -0.0405401497 -0.55673300
Zsivoczky    0.4715533  0.9267436 -1.72815525 -0.18483138  0.4073029909 -0.11383190
Hernu        0.2763118  1.1657260  0.17056375 -0.84869401 -0.6894795441 -0.33168404
Bernard      1.3672590  1.4780354  0.83137913  0.74531557  0.8598016482 -0.32806564
Schwarzl    -0.7102777 -0.6584251  1.04075176 -0.92717510 -0.2887568007 -0.68891640
Pogorelov   -0.2143524 -0.8610557  0.29761010  1.35560294 -0.0150531057 -1.59379599
Schoenbeck  -0.4953166 -1.3000530  0.10300360 -0.24927712 -0.6452257128  0.16172381
Barras      -0.3158867  0.8193681 -0.86169481 -0.58935985 -0.7797389436  1.17415412
                  Dim.7        Dim.8        Dim.9      Dim.10
SEBRLE      -0.20796012  0.043460568 -0.659344137  0.03273238
CLAY        -0.21291866  0.617240611 -0.060125359 -0.31716015
BERNARD     -0.40643754  0.703856040  0.170083313 -0.09908142
YURKOV      -0.32487448  0.114996135 -0.109524039 -0.11969720
ZSIVOCZKY    0.08825624 -0.202341299 -0.523103099 -0.34842265
McMULLEN     0.47828432  0.293089967 -0.105623196 -0.39317797
MARTINEAU   -0.29975522 -0.291628488 -0.223417655 -0.61640509
HERNU       -0.56609980 -1.529404317  0.006184409  0.55368016
BARRAS       0.21594095  0.682583078 -0.669282042  0.53085420
NOOL        -0.53205687 -0.433385655 -0.115777808 -0.09622142
BOURGUIGNON -0.59833739  0.563619921  0.525814030  0.05855882
Sebrle       0.01016177 -0.030585843 -0.847210682  0.21970353
Clay         0.25750851 -0.580638301  0.409776590 -0.61601933
Karpov      -1.36365568  0.345306381  0.193055107  0.21721852
Macey       -0.26927772 -0.363219506  0.368260269  0.21249474
Warners     -0.26739400 -0.109470797  0.180283071  0.24208420
Zsivoczky    0.03991159  0.538039776  0.585966156 -0.14271715
Hernu        0.44308686  0.247293566  0.066908586 -0.20868256
Bernard      0.36357920  0.006165316  0.279488675  0.32067773
Schwarzl     0.56568604 -0.687053339 -0.008358849 -0.30211546
Pogorelov    0.78370119 -0.037623661 -0.130531397 -0.03697576
Schoenbeck   0.85752368 -0.255850722  0.564222295  0.29680481
Barras       0.94512710  0.365550568  0.102255763  0.61186706

> res.ind$contrib
                  Dim.1      Dim.2        Dim.3       Dim.4        Dim.5       Dim.6
SEBRLE       0.03854254  5.7118249 1.385418e+00  0.03572215 8.091161e+00  2.21256620
CLAY         0.65814114 13.8541889 6.460097e+00  8.55568792 4.034555e+00 17.48801877
BERNARD      1.86273218  6.1441319 1.349983e-01 19.57827284 4.202070e-02  1.65019840
YURKOV       0.79686310  0.4431309 2.147558e+01  2.57939100 9.859373e-01  0.10878629
ZSIVOCZKY    0.01178829  9.6816398 5.974848e+00  0.05231437 2.405750e-01  8.24561722
McMULLEN     0.01481737  2.3253860 2.496789e+00  9.13531719 1.558646e+01  0.35788945
MARTINEAU    6.03367104  3.9044125 2.830527e+00  0.73858431 3.118936e+01  2.14409841
HERNU        3.76996156  3.2842176 8.583863e-02  4.21505626 9.955149e-02  1.95655942
BARRAS       3.31942012  0.4024544 1.519980e+00  0.27758505 3.388731e-01  0.08376135
NOOL         8.12988880  5.8489726 1.293851e+00 12.83761115 1.257025e+01  2.54127369
BOURGUIGNON 20.53729577  3.7757623 3.534995e-04  2.35877858 1.088816e+00  0.06913582
Sebrle      12.55838616  3.5020697 9.881482e+00  3.47006223 3.889859e-03  0.07047579
Clay        11.59361384  6.2315181 1.341828e+00  0.53250375 2.007648e+00  6.20972751
Karpov      17.46609555  1.5072627 3.614914e+00  0.52101693 3.940874e+00  1.11680500
Macey        3.60207087 10.0732890 3.364879e+00  1.81387486 2.885677e-07  0.40014909
Warners      2.02910262  0.1879390 1.403071e+01  5.51585696 1.018550e-02  3.18673563
Zsivoczky    0.23441891  2.0310492 1.047894e+01  0.18126182 1.028128e+00  0.13322327
Hernu        0.08048777  3.2136178 1.020764e-01  3.82170515 2.946148e+00  1.13110069
Bernard      1.97075488  5.1661961 2.425213e+00  2.94737426 4.581507e+00  1.10655655
Schwarzl     0.53184785  1.0252129 3.800546e+00  4.56119277 5.167449e-01  4.87961053
Pogorelov    0.04843819  1.7533304 3.107757e-01  9.75034337 1.404313e-03 26.11665608
Schoenbeck   0.25864068  3.9969003 3.722687e-02  0.32970059 2.580092e+00  0.26890572
Barras       0.10519467  1.5876667 2.605305e+00  1.84296038 3.767994e+00 14.17432302
                   Dim.7        Dim.8        Dim.9      Dim.10
SEBRLE       0.621426384 2.992045e-02 12.177477305  0.03819185
CLAY         0.651413899 6.035125e+00  0.101262442  3.58568943
BERNARD      2.373652810 7.847747e+00  0.810319793  0.34994507
YURKOV       1.516564073 2.094806e-01  0.336009790  0.51072064
ZSIVOCZKY    0.111923276 6.485544e-01  7.664919832  4.32741147
McMULLEN     3.287016354 1.360753e+00  0.312501167  5.51053518
MARTINEAU    1.291109482 1.347216e+00  1.398195851 13.54402896
HERNU        4.604850849 3.705288e+01  0.001071345 10.92781554
BARRAS       0.670038259 7.380544e+00 12.547331617 10.04537028
NOOL         4.067669683 2.975270e+00  0.375477289  0.33003418
BOURGUIGNON  5.144247534 5.032108e+00  7.744571086  0.12223626
Sebrle       0.001483775 1.481898e-02 20.105546253  1.72063803
Clay         0.952824148 5.340583e+00  4.703566841 13.52708188
Karpov      26.720158115 1.888802e+00  1.043988269  1.68193477
Macey        1.041910483 2.089853e+00  3.798767930  1.60957713
Warners      1.027384225 1.898339e-01  0.910422384  2.08904756
Zsivoczky    0.022889042 4.585705e+00  9.617852173  0.72605208
Hernu        2.821027418 9.687304e-01  0.125399768  1.55234328
Bernard      1.899449022 6.021268e-04  2.188071254  3.66566729
Schwarzl     4.598122119 7.477531e+00  0.001957159  3.25357879
Pogorelov    8.825322559 2.242329e-02  0.477268755  0.04873597
Schoenbeck  10.566272800 1.036933e+00  8.917302863  3.14020004
Barras      12.835417603 2.116763e+00  0.292892746 13.34533825

> res.ind$cos2
                  Dim.1      Dim.2        Dim.3       Dim.4        Dim.5        Dim.6
SEBRLE      0.007530179 0.49747323 8.132523e-02 0.001386688 2.689027e-01 0.0443241299
CLAY        0.048701249 0.45701660 1.436281e-01 0.125791741 5.078506e-02 0.1326907339
BERNARD     0.197199804 0.28996555 4.294015e-03 0.411819183 7.567259e-04 0.0179131165
YURKOV      0.096109800 0.02382571 7.782303e-01 0.061812637 2.022798e-02 0.0013453555
ZSIVOCZKY   0.001574385 0.57641944 2.397542e-01 0.001388216 5.465497e-03 0.1129176906
McMULLEN    0.002175437 0.15219499 1.101379e-01 0.266486530 3.892621e-01 0.0053876990
MARTINEAU   0.404013915 0.11654676 5.694575e-02 0.009826320 3.552552e-01 0.0147210347
HERNU       0.399282749 0.15506199 2.731529e-03 0.088699901 1.793538e-03 0.0212478795
BARRAS      0.616241975 0.03330700 8.478249e-02 0.010239088 1.070152e-02 0.0015944528
NOOL        0.489872515 0.15711146 2.342405e-02 0.153694675 1.288433e-01 0.0157010551
BOURGUIGNON 0.859698130 0.07045912 4.446015e-06 0.019618511 7.753120e-03 0.0002967459
Sebrle      0.675380606 0.08395940 1.596674e-01 0.037079012 3.558507e-05 0.0003886276
Clay        0.687592867 0.16475409 2.391051e-02 0.006274965 2.025440e-02 0.0377627839
Karpov      0.783666922 0.03014772 4.873187e-02 0.004644764 3.007790e-02 0.0051379747
Macey       0.363436037 0.45308203 1.020057e-01 0.036362957 4.952707e-09 0.0041397727
Warners     0.255651956 0.01055582 5.311341e-01 0.138081100 2.182965e-04 0.0411689767
Zsivoczky   0.045053176 0.17401353 6.051030e-01 0.006921739 3.361236e-02 0.0026253777
Hernu       0.024824321 0.44184663 9.459148e-03 0.234196727 1.545686e-01 0.0357707217
Bernard     0.289347476 0.33813318 1.069834e-01 0.085980212 1.144234e-01 0.0166586433
Schwarzl    0.116721435 0.10030142 2.506043e-01 0.198892209 1.929118e-02 0.1098063093
Pogorelov   0.007803472 0.12591966 1.504272e-02 0.312101619 3.848427e-05 0.4314162233
Schoenbeck  0.067070098 0.46204603 2.900467e-03 0.016987442 1.138116e-01 0.0071500829
Barras      0.018972684 0.12765099 1.411800e-01 0.066043061 1.156018e-01 0.2621297474
                   Dim.7        Dim.8        Dim.9       Dim.10
SEBRLE      8.907507e-03 3.890334e-04 8.954067e-02 0.0002206741
CLAY        3.536548e-03 2.972084e-02 2.820119e-04 0.0078471026
BERNARD     1.843634e-02 5.529104e-02 3.228572e-03 0.0010956493
YURKOV      1.341980e-02 1.681440e-03 1.525225e-03 0.0018217256
ZSIVOCZKY   1.096685e-03 5.764478e-03 3.852703e-02 0.0170924251
McMULLEN    3.540616e-02 1.329562e-02 1.726733e-03 0.0239268142
MARTINEAU   6.342774e-03 6.003515e-03 3.523552e-03 0.0268211980
HERNU       3.578167e-02 2.611676e-01 4.270425e-06 0.0342288717
BARRAS      9.126203e-03 9.118662e-02 8.766746e-02 0.0551531863
NOOL        1.798232e-02 1.193105e-02 8.514912e-04 0.0005881295
BOURGUIGNON 1.579887e-02 1.401866e-02 1.220108e-02 0.0001513277
Sebrle      5.854423e-06 5.303795e-05 4.069384e-02 0.0027366539
Clay        4.145976e-03 2.107924e-02 1.049876e-02 0.0237264222
Karpov      8.795817e-02 5.639959e-03 1.762907e-03 0.0022318265
Macey       7.712721e-03 1.403282e-02 1.442502e-02 0.0048028954
Warners     9.496848e-03 1.591742e-03 4.317040e-03 0.0077841113
Zsivoczky   3.227467e-04 5.865332e-02 6.956790e-02 0.0041268259
Hernu       6.383462e-02 1.988402e-02 1.455601e-03 0.0141595965
Bernard     2.046050e-02 5.883405e-06 1.209056e-02 0.0159167991
Schwarzl    7.403638e-02 1.092132e-01 1.616543e-05 0.0211173850
Pogorelov   1.043115e-01 2.404103e-04 2.893750e-03 0.0002322016
Schoenbeck  2.010275e-01 1.789520e-02 8.702893e-02 0.0240826922
Barras      1.698426e-01 2.540745e-02 1.988116e-03 0.0711836486
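The cos2 and contrib values can be related back to the coordinates and eigenvalues. The sketch below does this for the first athlete, SEBRLE, using only numbers copied from the output above. The two formulas are an assumption about how these quantities are defined for a prcomp() fit; they are not quoted from the package documentation, but they do reproduce the printed values for this row.

```r
# Eigenvalues and SEBRLE's coordinates, copied from the output above
eig <- c(4.1242133, 1.8385309, 1.2391403, 0.8194402, 0.7015528,
         0.4228828, 0.3025817, 0.2744700, 0.1552169, 0.1219710)
sebrle <- c(0.1912074, -1.5541282, -0.62836882, 0.08205241, 1.1426139415,
            -0.46389755, -0.20796012, 0.043460568, -0.659344137, 0.03273238)
n_ind <- 23   # number of active individuals, decathlon2[1:23, ]

# cos2: share of the individual's squared distance from the origin
# that falls along each component (assumed definition)
cos2 <- sebrle^2 / sum(sebrle^2)

# contrib: percent contribution of the individual to each component
# (assumed definition)
contrib <- 100 * sebrle^2 / (n_ind * eig)

print(signif(cos2, 7))
print(signif(contrib, 7))
```

Because all ten components are kept, the cos2 values of one individual sum to 1. A cos2 close to 1 on a component means the individual is well represented on it.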
Plots of PCA
Importance of PCA
PCA assists in the interpretation of data; however, it does not always identify the most relevant patterns. It is a technique for reducing the complexity of high-dimensional data while preserving trends and patterns, which it accomplishes by condensing the data into fewer dimensions that serve as feature summaries. In biology, high-dimensional data arise when several features, such as the expression of many genes, are measured for each sample. When each feature is tested for association with an outcome, this type of data poses two problems that PCA mitigates: computational expense and an increased error rate owing to multiple-testing correction. PCA is a type of unsupervised learning, similar to clustering, in that it discovers patterns without considering whether the data come from distinct treatment groups or show phenotypic differences. PCA reduces data by geometrically projecting them onto lower dimensions known as principal components (PCs), with the objective of obtaining the best summary of the data with the fewest possible PCs.
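The geometric projection described above can be sketched in a few lines. The example assumes the built-in USArrests data as a stand-in and keeps k = 2 components; both choices are illustrative and not taken from the post. It projects the standardized data onto the first k PCs, maps them back, and checks that the information lost equals the variance of the discarded PCs.

```r
# Illustrative data: USArrests from base R (an assumption, not the post's data)
X   <- scale(USArrests)
pca <- prcomp(USArrests, scale. = TRUE)
k   <- 2                                     # hypothetical number of PCs to keep

V_k   <- pca$rotation[, 1:k, drop = FALSE]   # loadings of the first k PCs
Z_k   <- X %*% V_k                           # projection: n x k summary of the data
X_hat <- Z_k %*% t(V_k)                      # back-projection into the original space

# Squared reconstruction error, on the same (n - 1) scale as the variances
lost      <- sum((X - X_hat)^2) / (nrow(X) - 1)
discarded <- sum(pca$sdev[-(1:k)]^2)         # variance of the dropped PCs

print(dim(Z_k))
print(c(lost = lost, discarded = discarded))
```

Z_k is the reduced data set one would carry forward for plotting or modelling; its columns match the first k columns of pca$x.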
Drawbacks of Principal Component Analysis
- PCA does not always succeed in reducing a large number of original variables to a smaller number of transformed variables. If the original variables are uncorrelated, the analysis achieves nothing.
- PCA is not based on any particular statistical model.
- PCA doesn't separate error terms from the systematic part.
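The first drawback can be illustrated with simulated data. This is a minimal sketch: the seed, sample size and number of variables are arbitrary choices, and the variables are drawn independently so that they are uncorrelated up to sampling noise.

```r
set.seed(1)                          # arbitrary seed, for reproducibility only
n <- 5000                            # hypothetical sample size
p <- 5                               # hypothetical number of variables
X <- matrix(rnorm(n * p), nrow = n)  # independent standard normal columns

pca  <- prcomp(X, scale. = TRUE)
prop <- pca$sdev^2 / sum(pca$sdev^2)

# Each component carries roughly 1/p of the variance,
# so no small subset of PCs summarizes the data
print(round(prop, 3))
print(round(cumsum(prop), 3))
```

Compare this with the decathlon results above, where the first component alone accounts for about 41% of the variance.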
The post Principal Component Analysis (PCA) using R appeared first on Statistical Aid: A School of Statistics.