A quickie...

February 22, 2010

(This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers)

If you’re interested in principal components (and you should be), take a good look at this. The linked post walks you through everything from scratch. If you’re not in the mood for that, the following R functions will do the work for you.
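If you take the from-scratch route, the core computation is small. PCA on scaled data is an eigendecomposition of the correlation matrix. The sketch below is not the linked post's code. It assumes a made-up 200 x 5 matrix `X` with correlated columns, built from an arbitrary seed and mixing matrix. It computes the component standard deviations and scores by hand and checks them against `prcomp()`. Eigenvectors are only defined up to sign, so the scores are compared in absolute value.

```r
# Minimal from-scratch PCA sketch (illustrative data, arbitrary seed).
set.seed(1)
latent <- matrix(rnorm(200 * 5), nrow = 200, ncol = 5)
mixing <- matrix(runif(25), nrow = 5, ncol = 5)
X <- latent %*% mixing                      # 200 x 5 matrix with correlated columns

# Standardize each column (mean 0, sd 1), as prcomp(..., scale. = TRUE) does
Z <- scale(X, center = TRUE, scale = TRUE)

# Eigendecomposition of the correlation matrix; for a symmetric matrix
# eigen() returns the eigenvalues in decreasing order
e <- eigen(cor(X), symmetric = TRUE)
sdev_manual   <- sqrt(e$values)             # standard deviations of the PCs
scores_manual <- Z %*% e$vectors            # principal component scores

# Reference result from the built-in function
pca_ref <- prcomp(X, scale. = TRUE)

# Compare: sdev directly, scores up to a per-column sign flip
sdev_ok   <- isTRUE(all.equal(sdev_manual, pca_ref$sdev))
scores_ok <- isTRUE(all.equal(abs(unname(scores_manual)), abs(unname(pca_ref$x))))
print(c(sdev = sdev_ok, scores = scores_ok))
```

`prcomp()` itself uses a singular value decomposition of the scaled data rather than `eigen()`. It is generally preferred for numerical accuracy, but both routes give the same components up to sign.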

An example.

# Generate a sample matrix of five discrete clusters that have
# very different mean and standard deviation values. Filling by row puts
# each 10000-value vector into 500 consecutive rows of 20 columns.
z1 <- rnorm(10000, mean=1, sd=1)
z2 <- rnorm(10000, mean=3, sd=3)
z3 <- rnorm(10000, mean=5, sd=5)
z4 <- rnorm(10000, mean=7, sd=7)
z5 <- rnorm(10000, mean=9, sd=9)
mydata <- matrix(c(z1, z2, z3, z4, z5), nrow=2500, ncol=20, byrow=TRUE,
                 dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))

# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
#   (1) the standard deviations (sdev) of the principal components,
#   (2) the matrix of eigenvectors (rotation),
#   (3) the principal component data (x),
#   (4) the centering (center) and
#   (5) scaling (scale) used.
pca <- prcomp(mydata, scale.=TRUE)

# Prints variance summary for all principal components.
summary(pca)

# Open a new plotting device with two panels side by side.
# dev.new() is portable; x11() requires an X11 display.
dev.new(height=6, width=12, pointsize=12)
par(mfrow=c(1, 2))

# Define plotting colors.
mycolors <- c("red", "green", "blue", "magenta", "black")

# Scatter plot of the first two principal components
# (stored in pca$x[, 1:2]), colored by cluster.
plot(pca$x[, 1:2], pch=20, col=mycolors[rep(1:5, each=500)])
# Same as above, but with the row names as labels.
plot(pca$x[, 1:2], type="n")
text(pca$x[, 1:2], rownames(pca$x), cex=0.8, col=mycolors[rep(1:5, each=500)])

# Scatter plots for all pairs of the first four principal components.
pairs(pca$x[, 1:4], pch=20, col=mycolors[rep(1:5, each=500)])

# Biplot: scatter plot of the first two principal components plus the
# corresponding eigenvectors (loadings, stored in pca$rotation) drawn as arrows.
biplot(pca)

# Load the scatterplot3d package (install it first with
# install.packages("scatterplot3d") if needed).
library(scatterplot3d)
# Same as the first plot, but shows the first three principal components in a 3D scatter plot.
scatterplot3d(pca$x[, 1:3], pch=20, color=mycolors[rep(1:5, each=500)])

# Sample output of summary(pca). The data are random and unseeded,
# so your numbers will differ slightly from run to run.
# Importance of components:
#                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
# Standard deviation     2.157 0.9953 0.9831 0.9684 0.9601 0.9465 0.9340 0.9288
# Proportion of Variance 0.233 0.0495 0.0483 0.0469 0.0461 0.0448 0.0436 0.0431
# Cumulative Proportion  0.233 0.2822 0.3305 0.3774 0.4235 0.4683 0.5119 0.5550
#                           PC9   PC10   PC11   PC12   PC13   PC14   PC15   PC16
# Standard deviation     0.9030 0.8989 0.8930 0.8763 0.8703 0.8656 0.8573 0.8458
# Proportion of Variance 0.0408 0.0404 0.0399 0.0384 0.0379 0.0375 0.0367 0.0358
# Cumulative Proportion  0.5958 0.6362 0.6761 0.7145 0.7523 0.7898 0.8265 0.8623
#                          PC17   PC18   PC19   PC20
# Standard deviation     0.8415 0.8360 0.8302 0.8110
# Proportion of Variance 0.0354 0.0349 0.0345 0.0329
# Cumulative Proportion  0.8977 0.9326 0.9671 1.0000
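The table rows come straight from `pca$sdev`: each component's variance is `sdev^2`, and the proportions divide by the total. The sketch below checks this and a few related facts. It rebuilds the same kind of data under an arbitrary seed (an assumption added for reproducibility, so the numbers will not match the output above exactly). It recomputes the proportion rows, confirms that the scores in `pca$x` are the scaled data multiplied by `pca$rotation`, and looks at the mean PC1 score per cluster. The variable names `prop_var`, `cum_prop`, `cluster` and `pc1_means` are illustrative.

```r
# Sketch: recompute what summary(pca) and pca$x report (arbitrary seed).
set.seed(42)
z <- c(rnorm(10000, mean = 1, sd = 1), rnorm(10000, mean = 3, sd = 3),
       rnorm(10000, mean = 5, sd = 5), rnorm(10000, mean = 7, sd = 7),
       rnorm(10000, mean = 9, sd = 9))
mydata  <- matrix(z, nrow = 2500, ncol = 20, byrow = TRUE)
cluster <- rep(1:5, each = 500)          # row i belongs to cluster cluster[i]
pca <- prcomp(mydata, scale. = TRUE)

# "Proportion of Variance": each component's variance over the total
prop_var <- pca$sdev^2 / sum(pca$sdev^2)
# "Cumulative Proportion": running total of the above
cum_prop <- cumsum(prop_var)

# summary() stores the same rows (rounded to 5 decimals) in $importance
imp <- summary(pca)$importance
print(round(rbind(prop_var, cum_prop)[, 1:5], 4))

# Scores: the scaled data projected onto the eigenvectors in pca$rotation
scores <- scale(mydata) %*% pca$rotation
max_score_diff <- max(abs(scores - pca$x))

# Mean PC1 score per cluster. The sign of PC1 is arbitrary, so the means
# should run either increasing or decreasing across the five clusters.
pc1_means <- tapply(pca$x[, 1], cluster, mean)
print(pc1_means)
```

The cluster means differ mainly in location, and scaling puts every column on the same footing. That shared mean shift therefore ends up in PC1, while the remaining components each carry a roughly similar share of the rest.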
