A quicky..

February 22, 2010

(This article was first published on Stats raving mad » R, and kindly contributed to R-bloggers)

If you’re interested in principal components (and you should be), take a good look at this. The linked post takes you by the hand through doing everything from scratch. If you’re not in the mood for that, the following R functions will help you.
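For anyone curious what "from scratch" roughly involves, here is a minimal sketch. It is not taken from the linked post: it assumes the usual route of an eigen-decomposition of the correlation matrix, and the toy data and the names x, z, eig and scores are made up for illustration.

```r
# Minimal from-scratch PCA sketch (illustrative names, toy data).
set.seed(1)                                   # reproducible toy data
x <- matrix(rnorm(200 * 4), nrow = 200, ncol = 4)
x[, 2] <- x[, 1] + 0.5 * x[, 2]               # make two columns correlated

z <- scale(x, center = TRUE, scale = TRUE)    # center and scale each column
eig <- eigen(cor(x))                          # eigen-decomposition of the correlation matrix
sdev <- sqrt(eig$values)                      # standard deviations of the components
scores <- z %*% eig$vectors                   # project the scaled data onto the eigenvectors

# Compare with prcomp(). The sign of each eigenvector is arbitrary,
# so the scores are compared in absolute value.
pca_ref <- prcomp(x, scale. = TRUE)
all.equal(sdev, pca_ref$sdev)
all.equal(abs(scores), abs(unname(pca_ref$x)))
```

Both comparisons should agree up to numerical tolerance, which is a handy sanity check that prcomp() with scaling does the same job as the hand-rolled version.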

An example.

# Generates sample matrix of five discrete clusters that have
# very different mean and standard deviation values.
z1 <- rnorm(10000, mean=1, sd=1)
z2 <- rnorm(10000, mean=3, sd=3)
z3 <- rnorm(10000, mean=5, sd=5)
z4 <- rnorm(10000, mean=7, sd=7)
z5 <- rnorm(10000, mean=9, sd=9)
mydata <- matrix(c(z1, z2, z3, z4, z5), 2500, 20, byrow=TRUE,
                 dimnames=list(paste("R", 1:2500, sep=""), paste("C", 1:20, sep="")))

# Performs principal component analysis after scaling the data.
# It returns a list with class "prcomp" that contains five components:
#   (1) the standard deviations (sdev) of the principal components,
#   (2) the matrix of eigenvectors (rotation),
#   (3) the principal component data (x),
#   (4) the centering (center) and
#   (5) scaling (scale) used.
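# Note: the argument is named "scale." (with a trailing dot); "scale" only works via partial matching.
pca <- prcomp(mydata, scale.=TRUE)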

# Prints variance summary for all principal components.
summary(pca)

# Set plotting parameters.
# dev.new() opens the default graphics device on any platform (x11() is platform-specific).
dev.new(height=6, width=12, pointsize=12)
par(mfrow=c(1,2))

# Define plotting colors.
mycolors <- c("red", "green", "blue", "magenta", "black")

# Plots scatter plot for the first two principal components
# that are stored in pca$x[,1:2].
plot(pca$x, pch=20, col=mycolors[sort(rep(1:5, 500))])
# Same as above, but prints labels.
plot(pca$x, type="n"); text(pca$x, rownames(pca$x), cex=0.8,
 col=mycolors[sort(rep(1:5, 500))])

# Plots scatter plots for all combinations between the first four principal components.
pairs(pca$x[,1:4], pch=20, col=mycolors[sort(rep(1:5, 500))])

# Plots a scatter plot for the first two principal components
# plus the corresponding eigenvectors that are stored in pca$rotation.
biplot(pca)

# Loads library scatterplot3d.
library(scatterplot3d)
# Same as above, but plots the first three principal components in 3D scatter plot
scatterplot3d(pca$x[,1:3], pch=20, color=mycolors[sort(rep(1:5, 500))])

# Output of summary(pca) from one run (the data are random, so values differ between runs):
# Importance of components:
#                          PC1    PC2    PC3    PC4    PC5    PC6    PC7    PC8
# Standard deviation     2.157 0.9953 0.9831 0.9684 0.9601 0.9465 0.9340 0.9288
# Proportion of Variance 0.233 0.0495 0.0483 0.0469 0.0461 0.0448 0.0436 0.0431
# Cumulative Proportion  0.233 0.2822 0.3305 0.3774 0.4235 0.4683 0.5119 0.5550
#                           PC9   PC10   PC11   PC12   PC13   PC14   PC15   PC16
# Standard deviation     0.9030 0.8989 0.8930 0.8763 0.8703 0.8656 0.8573 0.8458
# Proportion of Variance 0.0408 0.0404 0.0399 0.0384 0.0379 0.0375 0.0367 0.0358
# Cumulative Proportion  0.5958 0.6362 0.6761 0.7145 0.7523 0.7898 0.8265 0.8623
#                          PC17   PC18   PC19   PC20
# Standard deviation     0.8415 0.8360 0.8302 0.8110
# Proportion of Variance 0.0354 0.0349 0.0345 0.0329
# Cumulative Proportion  0.8977 0.9326 0.9671 1.0000
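
The "Proportion of Variance" and "Cumulative Proportion" rows in the summary can be recomputed from the component standard deviations. A minimal sketch on a small toy matrix follows; toy, fit, var_explained and cum_explained are illustrative names, not part of the original script.

```r
# Recompute the variance rows of summary() from the standard deviations.
set.seed(42)                                    # reproducible toy data
toy <- matrix(rnorm(100 * 5), nrow = 100, ncol = 5)
fit <- prcomp(toy, scale. = TRUE)

var_explained <- fit$sdev^2 / sum(fit$sdev^2)   # proportion of variance per component
cum_explained <- cumsum(var_explained)          # cumulative proportion

# summary() stores rounded values, so compare with a tolerance.
imp <- summary(fit)$importance
all.equal(unname(imp["Proportion of Variance", ]), var_explained, tolerance = 1e-4)
all.equal(unname(imp["Cumulative Proportion", ]), cum_explained, tolerance = 1e-4)
```

With scaled data the variances sum to the number of columns, so for the 20-column example above PC1 works out to 2.157^2 / 20, roughly 0.233, which matches the first entry in the table.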
