Determining the Number of Factors with Parallel Analysis in R
Tom Schmitt
April 12, 2016
As discussed on page 308 and illustrated on page 312 of Schmitt (2011), an essential first step in Factor Analysis is to determine the appropriate number of factors, and Parallel Analysis is one way to do that. The data come from 26 psychological tests administered by Holzinger and Swineford (1939) to 145 students and have been used by numerous authors to demonstrate the effectiveness of Factor Analysis. Only 8 tests are used here, and they are hypothesized to reflect 2 constructs: a visual construct consisting of visual perception, cubes, paper form board, and flags, and a verbal construct consisting of general information, paragraph comprehension, sentence completion, and word classification. Below I will go through the R code for Parallel Analysis.
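First, we need to install and load the necessary package:
# Install paran (only needed once), then load it.
install.packages("paran")
library(paran)
# library(relimp, pos = 4)  # appeared in the original script; not needed for the code in this post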
Once the package is loaded, we can import our data and make sure it looks okay:
# Imports the whitespace-delimited data into grantwhite.
# header = FALSE, so the columns are named V1, V2, ... automatically.
grantwhite <- read.table("C:/holzraw.dat", header = FALSE, sep = "",
                         na.strings = "NA", dec = ".", strip.white = TRUE)
# grantwhite              # Prints all the data if you want, but I have commented it out.
tail(grantwhite, n = 5)   # Only prints the last 5 subjects.
The last five cases look good!
    V1 V2 V3 V4 V5 V6 V7 V8
141 24 28 18 11 49  8 17 27
142 18 24 14 13 31  7 16 23
143 28 22 16 15 55 11 23 32
144 26 27 14  4 48 11 18 33
145 26 24 16 27 51 11 23 39
We can now run the Parallel Analysis using Alexis Dinno's paran package. I won't go through the specifics of the call, but most of the arguments just produce and format the Scree Plot, so it is not as complicated as it looks.
# Parallel Analysis with Dinno's 'paran' package.
# Note that grantwhite[c(1:8)] selects variables 1-8.
paran(grantwhite[c(1:8)], iterations = 5000, centile = 0, quietly = FALSE,
      status = TRUE, all = TRUE, cfa = TRUE, graph = TRUE, color = TRUE,
      col = c("black", "red", "blue"), lty = c(1, 2, 3), lwd = 1,
      legend = TRUE, file = "", width = 640, height = 640,
      grdevice = "png", seed = 0)
The Parallel Analysis results look good and are close to those found on page 312, supporting the hypothesized visual and verbal constructs.
Using eigendecomposition of correlation matrix.
Computing: 10%  20%  30%  40%  50%  60%  70%  80%  90%  100%

Results of Horn's Parallel Analysis for factor retention
5000 iterations, using the mean estimate

--------------------------------------------------
Factor      Adjusted    Unadjusted    Estimated
            Eigenvalue  Eigenvalue    Bias
--------------------------------------------------
No components passed.
--------------------------------------------------
1           2.762726    3.187213      0.424486
2           0.364169    0.639895      0.275725
3          -0.065553    0.098102      0.163655
4          -0.085368   -0.01673       0.068631
5          -0.070092   -0.08559      -0.01550
6          -0.043998   -0.13940      -0.09540
7           0.014144   -0.15825      -0.17240
8           0.066576   -0.19364      -0.26022
--------------------------------------------------

Adjusted eigenvalues > 0 indicate dimensions to retain. (2 factors retained)
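To see how the three columns relate, the short sketch below re-enters the printed values by hand and checks that each adjusted eigenvalue is the unadjusted eigenvalue minus the estimated bias. The variable names and the stop-at-the-first-failure counting rule are my own assumptions. The rule is inferred from the output, which reports 2 factors even though factors 7 and 8 have slightly positive adjusted eigenvalues.

```r
# Values copied from the paran output above (rounded as printed).
unadjusted <- c(3.187213, 0.639895, 0.098102, -0.01673,
                -0.08559, -0.13940, -0.15825, -0.19364)
bias       <- c(0.424486, 0.275725, 0.163655, 0.068631,
                -0.01550, -0.09540, -0.17240, -0.26022)

# Adjusted eigenvalue = unadjusted eigenvalue - estimated bias.
adjusted <- unadjusted - bias
print(round(adjusted, 6))

# Assumed retention rule: count leading factors until the first
# adjusted eigenvalue that is not greater than 0.
first_fail <- which(adjusted <= 0)[1]
n_retained <- if (is.na(first_fail)) length(adjusted) else first_fail - 1
print(n_retained)
```

Small differences in the last decimal place against the printed table come from the rounding of the copied values.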
Also check out the easy-to-interpret Scree Plot that paran produces, in which the adjusted eigenvalues give a nice visual representation of the two-factor solution. No need to make any subjective decisions with this method!
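For readers curious about what a parallel analysis does under the hood, here is a minimal base-R sketch of Horn's procedure in its common-factor form. It is not paran's actual code. The simulated data set `x`, the helper `reduced_eigen()`, and the choice of 200 iterations are all assumptions made for illustration, and putting squared multiple correlations on the diagonal is one common way to form the reduced correlation matrix.

```r
set.seed(1)
n <- 145   # number of cases (matches the sample size in this post)
p <- 8     # number of variables

# Hypothetical data: two independent factors, four indicators each.
f1 <- rnorm(n)
f2 <- rnorm(n)
x <- cbind(sapply(1:4, function(i) 0.8 * f1 + rnorm(n, sd = 0.6)),
           sapply(1:4, function(i) 0.8 * f2 + rnorm(n, sd = 0.6)))

# Eigenvalues of the correlation matrix with squared multiple
# correlations (SMCs) in place of the unit diagonal.
reduced_eigen <- function(dat) {
  r <- cor(dat)
  diag(r) <- 1 - 1 / diag(solve(r))
  eigen(r, symmetric = TRUE, only.values = TRUE)$values
}

observed <- reduced_eigen(x)

# Eigenvalues from random normal data of the same size; their mean
# estimates how large eigenvalues get from sampling error alone.
random <- replicate(200, reduced_eigen(matrix(rnorm(n * p), n, p)))
bias <- rowMeans(random)

adjusted <- observed - bias

# Retain leading factors until the first adjusted eigenvalue <= 0.
first_fail <- which(adjusted <= 0)[1]
n_retained <- if (is.na(first_fail)) p else first_fail - 1

print(round(cbind(adjusted, observed, bias), 4))
print(n_retained)
```

Using the mean of the random eigenvalues corresponds to the "mean estimate" reported in the output above. A stricter variant would compare against a high percentile of the random eigenvalues instead.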
The post Determining the Number of Factors with Parallel Analysis in R appeared first on Equastat.