Determining the Number of Factors with Parallel Analysis in R

April 12, 2016
By Tom Schmitt

(This article was first published on R – Equastat, and kindly contributed to R-bloggers)


As discussed on page 308 and illustrated on page 312 of Schmitt (2011), an essential first step in Factor Analysis is to determine the appropriate number of factors, which we do here with Parallel Analysis in R. The data come from 26 psychological tests administered by Holzinger and Swineford (1939) to 145 students and have been used by numerous authors to demonstrate the effectiveness of Factor Analysis. Only 8 tests are used here, and they are hypothesized to reflect 2 constructs: a visual construct consisting of visual perception, cubes, paper form board, and flags, and a verbal construct consisting of general information, paragraph comprehension, sentence completion, and word classification. Below I will go through the R code for the parallel analysis.

First, we need to install and load the paran package:

install.packages("paran")  # only needed once
library(paran)
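
If you would rather not reinstall on every run, a small check like the one below can help. This is only a sketch: `missing_pkgs` is a hypothetical helper name of my own, not part of paran, and it relies solely on base R's `requireNamespace()`.

```r
# Hypothetical helper: return the names of packages that are not installed.
missing_pkgs <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# 'stats' ships with R, so it should never be reported as missing.
to_install <- missing_pkgs(c("stats"))

# Typical use (commented out so nothing is installed by accident):
# needed <- missing_pkgs("paran")
# if (length(needed) > 0) install.packages(needed)
```
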

Once paran is loaded, we can run our Parallel Analysis in R code. We first import our data and make sure it looks okay:

# Imports the whitespace-delimited data as grantwhite. The file has no header
# row, so R names the variables V1, V2, ...
grantwhite <- read.table("C:/holzraw.dat",
    header = FALSE, sep = "", na.strings = "NA", dec = ".", strip.white = TRUE)
# grantwhite  # Prints all the data if you want, but I have commented it out.
tail(grantwhite, n = 5)  # Only prints the last 5 subjects.

The last five cases look good!

    V1 V2 V3 V4 V5 V6 V7 V8
141 24 28 18 11 49  8 17 27
142 18 24 14 13 31  7 16 23
143 28 22 16 15 55 11 23 32
144 26 27 14  4 48 11 18 33
145 26 24 16 27 51 11 23 39
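
Beyond eyeballing the last few rows, a few structural checks can be automated. The sketch below is an assumption-laden illustration: `check_scores` is a hypothetical helper, and because the original data file is not available here, it is demonstrated on simulated stand-in data with the same shape (145 cases, 8 variables).

```r
# Hypothetical helper: basic structural checks on the imported test scores.
check_scores <- function(df, n_cases = 145, n_vars = 8) {
  stopifnot(is.data.frame(df))
  list(
    dims_ok     = nrow(df) == n_cases && ncol(df) == n_vars,  # expected shape
    all_numeric = all(vapply(df, is.numeric, logical(1))),     # no text columns
    n_missing   = sum(is.na(df))                               # count of NAs
  )
}

# Stand-in data only (NOT the Holzinger and Swineford scores).
set.seed(1)
demo_scores <- as.data.frame(matrix(rpois(145 * 8, lambda = 20), nrow = 145))
checks <- check_scores(demo_scores)
str(checks)
```

With the real data you would call `check_scores(grantwhite)` instead.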

We can now run the Parallel Analysis in R using Dinno’s paran package. I won’t go through the specifics of the Parallel Analysis code, but most of the arguments just produce and format the Scree Plot, so it is not as complicated as it looks.

# Parallel Analysis with Dinno's 'paran' package.
# Note that grantwhite[c(1:8)] selects variables 1-8.
paran(grantwhite[c(1:8)], iterations = 5000, centile = 0, quietly = FALSE, 
    status = TRUE, all = TRUE, cfa = TRUE, graph = TRUE, color = TRUE, 
    col = c("black", "red", "blue"), lty = c(1, 2, 3), lwd = 1, legend = TRUE, 
    file = "", width = 640, height = 640, grdevice = "png", seed = 0)
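
For readers curious about what happens under the hood, here is a rough base-R sketch of Horn's parallel analysis for common factors. It is not paran's code: the function names (`reduced_eigen`, `parallel_sketch`) are my own, and I am assuming that the reduced correlation matrix has squared multiple correlations on its diagonal and that the random data are standard normal. The demo uses simulated two-factor data, not the Holzinger and Swineford scores.

```r
# Eigenvalues of the reduced correlation matrix (assumption: squared multiple
# correlations replace the 1s on the diagonal).
reduced_eigen <- function(x) {
  r <- cor(x)
  diag(r) <- 1 - 1 / diag(solve(r))
  eigen(r, symmetric = TRUE, only.values = TRUE)$values
}

# Sketch of parallel analysis: compare observed eigenvalues with the mean
# eigenvalues from random data of the same size.
parallel_sketch <- function(x, iterations = 200) {
  n <- nrow(x)
  p <- ncol(x)
  observed <- reduced_eigen(x)
  random <- replicate(iterations, reduced_eigen(matrix(rnorm(n * p), n, p)))
  bias <- rowMeans(random)              # mean random eigenvalue per position
  adjusted <- observed - bias           # adjusted = unadjusted - bias
  retained <- sum(cumprod(adjusted > 0))  # leading run of positive values
  list(adjusted = adjusted, unadjusted = observed, bias = bias,
       retained = retained)
}

# Simulated data: 145 cases, 8 variables, 2 uncorrelated factors.
set.seed(42)
n <- 145
f <- matrix(rnorm(n * 2), n, 2)
loadings <- rbind(cbind(rep(0.8, 4), 0), cbind(0, rep(0.8, 4)))
demo <- f %*% t(loadings) + matrix(rnorm(n * 8, sd = 0.6), n, 8)
res <- parallel_sketch(demo, iterations = 200)
res$retained
```

The sketch averages the random eigenvalues, which corresponds to the mean estimate that `centile = 0` requests in the call above.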

The Parallel Analysis in R results look good and are close to those found on page 312 of Schmitt (2011): two factors are retained, supporting the hypothesized visual and verbal constructs.

Using eigendecomposition of correlation matrix.
Computing: 10%  20%  30%  40%  50%  60%  70%  80%  90%  100%


Results of Horn's Parallel Analysis for factor retention
5000 iterations, using the mean estimate

-------------------------------------------------- 
Factor      Adjusted    Unadjusted    Estimated 
            Eigenvalue  Eigenvalue    Bias 
-------------------------------------------------- 
No components passed. 
-------------------------------------------------- 
1           2.762726    3.187213      0.424486
2           0.364169    0.639895      0.275725
3          -0.065553    0.098102      0.163655
4          -0.085368   -0.01673       0.068631
5          -0.070092   -0.08559      -0.01550
6          -0.043998   -0.13940      -0.09540
7           0.014144   -0.15825      -0.17240
8           0.066576   -0.19364      -0.26022
-------------------------------------------------- 

Adjusted eigenvalues > 0 indicate dimensions to retain. (2 factors retained)
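
Two things in this table are worth checking by hand. First, each adjusted eigenvalue is the unadjusted eigenvalue minus the estimated bias (up to rounding in the printout). Second, factors 7 and 8 have small positive adjusted eigenvalues yet only 2 factors are retained, which is consistent with counting only the leading run of positive values. The snippet below re-enters the printed numbers to illustrate both points; the stopping rule is my reading of the output, not something taken from paran's documentation.

```r
# Values copied from the table above.
adjusted   <- c(2.762726, 0.364169, -0.065553, -0.085368,
                -0.070092, -0.043998, 0.014144, 0.066576)
unadjusted <- c(3.187213, 0.639895, 0.098102, -0.01673,
                -0.08559, -0.13940, -0.15825, -0.19364)
bias       <- c(0.424486, 0.275725, 0.163655, 0.068631,
                -0.01550, -0.09540, -0.17240, -0.26022)

# Largest discrepancy between the printed adjusted values and
# unadjusted - bias; it should be no more than rounding error.
max_gap <- max(abs(adjusted - (unadjusted - bias)))

# Count factors from the top until the first adjusted eigenvalue <= 0.
retained <- sum(cumprod(adjusted > 0))
retained  # 2
```
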

And check out the easy-to-interpret Parallel Analysis in R Scree Plot, with the adjusted eigenvalues (unretained) giving a nice visual representation of the two-factor solution. No need to make any subjective decisions with this method!

[Figure: Parallel Analysis in R showing Scree Plot]


