How Many Factors to Retain in Factor Analysis

[This article was first published on Dominique Makowski, and kindly contributed to R-bloggers.]

The method agreement procedure

When running a factor analysis, one often needs to decide how many components or latent variables to retain. Many statistical methods exist to answer this question, but there is no consensus on which one to use. The n_factors() function, available in the psycho package, therefore performs the method agreement procedure: it runs all the routines and returns the number of factors supported by the largest number of methods.

# devtools::install_github("neuropsychology/psycho.R")  # Install the latest psycho version if needed

library(tidyverse)
library(psycho)

# Run every retention method on the built-in attitude dataset
results <- attitude %>%
  psycho::n_factors()

print(results)
## The choice of 1 factor is supported by 5 (out of 9; 55.56%) methods (Optimal Coordinates, Acceleration Factor, Parallel Analysis, Velicer MAP, VSS Complexity 1).
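
To give a sense of what one of the routines listed above does, here is a simplified base-R sketch of parallel analysis: the observed eigenvalues are compared with the average eigenvalues obtained from pure noise of the same dimensions, and only the leading components that beat the noise are kept. This is an illustration under stated assumptions, not what psycho runs internally: it works on principal-component eigenvalues, uses the mean of the simulations as the threshold, and the helper name `parallel_sketch` and its arguments `n_sim` and `seed` are made up for this example. Its result may therefore differ from the Parallel Analysis entry reported by n_factors().

```r
# Simplified parallel analysis on principal-component eigenvalues.
# `parallel_sketch` is an illustrative helper, not a psycho function.
parallel_sketch <- function(data, n_sim = 200, seed = 42) {
  set.seed(seed)
  n <- nrow(data)
  p <- ncol(data)

  # Eigenvalues of the observed correlation matrix
  observed <- eigen(cor(data), only.values = TRUE)$values

  # Eigenvalues from uncorrelated normal noise with the same n and p
  # (one column of eigenvalues per simulation)
  simulated <- replicate(n_sim, {
    noise <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(noise), only.values = TRUE)$values
  })
  threshold <- rowMeans(simulated)

  # Keep leading components until the first one that fails to beat the noise
  above <- observed > threshold
  n_keep <- if (all(above)) p else which(!above)[1] - 1

  list(observed = observed, threshold = threshold, n_keep = n_keep)
}

pa <- parallel_sketch(attitude)
pa$n_keep
```

Using the mean as the threshold is the simplest choice; a stricter variant would use an upper quantile of the simulated eigenvalues instead.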

We can get an overview of all values by using the summary method, summary(results).

n.Factors  n.Methods  Eigenvalues  Cum.Variance
1          5          3.72         0.53
2          3          1.14         0.69
3          1          0.85         0.81
4          0          0.61         0.90
5          0          0.32         0.95
6          0          0.22         0.98
7          0          0.14         1.00
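
The Eigenvalues and Cum.Variance columns can be approximated without psycho. The sketch below assumes they come from the correlation matrix of the data, which is consistent with the eigenvalues in the table summing to 7, the number of variables. The variable names are illustrative.

```r
# Eigenvalues of the correlation matrix of the built-in attitude data
eigenvalues <- eigen(cor(attitude), only.values = TRUE)$values

# Cumulative proportion of variance explained by the first k components
cum_variance <- cumsum(eigenvalues) / sum(eigenvalues)

# Kaiser criterion: number of eigenvalues greater than 1
n_kaiser <- sum(eigenvalues > 1)

# Assemble a table comparable to the summary output
round(data.frame(
  n.Factors = seq_along(eigenvalues),
  Eigenvalues = eigenvalues,
  Cum.Variance = cum_variance
), 2)
```

The Kaiser criterion simply counts the eigenvalues above 1. Other methods rely on more than the eigenvalues, so this sketch does not reproduce the n.Methods column.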

And, of course, plot it 🙂

plot(results)

The plot shows the number of methods supporting each number of factors (in yellow), the eigenvalues (red line) and the cumulative proportion of explained variance (blue line).

For more details, we can also extract the final result (the optimal number of factors) for each method:

Method                          n_optimal
Optimal Coordinates             1
Acceleration Factor             1
Parallel Analysis               1
Eigenvalues (Kaiser Criterion)  2
Velicer MAP                     1
BIC                             2
Sample Size Adjusted BIC        3
VSS Complexity 1                1
VSS Complexity 2                2
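
The agreement step itself is a simple vote over these per-method values. A minimal base-R sketch, with the values copied from the table above; the names `votes`, `tally`, `best`, `support` and `methods_for_best` are illustrative and not part of psycho:

```r
# Optimal number of factors according to each method (from the table above)
votes <- c(
  "Optimal Coordinates"            = 1,
  "Acceleration Factor"            = 1,
  "Parallel Analysis"              = 1,
  "Eigenvalues (Kaiser Criterion)" = 2,
  "Velicer MAP"                    = 1,
  "BIC"                            = 2,
  "Sample Size Adjusted BIC"       = 3,
  "VSS Complexity 1"               = 1,
  "VSS Complexity 2"               = 2
)

# Count how many methods support each number of factors
tally <- table(votes)

# The consensus is the most frequently chosen number of factors
best <- as.integer(names(tally)[which.max(tally)])
support <- as.integer(max(tally))
methods_for_best <- names(votes)[votes == best]

sprintf(
  "%d factor(s) supported by %d of %d methods (%.2f%%)",
  best, support, length(votes), 100 * support / length(votes)
)
# "1 factor(s) supported by 5 of 9 methods (55.56%)"
```

In a tie, which.max() returns the first maximum, so this sketch would pick the smaller number of factors. That is a choice of this example and not necessarily how n_factors() resolves ties.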

Tweaking

We can also supply a correlation matrix instead of raw data, and change the rotation and the factoring method. In that case, the number of observations is passed through the n argument.

df <- psycho::affective

# Compute the correlations and keep only the matrix of r values
cor_mat <- psycho::correlation(df)
cor_mat <- cor_mat$values$r

# n is the number of observations behind the correlation matrix
results <- cor_mat %>%
  psycho::n_factors(rotate = "oblimin", fm = "mle", n = nrow(df))

print(results)
## The choice of 2 factors is supported by 5 (out of 9; 55.56%) methods (Parallel Analysis, Eigenvalues (Kaiser Criterion), BIC, Sample Size Adjusted BIC, VSS Complexity 2).

plot(results)

Credits

Did this package help you? Don’t forget to cite the packages you used 🙂

You can cite psycho as follows:

  • Makowski, D. (2018). The psycho Package: an Efficient and Publishing-Oriented Workflow for Psychological Science. Journal of Open Source Software, 3(22), 470. https://doi.org/10.21105/joss.00470
