All together now – Confirmatory Factor Analysis in R

[This article was first published on Sustainable Research » Renglish, and kindly contributed to R-bloggers].

Describing multivariate data is not easy, especially if you think that statisticians have not developed any new tools since ANOVA and principal component analysis (PCA). For social and experimental scientists, the most important newer technique is structural equation modelling (SEM), which combines measurement models (which can replace reliability analysis and PCA) with structural models (which can replace ANOVAs or regressions).

At present, three R packages provide the functionality to estimate structural equation models.

  • sem: The first package to provide the ability to fit structural equation models in R.
  • OpenMx: Has a large number of active developers, draws upon well-established code for fitting the models (Mx), can fit non-standard models, and is the first to announce version 1.0.
  • lavaan: Aims at a very easy-to-use implementation of SEM that also incorporates advanced techniques (e.g. Full Information Maximum Likelihood Estimation, and multiple-group confirmatory factor analysis).

Today we focus on using structural equation models to fit a measurement model that specifies which items load on which factor. This is similar to what some do with principal component analysis or exploratory factor analysis. If you already know how the items form the factors, you should use confirmatory factor analysis (CFA), because it gives you several measures of fit. Another advantage is that the SEM framework lets you ask questions about differences between groups at various levels.

Using lavaan, a simple model with two latent variables, each measured with four items, can be fit with the following lines of code.

library(lavaan)

model <- '
# latent variable definitions
factor_1 =~ y1 + y2 + y3 + y4
factor_2 =~ y5 + y6 + y7 + y8
# covariance between factor_1 and factor_2
factor_1 ~~ factor_2
# residual covariances
y1 ~~ y5
'
# fit the confirmatory factor model to the data frame ex_data
fit <- cfa(model, data = ex_data)
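
The data frame ex_data is not included with this post, so here is a minimal sketch for trying the model on simulated data. It uses lavaan's simulateData() to draw a sample from a population model and then fits the same two-factor model to it. The population loadings, the covariances, the sample size and the seed are made-up values chosen for illustration only, not the author's data.

```r
library(lavaan)

# Hypothetical population model, used only to simulate example data.
# All numeric values below are illustrative assumptions.
population_model <- '
factor_1 =~ 0.8*y1 + 0.7*y2 + 0.6*y3 + 0.7*y4
factor_2 =~ 0.8*y5 + 0.7*y6 + 0.6*y7 + 0.7*y8
factor_1 ~~ 0.3*factor_2
y1 ~~ 0.2*y5
'

set.seed(123)
ex_data <- simulateData(population_model, sample.nobs = 500)

# The measurement model from the post: two factors, four items each
model <- '
# latent variable definitions
factor_1 =~ y1 + y2 + y3 + y4
factor_2 =~ y5 + y6 + y7 + y8
# covariance between factor_1 and factor_2
factor_1 ~~ factor_2
# residual covariances
y1 ~~ y5
'

fit <- cfa(model, data = ex_data)

# Parameter estimates together with the usual fit statistics
summary(fit, fit.measures = TRUE)

# Pull out selected fit indices by name
fitMeasures(fit, c("chisq", "df", "rmsea", "cfi", "gfi"))
```

Because the data are generated from a model with the same structure, the fit should be good here; with real data the indices are what tell you whether the assumed item-to-factor assignment is tenable.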
The output you get contains all the fit indices you love (RMSEA, GFI, CFI…). As a bonus, lavaan has a dedicated function that runs a multiple-group confirmatory factor analysis to test for measurement invariance, something that took me a while in AMOS.
measurement.invariance(model, data = ex_data, group = "school")
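
The measurement.invariance() helper belongs to the lavaan version discussed here; as far as I know, later releases no longer ship it in lavaan itself. The same sequence of tests can be sketched by hand with cfa() and its group and group.equal arguments, followed by a likelihood ratio test. The sketch below is an assumption-laden illustration: it uses the HolzingerSwineford1939 data set bundled with lavaan (which has a school variable) and the three-factor model from the lavaan documentation, not the two-factor model or the ex_data from this post.

```r
library(lavaan)

# Three-factor model for the bundled HolzingerSwineford1939 data
# (stands in for the post's model, whose data are not available)
hs_model <- '
visual  =~ x1 + x2 + x3
textual =~ x4 + x5 + x6
speed   =~ x7 + x8 + x9
'

# Configural invariance: same model structure, all parameters free per group
fit_configural <- cfa(hs_model, data = HolzingerSwineford1939,
                      group = "school")

# Metric (weak) invariance: factor loadings constrained equal across groups
fit_metric <- cfa(hs_model, data = HolzingerSwineford1939,
                  group = "school", group.equal = "loadings")

# Scalar (strong) invariance: loadings and intercepts constrained equal
fit_scalar <- cfa(hs_model, data = HolzingerSwineford1939,
                  group = "school",
                  group.equal = c("loadings", "intercepts"))

# Chi-square difference tests between the nested models
lavTestLRT(fit_configural, fit_metric, fit_scalar)
```

A significant chi-square difference between two adjacent models suggests that the added equality constraints do not hold, that is, that the corresponding level of invariance is not supported.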
  • lavaan is currently at version 0.3, so you should check its results against other programs.
