Structural equation modelling (SEM) software is frequently used in psychology. This post discusses the exciting prospect of greater support for SEM in R. …
- Run confirmatory factor analyses to examine the measurement structure of multi-factor psychological scales
- Compare the factor structure of a scale across multiple groups
- Examine the plausibility of various structural and mediation models. SEM is particularly useful when the mediation is more complex than the standard three-variable scenario.
- Estimate correlations or regression models of the latent variables (i.e., adjusting for reliability).
- Determine parsimonious descriptions of a correlation matrix by exploring the fit that results from placing and removing various equality constraints.
A little history: I was originally taught to do structural equation modelling in Amos (which was bought out by SPSS, which was in turn bought out by IBM). Among other things, Amos attempted to bring SEM to the masses. The main way of creating models in Amos is to draw them graphically, which makes it fairly easy to set up simple confirmatory factor analysis models and simple structural models. There are also many drawing tools designed to make diagramming more efficient. However, moving from drawing models to testing them programmatically is a big jump, especially considering that you have to learn a programming language for something you might only use occasionally. Even drawing a single model can become quite time-consuming and error-prone.
For this and several other reasons, I have been excited about the idea of running structural equation models in R. Some of the reasons why R would be a natural fit for structural equation modelling are as follows:
Model comparison: Bad SEM style involves a researcher saying "this is my model", testing only that model, and ticking the Hu and Bentler fit statistics boxes. Good SEM style typically involves adopting a model comparison approach. A series of models is specified: e.g., a simple baseline model, a hypothesised model, a series of plausible alternative models, and one or more models based on post-hoc, theoretically justifiable refinements. R is well suited to such a model comparison approach. Each model can be stored in a list. Fit statistics can be extracted using code. Tables comparing models in terms of fit and nested chi-square tests can easily be obtained.
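To make the list-based workflow concrete, here is a rough sketch in base R. It does not fit any models: the chi-square values and degrees of freedom are invented for illustration, and the model names (`baseline`, `hypothesised`, `refined`) are hypothetical. It assumes the models are nested and ordered from most to least constrained. In a real analysis the numbers would be pulled out of fitted model objects.

```r
# Hypothetical fit results for three nested models.
# The numbers are made up; in practice they would be extracted
# from fitted SEM objects.
fits <- list(
  baseline     = list(chisq = 312.4, df = 54),
  hypothesised = list(chisq = 98.7,  df = 51),
  refined      = list(chisq = 91.2,  df = 50)
)

# Collect the fit statistics into one comparison table.
fit_table <- data.frame(
  model = names(fits),
  chisq = sapply(fits, function(f) f$chisq),
  df    = sapply(fits, function(f) f$df),
  row.names = NULL
)

# Chi-square difference test of each model against the
# preceding (more constrained) model in the list.
fit_table$delta_chisq <- c(NA, -diff(fit_table$chisq))
fit_table$delta_df    <- c(NA, -diff(fit_table$df))
fit_table$p_value     <- pchisq(fit_table$delta_chisq,
                                fit_table$delta_df,
                                lower.tail = FALSE)

print(fit_table)
```

Because everything lives in a list and a data frame, adding another alternative model is just one more list entry, and the comparison table updates when the script is rerun.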
Specification of models in R: The challenge is to provide a way of specifying models that is easy and efficient. It should then be easy to adjust models further by, for example, specifying equality constraints, constraining relationships to zero, and so on.
Extracting model information in R: SEM produces a lot of output. This is well suited to R where this information can be stored in a list structure. This information can then be selectively extracted as needed.
Writing code for SEM in R: SEM tends to be a niche statistical task. I might use it three or four times per year. Thus, learning a whole new scripting environment just for SEM is annoying. Using R, the same programming language as for other analyses, makes a lot of sense. Scripts can more easily be shared to highlight common analyses, and those with more knowledge of SEM can lead the way in showing how to program more advanced models.
Graphically representing models in R: R is great for graphics. It would be great to be able to specify an SEM model and simply run a plot function to graphically represent it with options for what information is represented and how it is presented.
Implementation of various preparatory processes in R: R should make it easier to do various common preparatory activities, such as item parcelling and calculating alternative estimates of correlations (e.g., polychoric correlations). The beauty of this is that the analyst could quickly examine the effect of tweaking various initial conditions on the final results.
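As one small illustration of this kind of preparation, the sketch below builds item parcels in base R. The data are simulated, the item names and the `parcel_map` assignment are hypothetical, and `make_parcels` is a helper written for this example rather than a function from any package. It assumes parcels are scored as the mean of their items, which is only one of several possible approaches.

```r
set.seed(1)  # simulated data, for illustration only

# Simulate responses from 200 people to 9 items on a 1-5 scale.
items <- as.data.frame(
  matrix(sample(1:5, 200 * 9, replace = TRUE), ncol = 9)
)
names(items) <- paste0("item", 1:9)

# Hypothetical assignment of items to three parcels.
parcel_map <- list(
  parcel1 = c("item1", "item4", "item7"),
  parcel2 = c("item2", "item5", "item8"),
  parcel3 = c("item3", "item6", "item9")
)

# Score each parcel as the mean of its items.
make_parcels <- function(data, map) {
  as.data.frame(
    lapply(map, function(cols) rowMeans(data[, cols, drop = FALSE]))
  )
}

parcels <- make_parcels(items, parcel_map)

# Covariance matrix of the parcels, a typical input to an SEM fit.
parcel_cov <- cov(parcels)
```

To see how a different parcelling scheme affects the results, edit `parcel_map` and rerun the script; everything downstream is recomputed from the new parcels.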
Incremental improvement: SEM practice is constantly evolving. R programs typically adopt a modular orientation that allows for the incorporation of additional procedures, such as new fit measures, new estimation algorithms, and so on.
The sem package: John Fox wrote the sem package. It’s an excellent package. It provides a means for running structural equation models in R. There’s less handholding than with Amos, and specifying models efficiently takes some getting used to. It also does not have all the fit statistics and features of some of the bigger commercial packages. There’s further discussion on a psychology wiki. I list some additional links here. In short, the sem package is awesome for what it can do. However, it won’t yet replace the bigger commercial packages.