# Structural Equation Modelling in R

September 20, 2009

(This article was first published on Jeromy Anglim's Blog: Psychology and Statistics, and kindly contributed to R-bloggers)

Structural Equation Modelling (SEM) software is frequently used in psychology. This post discusses the exciting prospect of greater support for SEM in R. …

I have used SEM to:

• Run confirmatory factor analyses to examine the measurement structure of multi-factor psychological scales
• Compare the factor structure of a scale across multiple groups
• Examine the plausibility of various structural and mediation models. It’s particularly useful when the mediation is more complex than the standard three-variable scenario.
• Estimate correlations or regression models of the latent variables (i.e., adjusting for reliability).
• Determine parsimonious descriptions of a correlation matrix by exploring the fit that results from placing and removing various equality constraints. For an example, see: Grant, S., Langan-Fox, J., & Anglim, J. (2009). Big Five Traits as predictors of subjective and psychological well-being. Psychological Reports, 105, 201-231.

I have some materials on Structural Equation Modelling available online. There are of course many good books and online resources.

A little history: I was originally taught to do structural equation modelling in Amos (which was bought out by SPSS, which was bought out by IBM). Among other things, Amos attempted to bring SEM to the masses. The main mode of creating models in Amos is to draw them graphically. This makes it fairly easy to draw simple confirmatory factor analysis models and simple structural models. There are also many drawing tools designed to make drawing diagrams more efficient.

However, the switch from drawing models to testing models programmatically is a big jump, especially considering that you have to learn a programming language for something you might only use occasionally. And even drawing a single model can eventually become quite time consuming and error prone. For this and several other reasons, I have been excited about the idea of running structural equation models in R. Reasons why R would be a natural fit for structural equation modelling include the following:

Model comparison: Bad SEM style involves a researcher saying "this is my model", testing only that model, and ticking the Hu and Bentler fit statistics boxes. Good SEM style typically involves adopting a model comparison approach. A series of models is specified: e.g., a simple baseline model, a hypothesised model, a series of plausible alternative models, and one or more models based on post-hoc, theoretically justifiable refinements. R is well suited to such a model comparison approach. Each model can be stored in a list. Fit statistics can be extracted using code. Tables for comparing models in terms of fit and nested chi-square tests can easily be obtained.
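As a rough sketch of that workflow (models stored in a list, fit statistics pulled out with code, and a nested chi-square table), the example below uses only base R. It is an illustration under assumptions, not the approach of any particular SEM package: `factanal()` (exploratory maximum likelihood factor analysis) stands in for an SEM fitting function, the `ability.cov` data come from R's built-in `datasets` package, and the names `fits` and `fit_table` are made up for the example. Treat the difference test as illustrative only; in a real SEM comparison the nested models would usually differ by explicit parameter constraints.

```r
# Base R only: factanal() is a stand-in for an SEM fitting function.
# ability.cov is a covariance list (with n.obs) shipped with R.

# Store each candidate model in a named list.
fits <- list(
  one_factor = factanal(factors = 1, covmat = ability.cov),
  two_factor = factanal(factors = 2, covmat = ability.cov)
)

# Extract the same fit statistics from every stored model.
fit_table <- data.frame(
  model = names(fits),
  chisq = sapply(fits, function(f) unname(f$STATISTIC)),
  df    = sapply(fits, function(f) unname(f$dof)),
  p     = sapply(fits, function(f) unname(f$PVAL)),
  row.names = NULL
)

# Chi-square difference of each model against the previous,
# more restricted model in the list (first row has no comparison).
fit_table$chisq_diff <- c(NA, -diff(fit_table$chisq))
fit_table$df_diff    <- c(NA, -diff(fit_table$df))
fit_table$p_diff     <- pchisq(fit_table$chisq_diff,
                               fit_table$df_diff,
                               lower.tail = FALSE)

print(fit_table)
```

Adding another candidate model only means adding another element to `fits`; the table code does not change.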

Specification of models in R: The challenge is to provide a way of specifying models that is easy and efficient. It should then be easy to adjust models further by, for example, specifying equality constraints, constraining relationships to zero, and so on.

Extracting model information in R: SEM produces a lot of output. This is well suited to R where this information can be stored in a list structure. This information can then be selectively extracted as needed.
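To make the "list structure" point concrete, here is a small base R sketch. It again assumes `factanal()` and the built-in `ability.cov` data as stand-ins for a real SEM fit; the names `fit`, `loadings_df` and `strong` are hypothetical. The component names shown are those of a `factanal` object, and an SEM package would expose its own.

```r
# Fit a two-factor model; the result is an ordinary R list.
fit <- factanal(factors = 2, covmat = ability.cov)

# See which pieces of output are available.
print(names(fit))

# Pull selected pieces into a tidy data frame:
# one row per observed variable, loadings plus uniqueness.
loadings_df <- data.frame(
  variable   = rownames(fit$loadings),
  unclass(fit$loadings),          # columns Factor1, Factor2
  uniqueness = fit$uniquenesses,
  row.names  = NULL
)

# Selectively extract only what is needed, e.g. variables
# loading above 0.5 in absolute value on the first factor.
strong <- loadings_df[abs(loadings_df$Factor1) > 0.5,
                      c("variable", "Factor1")]

print(loadings_df)
print(strong)
```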

Writing code for SEM and R: SEM tends to be a niche statistical task. I might use it 3 or 4 times per year. Thus, learning a whole new scripting environment is annoying. Using R, the same language used for other analyses, makes a lot of sense. Scripts can more easily be shared to highlight common analyses, and those with more knowledge of SEM can lead the way in how to program more advanced models.

Graphically representing models in R: R is great for graphics. It would be great to be able to specify an SEM model and simply run a plot function to graphically represent it with options for what information is represented and how it is presented.

Implementation of various preparatory processes in R: R should make it easier to do various common preparatory activities, such as item parcelling and calculating alternative estimates of correlations (e.g., polychoric correlations). The beauty of this is that analysts could quickly examine the effect of tweaking various initial conditions on the final results.
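A minimal sketch of one such preparatory step, item parcelling, in base R. Everything here is assumed for illustration: the data are simulated, the item names and the `parcel_map` assignment are hypothetical, and Spearman correlations are used as the alternative estimate only because they are available in base R (polychoric correlations would need an add-on package and are not shown).

```r
set.seed(1)

# Simulated data: 9 items driven by one latent variable (illustrative only).
n <- 200
latent <- rnorm(n)
items <- sapply(1:9, function(i) 0.7 * latent + rnorm(n, sd = 0.7))
colnames(items) <- paste0("item", 1:9)

# Hypothetical assignment of items to parcels.
parcel_map <- list(
  parcel1 = c("item1", "item4", "item7"),
  parcel2 = c("item2", "item5", "item8"),
  parcel3 = c("item3", "item6", "item9")
)

# Each parcel is the mean of its items.
parcels <- sapply(parcel_map, function(cols) {
  rowMeans(items[, cols, drop = FALSE])
})

# Two alternative correlation matrices that could be fed to a model.
cor_pearson  <- cor(parcels)
cor_spearman <- cor(parcels, method = "spearman")

print(round(cor_pearson, 2))
print(round(cor_spearman, 2))
```

Changing `parcel_map` or the correlation method and re-running the downstream model is then a one-line tweak, which is the kind of quick sensitivity check described above.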

Incremental improvement: SEM practice is constantly evolving. R programs typically adopt a modular orientation that allows for the incorporation of additional procedures, e.g., new fit measures, new estimation algorithms, and so on.

## Status of SEM in R

The sem package: John Fox wrote the sem package. It’s an excellent package. It provides a means for running structural equation models in R. There’s less handholding than with Amos, and specifying models efficiently takes some getting used to. It also does not have all the fit statistics and features of some of the bigger commercial packages. There’s further discussion on a psychology wiki. I list some additional links here. In short, the sem package is awesome for what it can do. However, it won’t yet replace the bigger commercial packages.

OpenMx as an R package: For the above reasons it is particularly exciting to watch the development of OpenMx for release as an R package. An open beta is scheduled for release in October 2009. I’ve never used Mx, but I’ve heard through the grapevine that it is very feature rich. There appears to be considerable programming and development effort going into producing a powerful SEM package for R.

In short, I hope OpenMx lives up to my expectations and that lots of SEM users start to share their OpenMx R code to provide examples of how it all works. Good SEM software may also provide a good enough justification for some student and academic researchers in psychology to take the time to learn the fundamentals of R.
