[This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Getting Started with Structural Equation Modeling Part 1

# Introduction

For the analyst familiar with linear regression fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting syntax, and often a new data input method. However, a quick reorientation and soon the user is exposed to the differences, fitting structural equation models can be a powerful tool in the analyst's toolkit.

This tutorial will cover getting set up and running a few basic models using `lavaan` in R.1 Future tutorials will cover:

• constructing latent variables
• comparing alternate models
• multi-group analysis on larger datasets.

Getting started using structural equation modeling (SEM) in R can be daunting. There are lots of different packages for implementing SEM in R and there are different features of SEM that a user might be interested in implementing. A few packages you might come across can be found on the CRAN Psychometrics Task View.

For those who want to just dive in the `lavaan` package seems to offer the most comprehensive feature set for most SEM users and has a well thought out and easy to learn syntax for describing SEM models. To install `lavaan`, we just run:

```# Main version
install.packages("lavaan")

# Or to install the dev version
library(devtools)
install_github("lavaan", "yrosseel")
```

Once we load up the lavaan package, we need to read in the dataset. `lavaan` accepts two different types of data, either a standard R dataframe, or a variance-covariance matrix. Since the latter is unfamiliar to us coming from the standard `lm` linear modeling framework in R, we'll start with reading in the simplest variance-covariance matrix possible and running a path analysis model.

```library(lavaan)
mat1 <- matrix(c(1, 0, 0, 0.6, 1, 0, 0.33, 0.63, 1), 3, 3, byrow = TRUE)

colnames(mat1) <- rownames(mat1) <- c("ILL", "IMM", "DEP")

myN <- 500
print(mat1)

##      ILL  IMM DEP
## ILL 1.00 0.00   0
## IMM 0.60 1.00   0
## DEP 0.33 0.63   1

# Note that we only input the lower triangle of the matrix. This is
# sufficient though we could put the whole matrix in if we like
```

Now we have a variance-covariance matrix in our environment named `mat1` and a variable `myN` corresponding to the number of observations in our dataset. Alternatively, we could provide R with the full dataset and it can derive `mat1` and `myN` itself.

With this data we can construct two possible models:

1. Depression (DEP) influences Immune System (IMM) influences Illness (ILL)
2. IMM influences ILL influences DEP

Using SEM we can evaluate which model best explains the covariances we observe in our data above. Fitting models in `lavaan` is a two step process. First, we create a text string that serves as the `lavaan` model and follows the `lavaan` model syntax. Next, we give `lavaan` the instructions on how to fit this model to the data using either the `cfa`, `lavaan`, or `sem` functions. Here we will use the `sem` function. Other functions will be covered in a future post.

```# Specify the model

mod1 <- "ILL ~ IMM \n        IMM ~ DEP"

# Give lavaan the command to fit the model
mod1fit <- sem(mod1, sample.cov = mat1, sample.nobs = 500)

# Specify model 2

mod2 <- "DEP ~ ILL\n        ILL ~ IMM"

mod2fit <- sem(mod2, sample.cov = mat1, sample.nobs = 500)
```

Now we have two objects stored in our environment for each model. We have the model string and the modelfit object. The model fit objects (`mod1fit` and `mod2fit`) are `lavaan` class objects. These are S4 objects with many supported methods, including the `summary` method which provides a lot of useful output:

```# Summarize the model fit
summary(mod1fit)

## lavaan (0.5-14) converged normally after  12 iterations
##
##   Number of observations                           500
##
##   Estimator                                         ML
##   Minimum Function Test Statistic                2.994
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.084
##
## Parameter estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
##                    Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   ILL ~
##     IMM               0.600    0.036   16.771    0.000
##   IMM ~
##     DEP               0.630    0.035   18.140    0.000
##
## Variances:
##     ILL               0.639    0.040
##     IMM               0.602    0.038

summary(mod2fit)

## lavaan (0.5-14) converged normally after  11 iterations
##
##   Number of observations                           500
##
##   Estimator                                         ML
##   Minimum Function Test Statistic              198.180
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
##                    Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   DEP ~
##     ILL               0.330    0.042    7.817    0.000
##   ILL ~
##     IMM               0.600    0.036   16.771    0.000
##
## Variances:
##     DEP               0.889    0.056
##     ILL               0.639    0.040
```

One of the best ways to understand an SEM model is to inspect the model visually using a path diagram. Thanks to the `semPlot` package, this is easy to do in R.2 First, install `semPlot`:

```# Official version
install.packages("semPlot")

# Or to install the dev version
library(devtools)
install_github("semPlot", "SachaEpskamp")
```

Next we load the library and make some path diagrams.

```library(semPlot)
semPaths(mod1fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")
```

```semPaths(mod2fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")
```

These two simple path models look great. But which is better? We can run a simple chi-square test on the `lavaan` objects `mod1fit` and `mod2fit`.

```anova(mod1fit, mod2fit)

## Chi Square Difference Test
##
##         Df  AIC  BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## mod1fit  1 3786 3803   2.99
## mod2fit  1 3981 3998 198.18        195       0     <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

We can see that very clearly we prefer Model 2. Let's look at some properties of model 2 that we can access through the `lavaan` object with convenience functions.

```# Goodness of fit measures
fitMeasures(mod2fit)

##              fmin             chisq                df            pvalue
##             0.198           198.180             1.000             0.000
##    baseline.chisq       baseline.df   baseline.pvalue               cfi
##           478.973             3.000             0.000             0.586
##               tli              nnfi               rfi               nfi
##            -0.243            -0.243             1.000             0.586
##              pnfi               ifi               rni              logl
##             0.195             0.587             0.586         -1986.510
## unrestricted.logl              npar               aic               bic
##         -1887.420             4.000          3981.020          3997.878
##            ntotal              bic2             rmsea    rmsea.ci.lower
##           500.000          3985.182             0.628             0.556
##    rmsea.ci.upper      rmsea.pvalue               rmr        rmr_nomean
##             0.703             0.000             0.176             0.176
##              srmr       srmr_nomean             cn_05             cn_01
##             0.176             0.176            10.692            17.740
##               gfi              agfi              pgfi               mfi
##             0.821            -0.075             0.137             0.821
##              ecvi
##             0.412

# Estimates of the model parameters
parameterEstimates(mod2fit, ci = TRUE, boot.ci.type = "norm")

##   lhs op rhs   est    se      z pvalue ci.lower ci.upper
## 1 DEP  ~ ILL 0.330 0.042  7.817      0    0.247    0.413
## 2 ILL  ~ IMM 0.600 0.036 16.771      0    0.530    0.670
## 3 DEP ~~ DEP 0.889 0.056 15.811      0    0.779    1.000
## 4 ILL ~~ ILL 0.639 0.040 15.811      0    0.560    0.718
## 5 IMM ~~ IMM 0.998 0.000     NA     NA    0.998    0.998

# Modification indices
modindices(mod2fit, standardized = TRUE)

##    lhs op rhs    mi    epc sepc.lv sepc.all sepc.nox
## 1  DEP ~~ DEP   0.0  0.000   0.000    0.000    0.000
## 2  DEP ~~ ILL 163.6 -0.719  -0.719   -0.720   -0.720
## 3  DEP ~~ IMM 163.6  0.674   0.674    0.675    0.674
## 4  ILL ~~ ILL   0.0  0.000   0.000    0.000    0.000
## 5  ILL ~~ IMM    NA     NA      NA       NA       NA
## 6  IMM ~~ IMM   0.0  0.000   0.000    0.000    0.000
## 7  DEP  ~ ILL   0.0  0.000   0.000    0.000    0.000
## 8  DEP  ~ IMM 163.6  0.675   0.675    0.675    0.676
## 9  ILL  ~ DEP 163.6 -0.808  -0.808   -0.808   -0.808
## 10 ILL  ~ IMM   0.0  0.000   0.000    0.000    0.000
## 11 IMM  ~ DEP 143.8  0.666   0.666    0.666    0.666
## 12 IMM  ~ ILL   0.0  0.000   0.000    0.000    0.000
```

That's it. From inputing a variance-covariance matrix to fitting a model, drawing a path diagram, comparing to alternate models, and finally inspecting the parameters of the preferred model. `lavaan` is an amazing project which adds great capabilities to R. These will be explored in future posts.

# Appendix

```citation(package = "lavaan")

##
## To cite lavaan in publications use:
##
##   Yves Rosseel (2012). lavaan: An R Package for Structural
##   Equation Modeling. Journal of Statistical Software, 48(2), 1-36.
##   URL http://www.jstatsoft.org/v48/i02/.
##
## A BibTeX entry for LaTeX users is
##
##   @Article{,
##     title = {{lavaan}: An {R} Package for Structural Equation Modeling},
##     author = {Yves Rosseel},
##     journal = {Journal of Statistical Software},
##     year = {2012},
##     volume = {48},
##     number = {2},
##     pages = {1--36},
##     url = {http://www.jstatsoft.org/v48/i02/},
##   }

citation(package = "semPlot")

##
## To cite package 'semPlot' in publications use:
##
##   Sacha Epskamp (2013). semPlot: Path diagrams and visual analysis
##   of various SEM packages' output. R package version 0.3.3.
##   https://github.com/SachaEpskamp/semPlot
##
## A BibTeX entry for LaTeX users is
##
##   @Manual{,
##     title = {semPlot: Path diagrams and visual analysis of various SEM packages' output},
##     author = {Sacha Epskamp},
##     year = {2013},
##     note = {R package version 0.3.3},
##     url = {https://github.com/SachaEpskamp/semPlot},
##   }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.

sessionInfo()

## R version 3.0.1 (2013-05-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] semPlot_0.3.3  lavaan_0.5-14  quadprog_1.5-5 pbivnorm_0.5-1
## [5] mnormt_1.4-5   boot_1.3-9     MASS_7.3-28    knitr_1.4.1
##
## loaded via a namespace (and not attached):
##  [1] car_2.0-18       cluster_1.14.4   colorspace_1.2-2 corpcor_1.6.6
##  [5] digest_0.6.3     ellipse_0.3-8    evaluate_0.4.7   formatR_0.9
##  [9] grid_3.0.1       Hmisc_3.12-2     igraph_0.6.5-2   jpeg_0.1-6
## [13] lattice_0.20-23  lisrelToR_0.1.4  plyr_1.8         png_0.1-6
## [17] psych_1.3.2      qgraph_1.2.3     rockchalk_1.8.0  rpart_4.1-2
## [21] sem_3.1-3        stats4_3.0.1     stringr_0.6.2    tools_3.0.1
## [25] XML_3.98-1.1
```