# Latent Variable Analysis with R: Getting Setup with lavaan

September 1, 2013
By

(This article was first published on Data, Evidence, and Policy - Jared Knowles, and kindly contributed to R-bloggers)

Getting Started with Structural Equation Modeling Part 1

# Introduction

For the analyst familiar with linear regression fitting structural equation models
can at first feel strange. In the R environment, fitting structural equation models
involves learning new modeling syntax, new plotting syntax, and often a new
data input method. However, a quick reorientation and soon the user is exposed
to the differences, fitting structural equation models can be a powerful tool in
the analyst's toolkit.

This tutorial will cover getting set up and running a few basic models using `lavaan`
in R.1 Future tutorials will cover:

• constructing latent variables
• comparing alternate models
• multi-group analysis on larger datasets.

Getting started using structural equation modeling (SEM) in R can be daunting. There are
lots of different packages for implementing SEM in R and there are different features
of SEM that a user might be interested in implementing. A few packages you might come
across can be found on the CRAN Psychometrics Task View.

For those who want to just dive in the `lavaan` package seems to offer the most
comprehensive feature set for most SEM users and has a well thought out and easy
to learn syntax for describing SEM models. To install `lavaan`, we just run:

``````# Main version
install.packages("lavaan")

# Or to install the dev version
library(devtools)
install_github("lavaan", "yrosseel")
``````

Once we load up the lavaan package, we need to read in the dataset. `lavaan` accepts
two different types of data, either a standard R dataframe, or a variance-covariance
matrix. Since the latter is unfamiliar to us coming from the standard `lm` linear modeling
possible and running a path analysis model.

``````library(lavaan)
mat1 <- matrix(c(1, 0, 0, 0.6, 1, 0, 0.33, 0.63, 1), 3, 3, byrow = TRUE)

colnames(mat1) <- rownames(mat1) <- c("ILL", "IMM", "DEP")

myN <- 500
print(mat1)
``````
``````##      ILL  IMM DEP
## ILL 1.00 0.00   0
## IMM 0.60 1.00   0
## DEP 0.33 0.63   1
``````
``````# Note that we only input the lower triangle of the matrix. This is
# sufficient though we could put the whole matrix in if we like
``````

Now we have a variance-covariance matrix in our environment named `mat1` and a
variable `myN` corresponding to the number of observations in our dataset. Alternatively,
we could provide R with the full dataset and it can derive `mat1` and `myN` itself.

With this data we can construct two possible models:

1. Depression (DEP) influences Immune System (IMM) influences Illness (ILL)
2. IMM influences ILL influences DEP

Using SEM we can evaluate which model best explains the covariances we observe in
our data above. Fitting models in `lavaan` is a two step process. First, we create
a text string that serves as the `lavaan` model and follows the `lavaan` model
syntax
. Next, we
give `lavaan` the instructions on how to fit this model to the data using either the
`cfa`, `lavaan`, or `sem` functions. Here we will use the `sem` function. Other
functions will be covered in a future post.

``````# Specify the model

mod1 <- "ILL ~ IMM \n        IMM ~ DEP"

# Give lavaan the command to fit the model
mod1fit <- sem(mod1, sample.cov = mat1, sample.nobs = 500)

# Specify model 2

mod2 <- "DEP ~ ILL\n        ILL ~ IMM"

mod2fit <- sem(mod2, sample.cov = mat1, sample.nobs = 500)
``````

Now we have two objects stored in our environment for each model. We have the
model string and the modelfit object. The model fit objects (`mod1fit` and `mod2fit`)
are `lavaan` class objects. These are S4 objects with many supported methods, including
the `summary` method which provides a lot of useful output:

``````# Summarize the model fit
summary(mod1fit)
``````
``````## lavaan (0.5-14) converged normally after  12 iterations
##
##   Number of observations                           500
##
##   Estimator                                         ML
##   Minimum Function Test Statistic                2.994
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.084
##
## Parameter estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
##                    Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   ILL ~
##     IMM               0.600    0.036   16.771    0.000
##   IMM ~
##     DEP               0.630    0.035   18.140    0.000
##
## Variances:
##     ILL               0.639    0.040
##     IMM               0.602    0.038
``````
``````
summary(mod2fit)
``````
``````## lavaan (0.5-14) converged normally after  11 iterations
##
##   Number of observations                           500
##
##   Estimator                                         ML
##   Minimum Function Test Statistic              198.180
##   Degrees of freedom                                 1
##   P-value (Chi-square)                           0.000
##
## Parameter estimates:
##
##   Information                                 Expected
##   Standard Errors                             Standard
##
##                    Estimate  Std.err  Z-value  P(>|z|)
## Regressions:
##   DEP ~
##     ILL               0.330    0.042    7.817    0.000
##   ILL ~
##     IMM               0.600    0.036   16.771    0.000
##
## Variances:
##     DEP               0.889    0.056
##     ILL               0.639    0.040
``````

One of the best ways to understand an SEM model is to inspect the model visually
using a path diagram. Thanks to the `semPlot` package,
this is easy to do in R.2 First, install `semPlot`:

``````# Official version
install.packages("semPlot")

# Or to install the dev version
library(devtools)
install_github("semPlot", "SachaEpskamp")
``````

Next we load the library and make some path diagrams.

``````library(semPlot)
semPaths(mod1fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")
``````

``````semPaths(mod2fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")
``````

These two simple path models look great. But which is better? We can run a simple
chi-square test on the `lavaan` objects `mod1fit` and `mod2fit`.

``````anova(mod1fit, mod2fit)
``````
``````## Chi Square Difference Test
##
##         Df  AIC  BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## mod1fit  1 3786 3803   2.99
## mod2fit  1 3981 3998 198.18        195       0     <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
``````

We can see that very clearly we prefer Model 2. Let's look at some properties of
model 2 that we can access through the `lavaan` object with convenience functions.

``````# Goodness of fit measures
fitMeasures(mod2fit)
``````
``````##              fmin             chisq                df            pvalue
##             0.198           198.180             1.000             0.000
##    baseline.chisq       baseline.df   baseline.pvalue               cfi
##           478.973             3.000             0.000             0.586
##               tli              nnfi               rfi               nfi
##            -0.243            -0.243             1.000             0.586
##              pnfi               ifi               rni              logl
##             0.195             0.587             0.586         -1986.510
## unrestricted.logl              npar               aic               bic
##         -1887.420             4.000          3981.020          3997.878
##            ntotal              bic2             rmsea    rmsea.ci.lower
##           500.000          3985.182             0.628             0.556
##    rmsea.ci.upper      rmsea.pvalue               rmr        rmr_nomean
##             0.703             0.000             0.176             0.176
##              srmr       srmr_nomean             cn_05             cn_01
##             0.176             0.176            10.692            17.740
##               gfi              agfi              pgfi               mfi
##             0.821            -0.075             0.137             0.821
##              ecvi
##             0.412
``````
``````
# Estimates of the model parameters
parameterEstimates(mod2fit, ci = TRUE, boot.ci.type = "norm")
``````
``````##   lhs op rhs   est    se      z pvalue ci.lower ci.upper
## 1 DEP  ~ ILL 0.330 0.042  7.817      0    0.247    0.413
## 2 ILL  ~ IMM 0.600 0.036 16.771      0    0.530    0.670
## 3 DEP ~~ DEP 0.889 0.056 15.811      0    0.779    1.000
## 4 ILL ~~ ILL 0.639 0.040 15.811      0    0.560    0.718
## 5 IMM ~~ IMM 0.998 0.000     NA     NA    0.998    0.998
``````
``````
# Modification indices
modindices(mod2fit, standardized = TRUE)
``````
``````##    lhs op rhs    mi    epc sepc.lv sepc.all sepc.nox
## 1  DEP ~~ DEP   0.0  0.000   0.000    0.000    0.000
## 2  DEP ~~ ILL 163.6 -0.719  -0.719   -0.720   -0.720
## 3  DEP ~~ IMM 163.6  0.674   0.674    0.675    0.674
## 4  ILL ~~ ILL   0.0  0.000   0.000    0.000    0.000
## 5  ILL ~~ IMM    NA     NA      NA       NA       NA
## 6  IMM ~~ IMM   0.0  0.000   0.000    0.000    0.000
## 7  DEP  ~ ILL   0.0  0.000   0.000    0.000    0.000
## 8  DEP  ~ IMM 163.6  0.675   0.675    0.675    0.676
## 9  ILL  ~ DEP 163.6 -0.808  -0.808   -0.808   -0.808
## 10 ILL  ~ IMM   0.0  0.000   0.000    0.000    0.000
## 11 IMM  ~ DEP 143.8  0.666   0.666    0.666    0.666
## 12 IMM  ~ ILL   0.0  0.000   0.000    0.000    0.000
``````

That's it. From inputing a variance-covariance matrix to fitting a model,
drawing a path diagram, comparing to alternate models, and finally inspecting
the parameters of the preferred model. `lavaan` is an amazing project which
adds great capabilities to R. These will be explored in future posts.

# Appendix

``````citation(package = "lavaan")
``````
``````##
## To cite lavaan in publications use:
##
##   Yves Rosseel (2012). lavaan: An R Package for Structural
##   Equation Modeling. Journal of Statistical Software, 48(2), 1-36.
##   URL http://www.jstatsoft.org/v48/i02/.
##
## A BibTeX entry for LaTeX users is
##
##   @Article{,
##     title = {{lavaan}: An {R} Package for Structural Equation Modeling},
##     author = {Yves Rosseel},
##     journal = {Journal of Statistical Software},
##     year = {2012},
##     volume = {48},
##     number = {2},
##     pages = {1--36},
##     url = {http://www.jstatsoft.org/v48/i02/},
##   }
``````
``````citation(package = "semPlot")
``````
``````##
## To cite package 'semPlot' in publications use:
##
##   Sacha Epskamp (2013). semPlot: Path diagrams and visual analysis
##   of various SEM packages' output. R package version 0.3.3.
##   https://github.com/SachaEpskamp/semPlot
##
## A BibTeX entry for LaTeX users is
##
##   @Manual{,
##     title = {semPlot: Path diagrams and visual analysis of various SEM packages' output},
##     author = {Sacha Epskamp},
##     year = {2013},
##     note = {R package version 0.3.3},
##     url = {https://github.com/SachaEpskamp/semPlot},
##   }
##
## ATTENTION: This citation information has been auto-generated from
## the package DESCRIPTION file and may need manual editing, see
## 'help("citation")'.
``````
``````sessionInfo()
``````
``````## R version 3.0.1 (2013-05-16)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
##
## locale:
## [1] LC_COLLATE=English_United States.1252
## [2] LC_CTYPE=English_United States.1252
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United States.1252
##
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base
##
## other attached packages:
## [1] semPlot_0.3.3  lavaan_0.5-14  quadprog_1.5-5 pbivnorm_0.5-1
## [5] mnormt_1.4-5   boot_1.3-9     MASS_7.3-28    knitr_1.4.1
##
## loaded via a namespace (and not attached):
##  [1] car_2.0-18       cluster_1.14.4   colorspace_1.2-2 corpcor_1.6.6
##  [5] digest_0.6.3     ellipse_0.3-8    evaluate_0.4.7   formatR_0.9
##  [9] grid_3.0.1       Hmisc_3.12-2     igraph_0.6.5-2   jpeg_0.1-6
## [13] lattice_0.20-23  lisrelToR_0.1.4  plyr_1.8         png_0.1-6
## [17] psych_1.3.2      qgraph_1.2.3     rockchalk_1.8.0  rpart_4.1-2
## [21] sem_3.1-3        stats4_3.0.1     stringr_0.6.2    tools_3.0.1
## [25] XML_3.98-1.1
``````

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...