# Latent Variable Analysis with R: Getting Setup with lavaan

**Data, Evidence, and Policy - Jared Knowles**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

# Getting Started with Structural Equation Modeling: Part 1

# Introduction

For the analyst familiar with linear regression fitting structural equation models can at first feel strange. In the R environment, fitting structural equation models involves learning new modeling syntax, new plotting syntax, and often a new data input method. However, a quick reorientation and soon the user is exposed to the differences, fitting structural equation models can be a powerful tool in the analyst's toolkit.

This tutorial will cover getting set up and running a few basic models using `lavaan`

in R.1 Future tutorials will cover:

- constructing latent variables
- comparing alternate models
- multi-group analysis on larger datasets.

# Setting up your enviRonment

Getting started using structural equation modeling (SEM) in R can be daunting. There are lots of different packages for implementing SEM in R and there are different features of SEM that a user might be interested in implementing. A few packages you might come across can be found on the CRAN Psychometrics Task View.

For those who want to just dive in the `lavaan`

package seems to offer the most
comprehensive feature set for most SEM users and has a well thought out and easy
to learn syntax for describing SEM models. To install `lavaan`

, we just run:

# Main version install.packages("lavaan") # Or to install the dev version library(devtools) install_github("lavaan", "yrosseel")

# Read in the data

Once we load up the lavaan package, we need to read in the dataset. `lavaan`

accepts
two different types of data, either a standard R dataframe, or a variance-covariance
matrix. Since the latter is unfamiliar to us coming from the standard `lm`

linear modeling
framework in R, we'll start with reading in the simplest variance-covariance matrix
possible and running a path analysis model.

library(lavaan) mat1 <- matrix(c(1, 0, 0, 0.6, 1, 0, 0.33, 0.63, 1), 3, 3, byrow = TRUE) colnames(mat1) <- rownames(mat1) <- c("ILL", "IMM", "DEP") myN <- 500 print(mat1) ## ILL IMM DEP ## ILL 1.00 0.00 0 ## IMM 0.60 1.00 0 ## DEP 0.33 0.63 1 # Note that we only input the lower triangle of the matrix. This is # sufficient though we could put the whole matrix in if we like

Now we have a variance-covariance matrix in our environment named `mat1`

and a
variable `myN`

corresponding to the number of observations in our dataset. Alternatively,
we could provide R with the full dataset and it can derive `mat1`

and `myN`

itself.

With this data we can construct two possible models:

- Depression (DEP) influences Immune System (IMM) influences Illness (ILL)
- IMM influences ILL influences DEP

Using SEM we can evaluate which model best explains the covariances we observe in
our data above. Fitting models in `lavaan`

is a two step process. First, we create
a text string that serves as the `lavaan`

model and follows the `lavaan`

model
syntax. Next, we
give `lavaan`

the instructions on how to fit this model to the data using either the
`cfa`

, `lavaan`

, or `sem`

functions. Here we will use the `sem`

function. Other
functions will be covered in a future post.

# Specify the model mod1 <- "ILL ~ IMM \n IMM ~ DEP" # Give lavaan the command to fit the model mod1fit <- sem(mod1, sample.cov = mat1, sample.nobs = 500) # Specify model 2 mod2 <- "DEP ~ ILL\n ILL ~ IMM" mod2fit <- sem(mod2, sample.cov = mat1, sample.nobs = 500)

Now we have two objects stored in our environment for each model. We have the
model string and the modelfit object. The model fit objects (`mod1fit`

and `mod2fit`

)
are `lavaan`

class objects. These are S4 objects with many supported methods, including
the `summary`

method which provides a lot of useful output:

# Summarize the model fit summary(mod1fit) ## lavaan (0.5-14) converged normally after 12 iterations ## ## Number of observations 500 ## ## Estimator ML ## Minimum Function Test Statistic 2.994 ## Degrees of freedom 1 ## P-value (Chi-square) 0.084 ## ## Parameter estimates: ## ## Information Expected ## Standard Errors Standard ## ## Estimate Std.err Z-value P(>|z|) ## Regressions: ## ILL ~ ## IMM 0.600 0.036 16.771 0.000 ## IMM ~ ## DEP 0.630 0.035 18.140 0.000 ## ## Variances: ## ILL 0.639 0.040 ## IMM 0.602 0.038 summary(mod2fit) ## lavaan (0.5-14) converged normally after 11 iterations ## ## Number of observations 500 ## ## Estimator ML ## Minimum Function Test Statistic 198.180 ## Degrees of freedom 1 ## P-value (Chi-square) 0.000 ## ## Parameter estimates: ## ## Information Expected ## Standard Errors Standard ## ## Estimate Std.err Z-value P(>|z|) ## Regressions: ## DEP ~ ## ILL 0.330 0.042 7.817 0.000 ## ILL ~ ## IMM 0.600 0.036 16.771 0.000 ## ## Variances: ## DEP 0.889 0.056 ## ILL 0.639 0.040

One of the best ways to understand an SEM model is to inspect the model visually
using a path diagram. Thanks to the `semPlot`

package,
this is easy to do in R.2 First, install `semPlot`

:

# Official version install.packages("semPlot") # Or to install the dev version library(devtools) install_github("semPlot", "SachaEpskamp")

Next we load the library and make some path diagrams.

library(semPlot) semPaths(mod1fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")

semPaths(mod2fit, what = "est", layout = "tree", title = TRUE, style = "LISREL")

These two simple path models look great. But which is better? We can run a simple
chi-square test on the `lavaan`

objects `mod1fit`

and `mod2fit`

.

anova(mod1fit, mod2fit) ## Chi Square Difference Test ## ## Df AIC BIC Chisq Chisq diff Df diff Pr(>Chisq) ## mod1fit 1 3786 3803 2.99 ## mod2fit 1 3981 3998 198.18 195 0 <2e-16 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We can see that very clearly we prefer Model 2. Let's look at some properties of
model 2 that we can access through the `lavaan`

object with convenience functions.

# Goodness of fit measures fitMeasures(mod2fit) ## fmin chisq df pvalue ## 0.198 198.180 1.000 0.000 ## baseline.chisq baseline.df baseline.pvalue cfi ## 478.973 3.000 0.000 0.586 ## tli nnfi rfi nfi ## -0.243 -0.243 1.000 0.586 ## pnfi ifi rni logl ## 0.195 0.587 0.586 -1986.510 ## unrestricted.logl npar aic bic ## -1887.420 4.000 3981.020 3997.878 ## ntotal bic2 rmsea rmsea.ci.lower ## 500.000 3985.182 0.628 0.556 ## rmsea.ci.upper rmsea.pvalue rmr rmr_nomean ## 0.703 0.000 0.176 0.176 ## srmr srmr_nomean cn_05 cn_01 ## 0.176 0.176 10.692 17.740 ## gfi agfi pgfi mfi ## 0.821 -0.075 0.137 0.821 ## ecvi ## 0.412 # Estimates of the model parameters parameterEstimates(mod2fit, ci = TRUE, boot.ci.type = "norm") ## lhs op rhs est se z pvalue ci.lower ci.upper ## 1 DEP ~ ILL 0.330 0.042 7.817 0 0.247 0.413 ## 2 ILL ~ IMM 0.600 0.036 16.771 0 0.530 0.670 ## 3 DEP ~~ DEP 0.889 0.056 15.811 0 0.779 1.000 ## 4 ILL ~~ ILL 0.639 0.040 15.811 0 0.560 0.718 ## 5 IMM ~~ IMM 0.998 0.000 NA NA 0.998 0.998 # Modification indices modindices(mod2fit, standardized = TRUE) ## lhs op rhs mi epc sepc.lv sepc.all sepc.nox ## 1 DEP ~~ DEP 0.0 0.000 0.000 0.000 0.000 ## 2 DEP ~~ ILL 163.6 -0.719 -0.719 -0.720 -0.720 ## 3 DEP ~~ IMM 163.6 0.674 0.674 0.675 0.674 ## 4 ILL ~~ ILL 0.0 0.000 0.000 0.000 0.000 ## 5 ILL ~~ IMM NA NA NA NA NA ## 6 IMM ~~ IMM 0.0 0.000 0.000 0.000 0.000 ## 7 DEP ~ ILL 0.0 0.000 0.000 0.000 0.000 ## 8 DEP ~ IMM 163.6 0.675 0.675 0.675 0.676 ## 9 ILL ~ DEP 163.6 -0.808 -0.808 -0.808 -0.808 ## 10 ILL ~ IMM 0.0 0.000 0.000 0.000 0.000 ## 11 IMM ~ DEP 143.8 0.666 0.666 0.666 0.666 ## 12 IMM ~ ILL 0.0 0.000 0.000 0.000 0.000

That's it. From inputing a variance-covariance matrix to fitting a model,
drawing a path diagram, comparing to alternate models, and finally inspecting
the parameters of the preferred model. `lavaan`

is an amazing project which
adds great capabilities to R. These will be explored in future posts.

# Appendix

citation(package = "lavaan") ## ## To cite lavaan in publications use: ## ## Yves Rosseel (2012). lavaan: An R Package for Structural ## Equation Modeling. Journal of Statistical Software, 48(2), 1-36. ## URL http://www.jstatsoft.org/v48/i02/. ## ## A BibTeX entry for LaTeX users is ## ## @Article{, ## title = {{lavaan}: An {R} Package for Structural Equation Modeling}, ## author = {Yves Rosseel}, ## journal = {Journal of Statistical Software}, ## year = {2012}, ## volume = {48}, ## number = {2}, ## pages = {1--36}, ## url = {http://www.jstatsoft.org/v48/i02/}, ## } citation(package = "semPlot") ## ## To cite package 'semPlot' in publications use: ## ## Sacha Epskamp (2013). semPlot: Path diagrams and visual analysis ## of various SEM packages' output. R package version 0.3.3. ## https://github.com/SachaEpskamp/semPlot ## ## A BibTeX entry for LaTeX users is ## ## @Manual{, ## title = {semPlot: Path diagrams and visual analysis of various SEM packages' output}, ## author = {Sacha Epskamp}, ## year = {2013}, ## note = {R package version 0.3.3}, ## url = {https://github.com/SachaEpskamp/semPlot}, ## } ## ## ATTENTION: This citation information has been auto-generated from ## the package DESCRIPTION file and may need manual editing, see ## 'help("citation")'. sessionInfo() ## R version 3.0.1 (2013-05-16) ## Platform: x86_64-w64-mingw32/x64 (64-bit) ## ## locale: ## [1] LC_COLLATE=English_United States.1252 ## [2] LC_CTYPE=English_United States.1252 ## [3] LC_MONETARY=English_United States.1252 ## [4] LC_NUMERIC=C ## [5] LC_TIME=English_United States.1252 ## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: ## [1] semPlot_0.3.3 lavaan_0.5-14 quadprog_1.5-5 pbivnorm_0.5-1 ## [5] mnormt_1.4-5 boot_1.3-9 MASS_7.3-28 knitr_1.4.1 ## ## loaded via a namespace (and not attached): ## [1] car_2.0-18 cluster_1.14.4 colorspace_1.2-2 corpcor_1.6.6 ## [5] digest_0.6.3 ellipse_0.3-8 evaluate_0.4.7 formatR_0.9 ## [9] grid_3.0.1 Hmisc_3.12-2 igraph_0.6.5-2 jpeg_0.1-6 ## [13] lattice_0.20-23 lisrelToR_0.1.4 plyr_1.8 png_0.1-6 ## [17] psych_1.3.2 qgraph_1.2.3 rockchalk_1.8.0 rpart_4.1-2 ## [21] sem_3.1-3 stats4_3.0.1 stringr_0.6.2 tools_3.0.1 ## [25] XML_3.98-1.1

**leave a comment**for the author, please follow the link and comment on their blog:

**Data, Evidence, and Policy - Jared Knowles**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.