Structured simulation of regression models – simReg package.

[This article was first published on Educate-R - R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

I’d like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models up to two levels of nesting. The package produces a unified framework to simulate all types of continuous regression models. In the future, I’d like to add the ability to simulate generalized linear models. This package is an extension of the functions I used to simulate data for my dissertation.

The package is currently on github https://github.com/lebebr01/simReg. Therefore, you can currently install the package by using the devtools package like so:

library(devtools)
install_github("lebebr01/simReg")

The primary function of interest in this package is sim.reg. To show the use of this function, here is a simple example simulating a single level regression mode. Note, this example is pulled directly from the Intro vignette.

library(simReg)
set.seed(100)
fixed <- ~ 1 + act + diff + numCourse + act:numCourse
fixed.param <- c(2, 4, 1, 3.5, 2)
cov.param <- list(mean = c(0, 0, 0), sd = c(4, 3, 3), var.type = c("single", "single", "single"))
n <- 150
errorVar <- 3
err.dist <- "norm"
temp.single <- sim.reg(fixed = fixed, fixed.param = fixed.param, cov.param = cov.param,
n = n, errorVar = errorVar, err.dist = err.dist, data.str = "single")
head(temp.single)

##   X.Intercept.     act     diff numCourse act.numCourse    Fbeta      err
## 1            1 -2.0088  3.10406    -5.566       11.1815 -0.05022 -3.35921
## 2            1  0.5261  4.96051    -3.056       -1.6077 -4.84527 -5.75176
## 3            1 -0.3157 -0.05384    -3.135        0.9897 -8.31073  1.63173
## 4            1  3.5471 -0.07261    -1.954       -6.9306 -4.58382  0.06435
## 5            1  0.4679  0.75074     1.148        0.5372  9.71476 -0.44783
## 6            1  1.2745 -1.01137     3.096        3.9455 24.81272  0.59651
##   sim.data ID
## 1   -3.409  1
## 2  -10.597  2
## 3   -6.679  3
## 4   -4.519  4
## 5    9.267  5
## 6   25.409  6

As you can see from the above code, the package uses a single sided equation to represent the fixed effects. Other arguments include the values for those fixed effects (fixed.param), the scale for the covariates (cov.param), the sample size (n), the error variance (errorVar), and the error distribution (err.dist). These are all put into the function sim.reg with the additional argument data.str to tell the function that we indeed want a single level regression and you get the following output. The data frame that is returned gives the values for the design matrix, the fixed portion of the model (Fbeta), and the random error term (err). The value of most interest if conducting a simulation would be the actually simulated value (sim.data).

Nested Example

A slightly more complicated example is shown below where longitudinal data are simulated.

fixed <- ~1 + time + diff + act + time:act
random <- ~1 + time + diff
fixed.param <- c(4, 2, 6, 2.3, 7)
random.param <- c(7, 4, 2)
cov.param <- list(mean = c(0, 0), sd = c(1.5, 4), var.type = c("lvl1", "lvl2"))
n <- 150
p <- 30
errorVar <- 4
randCor <- 0
rand.dist <- "norm"
err.dist <- "norm"
serCor <- "ID"
serCorVal <- NULL
data.str <- "long"
temp.long <- sim.reg(fixed, random, fixed.param, random.param, cov.param,
n, p, errorVar, randCor, rand.dist, err.dist, serCor, serCorVal, data.str)
head(temp.long)

##   X.Intercept. time     diff    act time.act    b0      b1    b2   Fbeta
## 1            1    0 -0.05455  1.608    0.000 5.118 -0.1118 0.251   7.371
## 2            1    1 -2.23677 -1.349   -1.349 5.118 -0.1118 0.251 -19.968
## 3            1    2  0.50321 -6.028  -12.056 5.118 -0.1118 0.251 -87.237
## 4            1    3  1.25027  8.436   25.308 5.118 -0.1118 0.251 214.063
## 5            1    4  2.05871  3.917   15.667 5.118 -0.1118 0.251 143.031
## 6            1    5 -1.87968  7.598   37.990 5.118 -0.1118 0.251 286.125
##   randEff     err sim.data withinID clustID
## 1   5.104  3.2637    15.74        1       1
## 2   4.444  0.9355   -14.59        2       1
## 3   5.020 -3.1693   -85.39        3       1
## 4   5.096  4.1523   223.31        4       1
## 5   5.187 -3.1716   145.05        5       1
## 6   4.086 -0.2605   289.95        6       1

Most of the arguments should look familiar to above, but a few are new. Most notably these are a one sided equation for the random effects (random), their variances (random.param), the number of observations within a cluster (p), the correlation among the random effects (randCor), the simulated distribution of the random effects (rand.dist), the serial correlation model for within cluster residuals (serCor), the values for the serial correlation models (serCorVal). Note now since this represents longitudinal data, the data.str argument is now specified as 'long'.

Other features

The package also simulates cross sectional multilevel models, covariates that are either a factor, ordinal, or categorical, and the basics for power simulation are there.

For further information, see the vignette by doing the following after installing the package:

vignette("Intro", package = "simReg")

Bugs, comments, or feature requests can be submitted on the github site https://github.com/lebebr01/simReg/issues.

To leave a comment for the author, please follow the link and comment on their blog: Educate-R - R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)