Structured simulation of regression models – simReg package.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
I’d like to introduce a package that simulates regression models. This includes both single level and multilevel (i.e. hierarchical or linear mixed) models up to two levels of nesting. The package produces a unified framework to simulate all types of continuous regression models. In the future, I’d like to add the ability to simulate generalized linear models. This package is an extension of the functions I used to simulate data for my dissertation.
The package is currently on github https://github.com/lebebr01/simReg. Therefore, you can currently install the package by using the devtools package like so:
library(devtools) install_github("lebebr01/simReg")
The primary function of interest in this package is sim.reg. To show the use of this function, here is a simple example simulating a single level regression mode. Note, this example is pulled directly from the Intro vignette.
library(simReg) set.seed(100) fixed <- ~ 1 + act + diff + numCourse + act:numCourse fixed.param <- c(2, 4, 1, 3.5, 2) cov.param <- list(mean = c(0, 0, 0), sd = c(4, 3, 3), var.type = c("single", "single", "single")) n <- 150 errorVar <- 3 err.dist <- "norm" temp.single <- sim.reg(fixed = fixed, fixed.param = fixed.param, cov.param = cov.param, n = n, errorVar = errorVar, err.dist = err.dist, data.str = "single") head(temp.single) ## X.Intercept. act diff numCourse act.numCourse Fbeta err ## 1 1 -2.0088 3.10406 -5.566 11.1815 -0.05022 -3.35921 ## 2 1 0.5261 4.96051 -3.056 -1.6077 -4.84527 -5.75176 ## 3 1 -0.3157 -0.05384 -3.135 0.9897 -8.31073 1.63173 ## 4 1 3.5471 -0.07261 -1.954 -6.9306 -4.58382 0.06435 ## 5 1 0.4679 0.75074 1.148 0.5372 9.71476 -0.44783 ## 6 1 1.2745 -1.01137 3.096 3.9455 24.81272 0.59651 ## sim.data ID ## 1 -3.409 1 ## 2 -10.597 2 ## 3 -6.679 3 ## 4 -4.519 4 ## 5 9.267 5 ## 6 25.409 6
As you can see from the above code, the package uses a single sided equation to represent the fixed effects. Other arguments include the values for those fixed effects (fixed.param), the scale for the covariates (cov.param), the sample size (n), the error variance (errorVar), and the error distribution (err.dist). These are all put into the function sim.reg with the additional argument data.str to tell the function that we indeed want a single level regression and you get the following output. The data frame that is returned gives the values for the design matrix, the fixed portion of the model (Fbeta), and the random error term (err). The value of most interest if conducting a simulation would be the actually simulated value (sim.data).
Nested Example
A slightly more complicated example is shown below where longitudinal data are simulated.
fixed <- ~1 + time + diff + act + time:act random <- ~1 + time + diff fixed.param <- c(4, 2, 6, 2.3, 7) random.param <- c(7, 4, 2) cov.param <- list(mean = c(0, 0), sd = c(1.5, 4), var.type = c("lvl1", "lvl2")) n <- 150 p <- 30 errorVar <- 4 randCor <- 0 rand.dist <- "norm" err.dist <- "norm" serCor <- "ID" serCorVal <- NULL data.str <- "long" temp.long <- sim.reg(fixed, random, fixed.param, random.param, cov.param, n, p, errorVar, randCor, rand.dist, err.dist, serCor, serCorVal, data.str) head(temp.long) ## X.Intercept. time diff act time.act b0 b1 b2 Fbeta ## 1 1 0 -0.05455 1.608 0.000 5.118 -0.1118 0.251 7.371 ## 2 1 1 -2.23677 -1.349 -1.349 5.118 -0.1118 0.251 -19.968 ## 3 1 2 0.50321 -6.028 -12.056 5.118 -0.1118 0.251 -87.237 ## 4 1 3 1.25027 8.436 25.308 5.118 -0.1118 0.251 214.063 ## 5 1 4 2.05871 3.917 15.667 5.118 -0.1118 0.251 143.031 ## 6 1 5 -1.87968 7.598 37.990 5.118 -0.1118 0.251 286.125 ## randEff err sim.data withinID clustID ## 1 5.104 3.2637 15.74 1 1 ## 2 4.444 0.9355 -14.59 2 1 ## 3 5.020 -3.1693 -85.39 3 1 ## 4 5.096 4.1523 223.31 4 1 ## 5 5.187 -3.1716 145.05 5 1 ## 6 4.086 -0.2605 289.95 6 1
Most of the arguments should look familiar to above, but a few are new. Most notably these are a one sided equation for the random effects (random), their variances (random.param), the number of observations within a cluster (p), the correlation among the random effects (randCor), the simulated distribution of the random effects (rand.dist), the serial correlation model for within cluster residuals (serCor), the values for the serial correlation models (serCorVal). Note now since this represents longitudinal data, the data.str argument is now specified as 'long'.
Other features
The package also simulates cross sectional multilevel models, covariates that are either a factor, ordinal, or categorical, and the basics for power simulation are there.
For further information, see the vignette by doing the following after installing the package:
vignette("Intro", package = "simReg")
Bugs, comments, or feature requests can be submitted on the github site https://github.com/lebebr01/simReg/issues.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.