A new simstudy function to make simulating replications easier

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Four years ago, I described a simple framework for organizing simulations to conduct power analyses or explore the operating characteristics of modeling approaches. In that framework, I introduced a small function scenario_list that generated a list of scenarios forming the basis for simulations. I had always intended to incorporate that function into simstudy, and now I have finally done so The new function is available as of version 0.9.0.

This post offers a brief introduction to the function and concludes with a small simulation.

Here are the R packages used in the code that follows:

library(simstudy)
library(data.table)
library(parallel)
library(lmerTest)
library(broom.mixed)

scenario_list takes a collection of vectors and generates all possible combinations of their elements. Under the hood, it’s essentially a wrapper around the base R function expand.grid, but the function is designed to make the process a bit more convenient when setting up simulation scenarios.

In the first example, suppose there are two parameters: a (with 3 elements) and b (2 elements). In this case, scenario_list will return a list of length \(3 \times 2 = 6\). Each object in the list is a data.table with a single row, where the columns are named for the parameters. An additional column is included to uniquely identify each scenario.

a <- c(0.5, 0.7, 0.9)
b <- c(8, 16)

scenario_list(a, b)
## [[1]]
##        a        b scenario 
##      0.5      8.0      1.0 
## 
## [[2]]
##        a        b scenario 
##      0.7      8.0      2.0 
## 
## [[3]]
##        a        b scenario 
##      0.9      8.0      3.0 
## 
## [[4]]
##        a        b scenario 
##      0.5     16.0      4.0 
## 
## [[5]]
##        a        b scenario 
##      0.7     16.0      5.0 
## 
## [[6]]
##        a        b scenario 
##      0.9     16.0      6.0

If there is a set (or multiple sets) of parameters that will vary together, it is possible to use the new grouped function to link them together. In this example c and d (which are of the same length), are grouped together, and the number of possible scenarios is still \(6\) and not \(3 \times 2 \times 2 = 12\):

If a set (or multiple sets) of parameters need to vary together, the new grouped function can be used to link them. In this example, c and d (which have the same length) are grouped, so the total number of possible scenarios remains \(6\), rather than \(3 \times 2 \times 2 = 12\).

d <- c(12, 18)

scenario_list(a, grouped(b, d))
## [[1]]
##        a        b        d scenario 
##      0.5      8.0     12.0      1.0 
## 
## [[2]]
##        a        b        d scenario 
##      0.7      8.0     12.0      2.0 
## 
## [[3]]
##        a        b        d scenario 
##      0.9      8.0     12.0      3.0 
## 
## [[4]]
##        a        b        d scenario 
##      0.5     16.0     18.0      4.0 
## 
## [[5]]
##        a        b        d scenario 
##      0.7     16.0     18.0      5.0 
## 
## [[6]]
##        a        b        d scenario 
##      0.9     16.0     18.0      6.0

Finally, we can generate multiple replications of each scenario using the each argument. For example, there are four possible combinations of b and d (not grouped), and setting each = 2 creates two replications of each combination:

scenario_list(b, d, each = 2)
## [[1]]
##        b        d scenario 
##        8       12        1 
## 
## [[2]]
##        b        d scenario 
##        8       12        1 
## 
## [[3]]
##        b        d scenario 
##       16       12        2 
## 
## [[4]]
##        b        d scenario 
##       16       12        2 
## 
## [[5]]
##        b        d scenario 
##        8       18        3 
## 
## [[6]]
##        b        d scenario 
##        8       18        3 
## 
## [[7]]
##        b        d scenario 
##       16       18        4 
## 
## [[8]]
##        b        d scenario 
##       16       18        4

Sample simulation using scenarios_list

Here’s a simple example of a simulation for a cluster randomized trial. The goal is to explore how four parameters affect estimated power: site-level sample size (npat), between-site variation (svar), within-site variation (ivar), and effect size (delta).

The code below defines three helper functions:

  • s_define: specifies data definitions for cluster-level and individual-level data,
  • s_generate: generates both site-level and individual-level data,
  • s_model: fits a mixed-effects model.

A fourth function, s_replicate, ties everything together by calling the first three functions using a single set of parameter values:

s_define <- function() {
  
  #--- data definition code ---#
  
  def1 <- defData(varname = "site_eff", 
    formula = 0, variance = "..svar", dist = "normal", id = "site")
  def1 <- defData(def1, "n", formula = "..npat", dist = "poisson")
  
  def2 <- defDataAdd(varname = "Y", formula = "5 + site_eff + ..delta * rx", 
    variance = "..ivar", dist = "normal")
  
  return(list(def1 = def1, def2 = def2)) 
}

s_generate <- function(list_of_defs, argsvec) {
  
  list2env(list_of_defs, envir = environment())
  list2env(as.list(argsvec), envir = environment())
  
  #--- data generation code ---#
  
  ds <- genData(40, def1)
  ds <- trtAssign(ds, grpName = "rx")
  dd <- genCluster(ds, "site", "n", "id")
  dd <- addColumns(def2, dd)
  
  return(dd)
}

s_model <- function(generated_data) {
  
  #--- model code ---#
  
  lmefit <- lmer(Y ~ rx + (1|site), data = generated_data)
 
  return(data.table(tidy(lmefit)))
}

s_replicate <- function(argsvec) {
  
  list_of_defs <- s_define()
  generated_data <- s_generate(list_of_defs, argsvec)
  model_results <- s_model(generated_data)
  
  return(list(argsvec, model_results)) 
}

The four parameters—npat (2 values), svar (2 values), ivar (2 values), and delta (3 values)—are specified as vectors. Because the variance parameters are meant to be tested together, they are grouped. This results in \(3 \times 2 \times 3 = 18\) distinct scenarios. With 1000 replications per scenario, the scenarios list contains 18,000 objects. The object model_fits will then store the model estimates for each replication:

#------ set simulation parameters

npat <- c(8, 16, 24)
svar <- c(0.40, 0.80)
ivar <- c(3, 6)
delta <- c(0.50, 0.75, 1.00)

scenarios <- scenario_list(delta, npat, grouped(svar, ivar), each = 1000)

model_fits <- mclapply(scenarios, function(a) s_replicate(a), mc.cores = 5)

Once the data have been collected, it is quite easy to summarize and create a table or a figure.

summarize <- function(m_fit) {
  args <- data.table(t(m_fit[[1]]))
  reject <- m_fit[[2]][term == "rx", p.value <= 0.05]
  cbind(args, reject)
}

reject <- rbindlist(lapply(model_fits, function(a) summarize(a)))
power <- reject[, .(power = mean(reject)), keyby = .(delta, npat, svar, ivar, scenario)]

To leave a comment for the author, please follow the link and comment on their blog: ouR data generation.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)