simstudy 0.6.0 released: more flexible correlation patterns

Posted on February 20, 2023 by Keith Goldfeld in R bloggers | 0 Comments

[This article was first published on ouR data generation, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The new version (0.6.0) of simstudy is available for download from CRAN. In addition to some important bug fixes, I’ve added new functionality that should make data generation with correlated data a little more flexible. In the previous post, I described enhancements to the function genCorMat. As part of this release announcement, I’m describing blockExchangeMat and blockDecayMat, two new functions that can be used to generate correlation matrices when there is a temporal element to the data generation.

This is not an entirely new feature in simstudy. In the previous development version of simstudy, I introduced a function genBlockMat (which I wrote about here). I wasn’t satisfied with the (lack of) flexibility and (lack of) clarity of that function, so I took another approach that ultimately made the final cut of the new release. (Of course, it has always been possible to generate correlated data in simstudy indirectly by using random effects, but here I am focusing on marginal correlation rather than conditional correlation.) I provided the motivation for time-dependent correlation matrices in the earlier post, so I won’t repeat myself here (but if you do read that, don’t pay attention to the function I describe there).

The parameterization of the data generation process implemented in blockExchangeMat and blockDecayMat relies on correlation matrix structures described by Li et al. They’ve created a very useful nomenclature for laying out different patterns of correlation that can occur within a cluster over time. This classification is based on the distinction between cross-sectional vs cohort samples as well as the exchangeable vs. decay patterns of correlation.

Cross-sectional data

In the case where individuals in clusters are “treated” only once over the course of a study, then the sample is considered cross-sectional. (This treatment might be the control condition or the experimental condition.) I am assuming that the outcome is always collected relatively close in time to the treatment – this just simplifies things conceptually. However, it does not necessarily to be the case, as long as the patient is exposed only a single time.

The key point is that in a cross-sectional design, individuals in the same cluster who are treated at different time periods will be unique. The correlation between individual outcomes will be driven entirely by the intra-cluster correlation. The structure of that intra-cluster correlation will depend on the assumption we make about how correlation changes over time: the correlation can reflect either exchangeability or decay.

Exchangeable correlation

Under the assumption of exchangeability, there is a constant within-cluster within-period correlation (\(\rho_w\)) across all study participants in the same cluster and period. For participants in the same cluster but different period, the within-cluster between-period correlation (\(\rho_b\)) is different from \(\rho_w\) (presumably lower) but constant over time.

A matrix that includes three periods and two individuals per period is shown below. This represents the correlation structure for a single cluster. Each box represents a different time period. So, the correlation represented in the box in the upper left hand corner is the within-cluster within-period correlation for the first period. The bottom left box represents the within-cluster between-period correlation for the individuals in the first and third periods. (Note that we are assuming that individuals in different clusters are not correlated with each other.)

\scriptsize{ R = \left ( \begin{array}{c|c|c} \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} & \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} \\ \hline \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} \\ \hline \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} & \begin{matrix} \rho_b & \rho_b \\ \rho_b & \rho_b \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} \end{array} \right ) }

Decaying correlation

Under the assumption of decay, the within-cluster within-period correlation (\(\rho_w\)) is the same as under the exchangeability assumptions. The between-period correlation is now a function of the difference in time when the two individuals were treated. The correlation is \(\rho_w * r^{|s-t|}\), where \(r\) is a decay parameter between 0 and 1, and \(s\) and \(t\) are the periods under consideration. For example, in the lower left-hand box, we have the correlation between individuals in the first period (\(s=1\)) and individuals in the third period (\(t=3\)), which gives a correlation coefficient of \(\rho_w \times r^{|1-3|} = \rho_w \times r^2\). As the difference in periods grows, \(r^{|s-t|}\) (and therefore, correlation) gets smaller.

\scriptsize{ R = \left ( \begin{array}{c|c|c} \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_w r & \rho_w r \\ \rho_w r & \rho_w r \end{matrix} & \begin{matrix} \rho_w r^2 & \rho_w r^2 \\ \rho_w r^2 & \rho_w r^2 \end{matrix} \\ \hline \begin{matrix} \rho_w r & \rho_w r \\ \rho_w r & \rho_w r \end{matrix}& \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_w r & \rho_w r \\ \rho_w r & \rho_w r \end{matrix} \\ \hline \begin{matrix} \rho_w r^2 & \rho_w r^2 \\ \rho_w r^2 & \rho_w r^2 \end{matrix} & \begin{matrix} \rho_w r & \rho_w r \\ \rho_w r & \rho_w r \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} \end{array} \right ) }

Closed cohort

When individuals in clusters are “treated” or exposed in each period of a study, the sample is considered to be a cohort. If every individual is measured in each period, as I’ve just described, this would be a closed cohort; it is closed in the sense that once the cohort is defined at the beginning of the study, no new participants are added. If we allow participants to start and stop and random points, this would be an open cohort design. For the purposes of simulation it is challenging to generate data under an open cohort design with this marginal approach (using correlation matrices), and is much easier to do with random effects (which I did here). Everything I describe here applies to closed cohorts only.

Exchangeable

The key difference between the cross-sectional and cohort design is the within-individual between-period (auto) correlation. Under the exchangeable assumption, the autocorrelation is specified with the correlation coefficient \(\rho_a\). The within-period between-individual correlation is still \(\rho_w\), and the between-period between-individual correlation is still \(\rho_b\). All of these correlations remain constant in the exchangeable framework:

\scriptsize{ R = \left ( \begin{array}{c|c|c} \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} & \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} \\ \hline \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} \\ \hline \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} & \begin{matrix} \rho_a & \rho_b \\ \rho_b & \rho_a \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} \end{array} \right ) }

Decay

The decay structure under an assumption of a closed cohort is the last of the four possible variations. The within-period between-individual correlation \(\rho_w\) remains the same, and so does the between-period between-individual correlation \(\rho_wr^{|s-t|}\). The between-period within-individual correlation is specified as \(r^{|s-t|}\):

\scriptsize{ R = \left ( \begin{array}{c|c|c} \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} r & \rho_w r \\ \rho_w r & r \end{matrix} & \begin{matrix} r^2 & \rho_w r^2 \\ \rho_w r^2 & r^2 \end{matrix} \\ \hline \begin{matrix} r & \rho_w r \\ \rho_w r & r \end{matrix}& \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} & \begin{matrix} r & \rho_w r \\ \rho_w r & r \end{matrix} \\ \hline \begin{matrix} r^2 & \rho_w r^2 \\ \rho_w r^2 & r^2 \end{matrix} & \begin{matrix} r & \rho_w r \\ \rho_w r & r \end{matrix} & \begin{matrix} 1 & \rho_w \\ \rho_w & 1 \end{matrix} \end{array} \right ) }

Generating block matrices and simulating data

We can put all of this into action with the help of two new functions blockExchangeMat and blockDecayMat. In the simulations that follow, I will start with two clusters and three periods, just to keep it simple. In the addendum, I will provide a more elaborate example.

Cross-sectional data with exchangeable correlation

In the first example, we specify \(\rho_w = 0.5\) and \(\rho_b = 0.3\), and there will be two individuals per cluster per period, so a total of six individuals per cluster. Since the assumptions are the same for both clusters, it is only necessary to generate a single correlation matrix that will apply to each cluster (as opposed to a list of clusters, which I will generate in the more involved example at the end).

library(simstudy)
library(data.table)

R_XE <- blockExchangeMat(ninds = 2 , nperiods = 3, rho_w = 0.5,
  rho_b = 0.3, pattern = "xsection")

R_XE
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]  1.0  0.5  0.3  0.3  0.3  0.3
## [2,]  0.5  1.0  0.3  0.3  0.3  0.3
## [3,]  0.3  0.3  1.0  0.5  0.3  0.3
## [4,]  0.3  0.3  0.5  1.0  0.3  0.3
## [5,]  0.3  0.3  0.3  0.3  1.0  0.5
## [6,]  0.3  0.3  0.3  0.3  0.5  1.0

The code block that follows generates two clusters (with different mean values for the outcome for each cluster), and then 3 periods for each cluster, and finally 2 individuals per each period (per cluster). The correlated data are generated in the last step, using the correlation matrix \(R_{XE}\).

defC <- defData(varname = "lambda", formula = "c(4, 7)", dist = "nonrandom")

set.seed(1234)

dc <- genData(2, defC, id = "site")
dp <- addPeriods(dc, 3, idvars = "site")
dx <- genCluster(dtClust = dp, cLevelVar = "timeID", 
                 numIndsVar = 2, level1ID = "id")

dd <- addCorGen(dx, idvar = "site", corMatrix = R_XE,
  dist = "poisson", param1 = "lambda", cnames = "y")

dd
##     site period lambda timeID id y
##  1:    1      0      4      1  1 4
##  2:    1      0      4      1  2 2
##  3:    1      1      4      2  3 2
##  4:    1      1      4      2  4 4
##  5:    1      2      4      3  5 2
##  6:    1      2      4      3  6 2
##  7:    2      0      7      4  7 3
##  8:    2      0      7      4  8 5
##  9:    2      1      7      5  9 2
## 10:    2      1      7      5 10 4
## 11:    2      2      7      6 11 5
## 12:    2      2      7      6 12 5

The next function generates 5000 data sets for these 12 individuals, so that we can estimate an empirical correlation matrix that we can compare with the true correlation matrix. In this case, it looks like things have worked out quite well.

replicate <- function(R, dx) {
  reps <- lapply(1:5000, function(x)
    addCorGen(dx, idvar = "site", corMatrix = R,
      dist = "poisson", param1 = "lambda", cnames = "y")
  )

  drep <- data.table::rbindlist(reps, idcol = "rep")
  drep[, seq := 1:.N, keyby = rep]
  dmat <- as.matrix(dcast(drep, rep ~ seq, value.var = "y")[, -1])
  round(cor(dmat), 1) 
}

replicate(R = R_XE, dx = dx)
##      1   2   3   4   5   6   7   8   9  10  11  12
## 1  1.0 0.5 0.3 0.3 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 2  0.5 1.0 0.3 0.3 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 3  0.3 0.3 1.0 0.5 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 4  0.3 0.3 0.5 1.0 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 5  0.3 0.3 0.3 0.3 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## 6  0.3 0.3 0.3 0.3 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0
## 7  0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5 0.3 0.3 0.3 0.3
## 8  0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.3 0.3 0.3 0.3
## 9  0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 1.0 0.5 0.3 0.3
## 10 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.5 1.0 0.3 0.3
## 11 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.3 0.3 1.0 0.5
## 12 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.3 0.3 0.5 1.0

Cross-sectional data with correlation decay

We repeat these steps, but now with a decay parameter \(r = 0.8\) and no parameter \(\rho_b\).

R_XD <- blockDecayMat(ninds = 2 , nperiods = 3, rho_w = 0.5,
  r = 0.8, pattern = "xsection")

R_XD
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.50  0.4  0.4 0.32 0.32
## [2,] 0.50 1.00  0.4  0.4 0.32 0.32
## [3,] 0.40 0.40  1.0  0.5 0.40 0.40
## [4,] 0.40 0.40  0.5  1.0 0.40 0.40
## [5,] 0.32 0.32  0.4  0.4 1.00 0.50
## [6,] 0.32 0.32  0.4  0.4 0.50 1.00

The empirical correlation matches the matrix \(R_{XD}\):

replicate(R = R_XD, dx = dx)
##      1   2   3   4   5   6   7   8   9  10  11  12
## 1  1.0 0.5 0.4 0.4 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 2  0.5 1.0 0.4 0.4 0.3 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 3  0.4 0.4 1.0 0.5 0.4 0.4 0.0 0.0 0.0 0.0 0.0 0.0
## 4  0.4 0.4 0.5 1.0 0.4 0.4 0.0 0.0 0.0 0.0 0.0 0.0
## 5  0.3 0.3 0.4 0.4 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## 6  0.3 0.3 0.4 0.4 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0
## 7  0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5 0.4 0.4 0.3 0.3
## 8  0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.4 0.4 0.3 0.3
## 9  0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.4 1.0 0.5 0.4 0.4
## 10 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.4 0.5 1.0 0.4 0.4
## 11 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.4 0.4 1.0 0.5
## 12 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.3 0.4 0.4 0.5 1.0

Cohort data with exchangeable correlation

Since we have a cohort, we introduce \(\rho_a\) = 0.4, and specify \(pattern = \text{"cohort"}\):

R_CE <- blockExchangeMat(ninds = 2 , nperiods = 3, rho_w = 0.5, 
  rho_b = 0.3, rho_a = 0.4, pattern = "cohort")

R_CE
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]  1.0  0.5  0.4  0.3  0.4  0.3
## [2,]  0.5  1.0  0.3  0.4  0.3  0.4
## [3,]  0.4  0.3  1.0  0.5  0.4  0.3
## [4,]  0.3  0.4  0.5  1.0  0.3  0.4
## [5,]  0.4  0.3  0.4  0.3  1.0  0.5
## [6,]  0.3  0.4  0.3  0.4  0.5  1.0

And here is the empirical correlation matrix:

replicate(R = R_CE, dx = dx)
##      1   2   3   4   5   6   7   8   9  10  11  12
## 1  1.0 0.5 0.4 0.3 0.4 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 2  0.5 1.0 0.3 0.4 0.3 0.4 0.0 0.0 0.0 0.0 0.0 0.0
## 3  0.4 0.3 1.0 0.5 0.4 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 4  0.3 0.4 0.5 1.0 0.3 0.4 0.0 0.0 0.0 0.0 0.0 0.0
## 5  0.4 0.3 0.4 0.3 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## 6  0.3 0.4 0.3 0.4 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0
## 7  0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5 0.4 0.3 0.4 0.3
## 8  0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.3 0.4 0.3 0.4
## 9  0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.3 1.0 0.5 0.4 0.3
## 10 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.4 0.5 1.0 0.3 0.4
## 11 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.3 0.4 0.3 1.0 0.5
## 12 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.4 0.3 0.4 0.5 1.0

Cohort data with correlation decay

In the final case, the parameterization for decaying correlation with a cohort is the same as a decay in the case of a cross sectional design; the only difference that we set \(pattern = \text{"cohort"}\):

R_CD <- blockDecayMat(ninds = 2 , nperiods = 3, rho_w = 0.5, 
  r = 0.8, pattern = "cohort")

R_CD
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] 1.00 0.50  0.8  0.4 0.64 0.32
## [2,] 0.50 1.00  0.4  0.8 0.32 0.64
## [3,] 0.80 0.40  1.0  0.5 0.80 0.40
## [4,] 0.40 0.80  0.5  1.0 0.40 0.80
## [5,] 0.64 0.32  0.8  0.4 1.00 0.50
## [6,] 0.32 0.64  0.4  0.8 0.50 1.00

And in this final case, the empirical data set also does quite well:

replicate(R = R_CD, dx = dx)
##      1   2   3   4   5   6   7   8   9  10  11  12
## 1  1.0 0.5 0.8 0.4 0.6 0.3 0.0 0.0 0.0 0.0 0.0 0.0
## 2  0.5 1.0 0.4 0.8 0.3 0.6 0.0 0.0 0.0 0.0 0.0 0.0
## 3  0.8 0.4 1.0 0.5 0.8 0.4 0.0 0.0 0.0 0.0 0.0 0.0
## 4  0.4 0.8 0.5 1.0 0.4 0.8 0.0 0.0 0.0 0.0 0.0 0.0
## 5  0.6 0.3 0.8 0.4 1.0 0.5 0.0 0.0 0.0 0.0 0.0 0.0
## 6  0.3 0.6 0.4 0.8 0.5 1.0 0.0 0.0 0.0 0.0 0.0 0.0
## 7  0.0 0.0 0.0 0.0 0.0 0.0 1.0 0.5 0.8 0.4 0.6 0.3
## 8  0.0 0.0 0.0 0.0 0.0 0.0 0.5 1.0 0.4 0.8 0.3 0.6
## 9  0.0 0.0 0.0 0.0 0.0 0.0 0.8 0.4 1.0 0.5 0.8 0.4
## 10 0.0 0.0 0.0 0.0 0.0 0.0 0.4 0.8 0.5 1.0 0.4 0.8
## 11 0.0 0.0 0.0 0.0 0.0 0.0 0.6 0.3 0.8 0.4 1.0 0.5
## 12 0.0 0.0 0.0 0.0 0.0 0.0 0.3 0.6 0.4 0.8 0.5 1.0

Reference:

Li, Fan, James P. Hughes, Karla Hemming, Monica Taljaard, Edward R. Melnick, and Patrick J. Heagerty. “Mixed-effects models for the design and analysis of stepped wedge cluster randomized trials: an overview.” Statistical Methods in Medical Research 30, no. 2 (2021): 612-639.

Addendum

I want to provide a slightly more elaborate example to show the true flexibility of the new functions. In the case of a cross-sectional design, the number of observations per period for a specific cluster does not need to remain constant (though in the case of data generation under a cohort design it does). So, we can vary the total number of observations as well as the correlation parameters by cluster.

In this example, there are 10 clusters and three periods. The number of individuals per cluster per period is randomly generated, and ranges from two to four. The decay rate \(r\) varies by cluster (and is generated using the beta distribution with shape parameters 6 and 2). The parameter \(\rho_w\) is constant across all clusters, and is 0.6

defC <- defData(varname = "lambda", formula = "sample(5:10, 1)", dist = "nonrandom")
defP <- defDataAdd(varname = "n", formula = "2;4", dist="uniformInt")

dc <- genData(n = 10, dtDefs = defC, id = "site")
dc <- addPeriods(dtName = dc, nPeriods = 3, 
                 idvars = "site", perName = "period")
dc <- addColumns(defP, dc)

dd <- genCluster(dtClust = dc, cLevelVar = "timeID", 
  numIndsVar = "n", level1ID = "id")

Here are the counts for three sites:

dc[site %in% c(1, 3, 7), .(site, period, n)]
##    site period n
## 1:    1      0 4
## 2:    1      1 4
## 3:    1      2 4
## 4:    3      0 2
## 5:    3      1 4
## 6:    3      2 2
## 7:    7      0 4
## 8:    7      1 4
## 9:    7      2 2

And here are the unique decay rates for the same sites:

r <- round(rbeta(10, 6, 2), 2)
r[c(1, 3, 7)]
## [1] 0.82 0.80 0.93

And finally, here are the correlation matrices for these three sites:

N <- dd[, .N, keyby = .(site, period)][, N]

R <- blockDecayMat(ninds = N , nperiods = 3, rho_w = 0.6, r = r, nclusters = 10)

lapply(R, function(x) round(x,2))[c(1, 3, 7)]
## [[1]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
##  [1,] 1.00 0.60 0.60 0.60 0.49 0.49 0.49 0.49 0.40  0.40  0.40  0.40
##  [2,] 0.60 1.00 0.60 0.60 0.49 0.49 0.49 0.49 0.40  0.40  0.40  0.40
##  [3,] 0.60 0.60 1.00 0.60 0.49 0.49 0.49 0.49 0.40  0.40  0.40  0.40
##  [4,] 0.60 0.60 0.60 1.00 0.49 0.49 0.49 0.49 0.40  0.40  0.40  0.40
##  [5,] 0.49 0.49 0.49 0.49 1.00 0.60 0.60 0.60 0.49  0.49  0.49  0.49
##  [6,] 0.49 0.49 0.49 0.49 0.60 1.00 0.60 0.60 0.49  0.49  0.49  0.49
##  [7,] 0.49 0.49 0.49 0.49 0.60 0.60 1.00 0.60 0.49  0.49  0.49  0.49
##  [8,] 0.49 0.49 0.49 0.49 0.60 0.60 0.60 1.00 0.49  0.49  0.49  0.49
##  [9,] 0.40 0.40 0.40 0.40 0.49 0.49 0.49 0.49 1.00  0.60  0.60  0.60
## [10,] 0.40 0.40 0.40 0.40 0.49 0.49 0.49 0.49 0.60  1.00  0.60  0.60
## [11,] 0.40 0.40 0.40 0.40 0.49 0.49 0.49 0.49 0.60  0.60  1.00  0.60
## [12,] 0.40 0.40 0.40 0.40 0.49 0.49 0.49 0.49 0.60  0.60  0.60  1.00
## 
## [[2]]
##      [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
## [1,] 1.00 0.60 0.48 0.48 0.48 0.48 0.38 0.38
## [2,] 0.60 1.00 0.48 0.48 0.48 0.48 0.38 0.38
## [3,] 0.48 0.48 1.00 0.60 0.60 0.60 0.48 0.48
## [4,] 0.48 0.48 0.60 1.00 0.60 0.60 0.48 0.48
## [5,] 0.48 0.48 0.60 0.60 1.00 0.60 0.48 0.48
## [6,] 0.48 0.48 0.60 0.60 0.60 1.00 0.48 0.48
## [7,] 0.38 0.38 0.48 0.48 0.48 0.48 1.00 0.60
## [8,] 0.38 0.38 0.48 0.48 0.48 0.48 0.60 1.00
## 
## [[3]]
##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,] 1.00 0.60 0.60 0.60 0.56 0.56 0.56 0.56 0.52  0.52
##  [2,] 0.60 1.00 0.60 0.60 0.56 0.56 0.56 0.56 0.52  0.52
##  [3,] 0.60 0.60 1.00 0.60 0.56 0.56 0.56 0.56 0.52  0.52
##  [4,] 0.60 0.60 0.60 1.00 0.56 0.56 0.56 0.56 0.52  0.52
##  [5,] 0.56 0.56 0.56 0.56 1.00 0.60 0.60 0.60 0.56  0.56
##  [6,] 0.56 0.56 0.56 0.56 0.60 1.00 0.60 0.60 0.56  0.56
##  [7,] 0.56 0.56 0.56 0.56 0.60 0.60 1.00 0.60 0.56  0.56
##  [8,] 0.56 0.56 0.56 0.56 0.60 0.60 0.60 1.00 0.56  0.56
##  [9,] 0.52 0.52 0.52 0.52 0.56 0.56 0.56 0.56 1.00  0.60
## [10,] 0.52 0.52 0.52 0.52 0.56 0.56 0.56 0.56 0.60  1.00

And here are the empirical correlation matrices for the three sites:

reps <- lapply(1:5000, 
  function(x) addCorGen(dd, idvar = "site", corMatrix = R,
    dist = "poisson", param1 = "lambda", cnames = "y")
)

drep <- data.table::rbindlist(reps, idcol = "rep")

empir_corr <- function(cluster) {
  dcrep <- drep[site == cluster, ]
  dcrep[, seq := 1:.N, keyby = rep]
  dmat <- as.matrix(dcast(dcrep, rep ~ seq, value.var = "y")[, -1])
  
  return(round(cor(dmat), 2))
}
empir_corr(cluster = 1)
##       1    2    3    4    5    6    7    8    9   10   11   12
## 1  1.00 0.59 0.60 0.60 0.48 0.49 0.49 0.49 0.37 0.37 0.37 0.39
## 2  0.59 1.00 0.59 0.59 0.50 0.49 0.49 0.50 0.38 0.37 0.39 0.39
## 3  0.60 0.59 1.00 0.58 0.49 0.47 0.49 0.49 0.37 0.37 0.37 0.40
## 4  0.60 0.59 0.58 1.00 0.50 0.48 0.48 0.49 0.39 0.39 0.39 0.41
## 5  0.48 0.50 0.49 0.50 1.00 0.59 0.58 0.60 0.49 0.49 0.48 0.49
## 6  0.49 0.49 0.47 0.48 0.59 1.00 0.59 0.60 0.47 0.47 0.46 0.49
## 7  0.49 0.49 0.49 0.48 0.58 0.59 1.00 0.59 0.48 0.48 0.47 0.50
## 8  0.49 0.50 0.49 0.49 0.60 0.60 0.59 1.00 0.48 0.47 0.47 0.49
## 9  0.37 0.38 0.37 0.39 0.49 0.47 0.48 0.48 1.00 0.58 0.57 0.58
## 10 0.37 0.37 0.37 0.39 0.49 0.47 0.48 0.47 0.58 1.00 0.59 0.58
## 11 0.37 0.39 0.37 0.39 0.48 0.46 0.47 0.47 0.57 0.59 1.00 0.59
## 12 0.39 0.39 0.40 0.41 0.49 0.49 0.50 0.49 0.58 0.58 0.59 1.00
empir_corr(cluster = 3)
##      1    2    3    4    5    6    7    8
## 1 1.00 0.59 0.47 0.46 0.48 0.47 0.38 0.39
## 2 0.59 1.00 0.48 0.46 0.49 0.47 0.37 0.40
## 3 0.47 0.48 1.00 0.60 0.61 0.60 0.48 0.49
## 4 0.46 0.46 0.60 1.00 0.60 0.60 0.48 0.49
## 5 0.48 0.49 0.61 0.60 1.00 0.60 0.49 0.49
## 6 0.47 0.47 0.60 0.60 0.60 1.00 0.49 0.49
## 7 0.38 0.37 0.48 0.48 0.49 0.49 1.00 0.59
## 8 0.39 0.40 0.49 0.49 0.49 0.49 0.59 1.00
empir_corr(cluster = 7)
##       1    2    3    4    5    6    7    8    9   10
## 1  1.00 0.60 0.59 0.62 0.57 0.57 0.56 0.57 0.54 0.52
## 2  0.60 1.00 0.59 0.60 0.56 0.56 0.55 0.55 0.51 0.51
## 3  0.59 0.59 1.00 0.61 0.56 0.57 0.57 0.57 0.51 0.52
## 4  0.62 0.60 0.61 1.00 0.56 0.57 0.57 0.57 0.51 0.53
## 5  0.57 0.56 0.56 0.56 1.00 0.61 0.61 0.60 0.55 0.56
## 6  0.57 0.56 0.57 0.57 0.61 1.00 0.61 0.60 0.56 0.56
## 7  0.56 0.55 0.57 0.57 0.61 0.61 1.00 0.61 0.55 0.56
## 8  0.57 0.55 0.57 0.57 0.60 0.60 0.61 1.00 0.55 0.57
## 9  0.54 0.51 0.51 0.51 0.55 0.56 0.55 0.55 1.00 0.58
## 10 0.52 0.51 0.52 0.53 0.56 0.56 0.56 0.57 0.58 1.00

To leave a comment for the author, please follow the link and comment on their blog: ouR data generation.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

simstudy 0.6.0 released: more flexible correlation patterns

Cross-sectional data

Exchangeable correlation

Decaying correlation

Closed cohort

Exchangeable

Decay

Generating block matrices and simulating data

Cross-sectional data with exchangeable correlation

Cross-sectional data with correlation decay

Cohort data with exchangeable correlation

Cohort data with correlation decay

Addendum

Related

Cross-sectional data

Exchangeable correlation

Decaying correlation

Closed cohort

Exchangeable

Decay

Generating block matrices and simulating data

Cross-sectional data with exchangeable correlation

Cross-sectional data with correlation decay

Cohort data with exchangeable correlation

Cohort data with correlation decay

Addendum

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)