What Makes a Sensitivity Analysis?

Posted on December 11, 2020 by Statistical Science & Related Matters on Less Likely in R bloggers | 0 Comments

[This article was first published on Statistical Science & Related Matters on Less Likely, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Note: This discussion does not cover bias analysis as employed in epidemiological studies. For extensive discussions on that topic, see the following papers.^1–7

Sensitivity analyses (SA) are common in trials and observational studies, but often little thought is given to what they should entail. None of this is surprising given that they are not usually taught in traditional settings, although historically, statistical concepts taught in traditional settings don’t have a great track record for proper application and interpretation. Regardless, they (SA) are an important component of many statistical analyses, and are therefore worth carefully understanding.

The first word of the phrase is clearly an obvious tell, it suggests an analysis that examines how sensitive or robust a result is. But a natural question is, sensitive specifically to what? Can I vary anything in the analysis to test the sensitivity of the result? For example, if I obtained an odds ratio of 1.7 from the primary analysis in my experiment and I conducted an extra analysis to see whether the odds ratio would change by resampling, is that a sensitivity analysis?

What if I did a completely different type of analysis, for example, if the primary analysis was a classical statistical test with an adjustment for multiple comparisons and I decided to run a hierarchical Bayesian regression with a concentrated prior on the null, would that be a sensitivity analysis? The possibilities of what can be varied are endless when a loose definition of sensitivity analysis is assumed. This is unique to every discipline and their culture.

Frequent Misconceptions

It is very common in clinical trials, in particular, to see primary analyses often being intent-to-treat (ITT) analyses and sensitivity analyses being per-protocol analyses (PP). The following is a similar example from a high-profile trial published in the The Lancet in 2002,⁸

The Multicentre Aneurysm Screening Study group randomised 67,800 men to receive an invitation to an abdominal ultrasound scan or not [6]. Of those invited to receive an abdominal scan, 20% did not accept. The primary analysis was by intention to treat, thus estimating the effect of being randomised to abdominal ultrasound. Another analysis investigated the complier average causal effect, which considers what the (average) effect of treatment was in patients who would have adhered to protocol however they were randomised [7].

To many, this may seem perfectly fine, even great. The authors used the question they were primarily interested in as the main analysis, and then conducted an additional analysis to see if these results are consistent. Unfortunately, this is highly problematic as Morris et al. (2014)⁹ describes

These questions are different, and observing different results should not shake our confidence in either. The CACE analysis was a secondary analysis, not a sensitivity analysis.

It is common for authors to compare the results of intention-to-treat with per-protocol analysis; see for example [8, 9]. While it is hard to pin down the precise question of per-protocol analysis [10], this is clearly different to the question intention-to-treat addresses. Per-protocol analysis should not therefore be considered as a sensitivity analysis for intention-to-treat but as a secondary analysis, if at all.

As mentioned above, there are many possible ways to conduct a sensitivity analysis, especially if the phrase is used in a loose/vague way. Luckily, many of us do not have to get into nonsensical philosophical arguments on Twitter about this because the International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH) has given this topic much thought and it’s reflected by the fact that they recently created an entire new addition to their E9 guidance document (Statistical Principles for Clinical Trials) which has served as a reference for clinical trial statisticians for decades. In the addendum, titled Addendum on Estimands and Sensitivity Analysis In Clinical Trials, which has now been legally adopted by regulatory agencies around the world including the FDA and EMA, they elaborate on the concept of estimands and the role sensitivity analyses play.

To be clear, estimands are not new and have been discussed in the statistical literature since Tukey¹⁰ but the ICH working group’s addendum formalized its adoption in the clinical trial world and the importance of sensitivity analyses.^11–13

Estimands & Sensitivity

A.5.2.1. Role of Sensitivity Analysis

Inferences based on a particular estimand should be robust to limitations in the data and deviations from the assumptions used in the statistical model for the main estimator. This robustness is evaluated through a sensitivity analysis. Sensitivity analysis should be planned for the main estimators of all estimands that will be important for regulatory decision making and labelling in the product information. This can be a topic for discussion and agreement between sponsor and regulator.

The statistical assumptions that underpin the main estimator should be documented. One or more analyses, focused on the same estimand, should then be pre-specified to investigate these assumptions with the objective of verifying whether or not the estimate derived from the main estimator is robust to departures from its assumptions. This might be characterised as the extent of departures from assumptions that change the interpretation of the results in terms of their statistical or clinical significance (e.g. tipping point analysis).

Distinct from sensitivity analysis, where investigations are conducted with the intent of exploring robustness of departures from assumptions, other analyses that are conducted in order to more fully investigate and understand the trial data can be termed “supplementary analysis” (see Glossary; A.5.3.). Where the primary estimand(s) of interest is agreed between sponsor and regulator, the main estimator is pre-specified unambiguously, and the sensitivity analysis verifies that the estimate derived is reliable for interpretation, supplementary analyses should generally be given lower priority in assessment.

A.5.2.2. Choice of Sensitivity Analysis

When planning and conducting a sensitivity analysis, altering multiple aspects of the main analysis simultaneously can make it challenging to identify which assumptions, if any, are responsible for any potential differences seen. It is therefore desirable to adopt a structured approach, specifying the changes in assumptions that underlie the alternative analyses, rather than simply comparing the results of different analyses based on different sets of assumptions. The need for analyses varying multiple assumptions simultaneously should then be considered on a case by case basis. A distinction between testable and untestable assumptions may be useful when assessing the interpretation and relevance of different analyses.

ICH E9 (R1) Guideline

The need for sensitivity analysis in respect of missing data is established and retains its importance in this framework. Missing data should be defined and considered in respect of a particular estimand (see A.4.). The distinction between data that are missing in respect of a specific estimand and data that are not directly relevant to a specific estimand gives rise to separate sets of assumptions to be examined in sensitivity analysis.

they explicitly define a sensitivity analysis as being an analysis which realistically varies the assumptions from the primary analysis, still targets the same estimand, examines the robustness of the results to assumption violations, and can possibly change the results/conclusions drawn. They contrast this with more extensive analyses that investigate these violations/departures of assumptions, and characterize them as being supplementary rather than sensitivity.

Similar views are echoed by the National Research Council’s advice on clinical trials (National Research Council 2010)¹⁴

Recommendation 15: Sensitivity analysis should be part of the primary reporting of findings from clinical trials. Examining sensitivity to the assumptions about the missing data mechanism should be a mandatory component of reporting.

Now, back to the NEJM paper. The reason why an ITT analysis and a PP analysis cannot serve as primary and sensitivity analyses, respectively, is because they are targeting entirely different estimands.⁹ Thus, because they are answering completely different questions, they’re just two different analyses. And the additional PP analysis, although important to certain stakeholders, can be classified as a supplementary analysis or a non-sensitivity analysis.

Indeed, the following flowchart from Morris et al. (2014)⁹ is particularly useful as a guide to differentiate sensitivity analyses from non-sensitivity analyses.

Adapted flowchart from Morris et al. (2014)

So what does a sensitivity analysis actually look like? Below, I walk through the analysis of a trial to give some examples.

An Example From a Trial

I’ll use a sample clinical trial dataset, which is described here:

“This dataset is typical of diastolic blood pressure data measured in small clinical trials in hypertension from the mid-to-late 1960s and for approximately a decade thereafter.

During this time, hypertension was more severe, the number of effective treatments was relatively small, and the definition (DBP > 95 mmHg) of essential hypertension was not as stringent as it is now (DBP > 80 mmHg) – as seen in the 1967 report from the Veterans Administration Cooperative Study Group on Antihypertensive Agents VA Study Group (1967).

In Table 3.1, diastolic blood pressure (DBP) was measured (mmHg) in the supine position at baseline (i.e., DBP) before randomization… Patients’ age and sex were recorded at baseline and represent potential covariates. The primary objective in this chapter in the analysis of this dataset is to test whether treatment A (new drug) may be effective in lowering DBP as compared to B (placebo) and to describe changes in DBP across the times at which it was measured.”

Although we will not be using some of the scripts and project setup that I discussed in my statistical workflow post for the sake of efficiency and ease, I would urge and recommend others do so that it remains easy to catch errors. In particular, I would especially recommend using the following packages:

here

futile.logger

tryCatchLog

# Load Data  -------------------------------------
RNGkind("L'Ecuyer-CMRG")
set.seed(1031)
R1 <- rnorm(500, 0, 1)
R2 <- rnorm(500, 1, 0.5)
BaselineDBP <- rnorm(500, 117, 2)
Group <- rbinom(500, 1, 0.5)
Age <- rnorm(500, 50, 7)
errors <- rmvnorm(500, mean = c(0.5, 0.9),
                  sigma = matrix(c(1, 0.3, 0.3, 1),
                                 nrow = 2, byrow = TRUE))
PostDBP <- BaselineDBP + (Age * 0.33) + 
           (Group * 9.7) + errors[, 1]

I have made some minor adjustments to this dataset and also generated a new variable (Z) based on the existing ones in the dataset that is strongly correlated with the outcome of interest (PostDBP).

# Constructing Covariates  -------------------------------------
n     <- length(BaselineDBP)   # length of vector
rho   <- 0.7                   # desired correlation
theta <- acos(rho)             # corresponding angle
x1    <- BaselineDBP           # fixed given data
x2    <- rnorm(n, 60, 10)      # new random data
X     <- cbind(x1, x2)         # matrix
Xctr  <- scale(X, center = TRUE,
               scale = FALSE)                 # centered columns (mean 0)
Id   <- diag(n)                               # identity matrix
Q    <- qr.Q(qr(Xctr[ , 1, drop = FALSE]))    # QR-decomposition
P    <- tcrossprod(Q)          # = Q Q'       # projection onto space by x1
x2o  <- (Id-P) %*% Xctr[ , 2]                 # x2ctr orthogonal to x1ctr
Xc2  <- cbind(Xctr[ , 1], x2o)                # bind to matrix
Y1    <- Xc2 %*% diag(1/sqrt(colSums(Xc2^2))) # scale columns to length 1
Z <- Y1[ , 2] + (1 / tan(theta)) * Y1[ , 1]   # final new vector
BP_full <- data.frame(PostDBP, BaselineDBP,
                      Group, Age, Z, R1, R2)

We examine the correlation matrix of the dataset and take a quick look at some of the values.

# Explore Data  -------------------------------------
summary(BP_full)
#>     PostDBP       BaselineDBP        Group            Age              Z                   R1                 R2         
#>  Min.   :126.0   Min.   :110.6   Min.   :0.000   Min.   :26.96   Min.   :-0.209414   Min.   :-3.56828   Min.   :-0.6788  
#>  1st Qu.:133.6   1st Qu.:115.6   1st Qu.:0.000   1st Qu.:45.03   1st Qu.:-0.044121   1st Qu.:-0.71786   1st Qu.: 0.6720  
#>  Median :139.1   Median :116.9   Median :1.000   Median :49.44   Median :-0.005169   Median :-0.06014   Median : 0.9743  
#>  Mean   :138.6   Mean   :117.0   Mean   :0.508   Mean   :49.47   Mean   : 0.000000   Mean   :-0.07153   Mean   : 0.9799  
#>  3rd Qu.:143.3   3rd Qu.:118.3   3rd Qu.:1.000   3rd Qu.:53.96   3rd Qu.: 0.044123   3rd Qu.: 0.60845   3rd Qu.: 1.3100  
#>  Max.   :152.0   Max.   :123.8   Max.   :1.000   Max.   :70.73   Max.   : 0.247114   Max.   : 3.03686   Max.   : 2.3561
ztable(head(BP_full, 10))

PostDBP	BaselineDBP	Group	Age	Z	R1	R2
133.86	120.79	0	42.31	0.05	1.23	1.85
129.82	115.61	0	40.90	-0.07	-0.72	0.80
142.72	116.77	1	48.55	-0.08	-0.80	-0.23
136.99	118.09	0	53.17	0.02	0.87	1.22
134.46	116.71	0	52.33	0.00	0.88	1.21
142.12	117.87	1	46.82	0.00	0.64	1.01
131.46	118.64	0	40.77	0.03	0.04	0.99
141.82	116.93	1	49.44	-0.04	-0.48	1.07
139.74	119.06	0	60.44	0.10	0.61	0.55
142.61	117.88	0	70.73	-0.01	0.93	0.99

Now, we can quickly explore the other characteristics of the dataset.

# Inspect Data  -------------------------------------

study_formula <- as.formula(PostDBP ~ .)

plot(spearman2(study_formula, BP_full), pch = 19,
     cex = 1,  col = alpha("#d46c5b", 1.95))
abline(v = 0, col = alpha("#d46c5b", 0.75), lty = 3)

# Initial Data Analysis  -------------------------------------

x_1 <- model.matrix(~ ., data = BP_full)[, -1]

x_1 <- x_1[, -c(1, 3)]

cormatrix <- cor(x_1)

corrplot(cormatrix, method = "color",
         col = alpha(cor_col, 0.40),  
         type = "upper", order = "hclust",
         addCoef.col = "black",
         tl.col = "black", tl.srt = 45,
         diag = TRUE)

# Initial Data Analysis  -------------------------------------
plot(BP_full[, c(3, 4, 5, 7)], pch = 20, 
     cex = 1.5, col = alpha("#d46c5b", 0.20))

We will not be looking at the estimates from the fitted model yet. We are mainly looking at a few characteristics of this dataset before we move on to the next step.

fit_1 <- rlm(PostDBP ~ Group + BaselineDBP +
             Age + Z + R1 + R2, data = BP_full)
ztable(performance(fit_1))

AIC	BIC	RMSE	Sigma
1420.38	1454.1	0.99	0.99

par(mfrow = c(2, 2))
plot(fit_1, pch = 20, cex = 1.5, 
     col = alpha("#d46c5b", 0.2))

Now that we have familiarized ourselves with this dataset, we notice that there are missing data (which I actually generated below) and suppose we didn’t actually know how the missing data were generated, we might suspect or start with the assumption that the data are missing at random (MAR). Of course, here, we do have an idea of what the missing data mechanism is, but we shall assume the role of a data analyst who has just been given a dataset with missing values in the outcome from their clinical colleague.

# Generate Missing Data  -------------------------------------
set.seed(1031)
Ry <- ifelse(0.33 + 2 * R1 - 4 * Group + 
               R2 + errors[, 1] > 0, 1, 0)
PostDBP[Ry == 0] <- NA
BP_miss <- data.frame(PostDBP, BaselineDBP,
                      Group, Age, Z, R1, R2)
str(BP_miss)
#> 'data.frame':    500 obs. of  7 variables:
#>  $ PostDBP    : num  134 130 NA 137 134 ...
#>  $ BaselineDBP: num  121 116 117 118 117 ...
#>  $ Group      : int  0 0 1 0 0 1 0 1 0 0 ...
#>  $ Age        : num  42.3 40.9 48.6 53.2 52.3 ...
#>  $ Z          : num  0.0462 -0.0666 -0.0842 0.017 0.0036 ...
#>  $ R1         : num  1.234 -0.719 -0.802 0.874 0.881 ...
#>  $ R2         : num  1.846 0.801 -0.228 1.221 1.212 ...
sum(is.na(BP_miss))
#> [1] 291
missing_col <- c("#3f8f9b", "#d46c5b")
aggr(BP_miss, col = alpha(missing_col, 0.7),
     plot = TRUE, prop = F, numbers = T)

So now, we must examine this missing dataset and its characteristics and look for any systematic differences. We also examine the proportion of missingness and the influx and outflux patterns.

# Examine Flux Patterns  -------------------------------------

round(fico(BP_miss), 1)[2:6]
#> BaselineDBP       Group         Age           Z          R1 
#>         0.6         0.6         0.6         0.6         0.6
ztable(flux_temp)

	POBS	Influx	Outflux	AINB	AOUT	FICO
PostDBP	0.42	0.54	0	1	0.0	0.00
BaselineDBP	1.00	0.00	1	0	0.1	0.58
Group	1.00	0.00	1	0	0.1	0.58
Age	1.00	0.00	1	0	0.1	0.58
Z	1.00	0.00	1	0	0.1	0.58
R1	1.00	0.00	1	0	0.1	0.58
R2	1.00	0.00	1	0	0.1	0.58

fluxplot(BP_miss, labels = TRUE, pch = 19,
         col = "#d46c5b", cex = .85,
         main = "Influx-Outflux Pattern")

Exploratory Analyses

We start off doing some preliminary analyses to eyeball differences between the observed data and the missing amputed data.

#> [1] 291

	pobs	influx	outflux	ainb	aout	fico
PostDBP	0.42	0.54	0	1	0.0	0.00
BaselineDBP	1.00	0.00	1	0	0.1	0.58
Group	1.00	0.00	1	0	0.1	0.58
Age	1.00	0.00	1	0	0.1	0.58
Z	1.00	0.00	1	0	0.1	0.58
R1	1.00	0.00	1	0	0.1	0.58
R2	1.00	0.00	1	0	0.1	0.58

We fit a model quickly to these missing data using the rlm() function from the MASS package and glance at what our estimates look like.

# Preliminary Model Estimates  -------------------------------------

eda_1 <- rlm(PostDBP ~ Group + BaselineDBP +
             Age + Z + R1 + R2, data = BP_miss)

ztable(summary(eda_1)[["coefficients"]])

	Value	Std. Error	t value
(Intercept)	1.41	5.88	0.24
Group	10.47	0.22	47.46
BaselineDBP	0.99	0.05	19.77
Age	0.33	0.01	37.95
Z	-2.79	1.50	-1.85
R1	-0.32	0.09	-3.73
R2	-0.18	0.14	-1.26

ztable(performance(eda_1))

AIC	BIC	RMSE	Sigma
575.85	602.59	0.92	0.94

Because the MASS::rlm() function will automatically delete missing observations and do a complete-case analysis, we can gauge what our estimates would look like if we assume that the data are missing completely at random (MCAR), which is almost always unrealistic.

It is almost always more realistic to assume that the data are missing at random (MAR), \(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)=\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, \psi\right)\). This can often be a reasonable assumption to start off with for the primary analysis.¹⁵

The Primary Analysis

For our main analysis, we will build an imputation model to impute our missing data. We include all possible information in our imputation model, meaning all the variables in our dataset, and impute the missing data using the predictive mean matching method, and then analyze each dataset and pool the results. Our workflow will look like this

Adapted flowchart from van Buuren (2018)15

We do this with a fully-adjusted model, with all relevant covariates and a reduced model with just the treatment group. This is because we want both the adjusted and unadjusted estimates. We will then compare these models using the likelihood-ratio test¹⁶ to see whether there are any substantial differences between the fully-adjusted and reduced model.

However, before we are able to move onto this step, we must determine how many imputations we will construct. To many, this may seem like a trivial and even nonsensical task when any individual can impute a dataset with the click of a button and enter in any number of datasets they wish to impute.

The issue with this is that without giving careful thought to how many imputations may be needed for a particular study, one may run the risk of imputing too few datasets and losing information, or they may waste resources imputing a nonsensical amount that may not be necessary. Indeed, the latter seems especially trivial, however, it may often yield no advantages whatsoever, and only incur costs as many authors have argued.

The argument is that “the additional resources that would be required to create and store more than a few imputations would not be well spent” (Schafer 1997, 107), and “in most situations there is simply little advantage to producing and analyzing more than a few imputed datasets” (Schafer and Olsen 1998, 549).

Therefore, we will examine certain characteristics of our dataset with missing values and use them in an analysis to determine how many datasets we should impute. We will base this off a number of characteristics such as the fraction of missing information, the proportion of missing observations, and the losses that we are willing to incur.

We first start by running a “dry” imputation of our data to quickly examine some of these characteristics. This imputed dataset will not be used for our primary, analysis, it is simply serving to guide us in our main imputation model. This is also why we have set the number of iterations to 0.

# Dry Imputation -------------------------------------
form <- list(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)
pred <- make.predictorMatrix(BP_miss)

init_imp <- mice(BP_miss, method = "midastouch", m = 20, 
                 maxit = 0, cores = 4, donors = 5,
                 seed = 1031, formulas = form, ridge = 0, 
                 predictorMatrix = pred, remove.collinear = FALSE, 
                 remove.constant = FALSE, allow.na = FALSE,
                 printFlag = FALSE, visitSequence =  "monotone")

head(init_imp$loggedEvents, 10) 
#> NULL

Now that we have imputed our dataset, we may examine it for any issues before moving onto the next step of analyzing it.

str(complete(init_imp, 1))
#> 'data.frame':    500 obs. of  7 variables:
#>  $ PostDBP    : num  134 130 132 137 134 ...
#>  $ BaselineDBP: num  121 116 117 118 117 ...
#>  $ Group      : int  0 0 1 0 0 1 0 1 0 0 ...
#>  $ Age        : num  42.3 40.9 48.6 53.2 52.3 ...
#>  $ Z          : num  0.0462 -0.0666 -0.0842 0.017 0.0036 ...
#>  $ R1         : num  1.234 -0.719 -0.802 0.874 0.881 ...
#>  $ R2         : num  1.846 0.801 -0.228 1.221 1.212 ...
densityplot(init_imp, plot.points = TRUE,
            theme = aesthetics, col = col)

bwplot(init_imp, PostDBP, col = col)

Although it is only a dry run, there seems to be few issues with it.

init_res <- pool(with(init_imp,
                      rlm(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)))

init_res_sum <- summary(init_res)

colnames(init_res_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")

ztable(init_res_sum)

Term	Estimate	SE	Statistic	df	P-val
(Intercept)	74.04	18.87	3.92	101.82	0.00
Group	1.49	0.45	3.34	103.82	0.00
BaselineDBP	0.44	0.16	2.75	105.33	0.01
Age	0.17	0.03	5.43	122.20	0.00
Z	-2.76	4.80	-0.57	132.88	0.57
R1	0.25	0.21	1.23	184.33	0.22
R2	0.31	0.46	0.67	95.24	0.50

Now that we have our vector, we can quickly examine certain characteristics such as the fraction of missing information (FIMO), the between-imputation variance and the within-imputation variance. The reason that we focus on these particular numbers is because they are crucial for several tasks in any statistical analysis plan, such as having enough power/precision, proper long-coverage, etc.

We use the FIMO from the dry imputation and a number of other criteria to determine how many imputations we need:

Some have suggested taking the FIMO and multiplying by 100 to obtain the number of imputed datasets needed

\[m \approx 100 \lambda\]

we wish to ensure that the monte-carlo standard error from the imputations are less than 10% of the between-imputation standard error

\[mcse < B^{\hat{-}}_{SE} * 0.10\]

we wish to minimize the monte-carlo error that is derived from dividing the chosen number of datasets to impute from the fraction of missing information so that the monte-carlo standard error is less than 0.01

\[FMI / m ≈ 0.01\]

we also wish to minimize the total variance so that the square-root of it is no greater than the ideal variance and the corresponding confidence interval width.

\[T_{m}=\left(1+\frac{\gamma_{0}}{m}\right) T_{\infty}\]

Furthermore, it is commonly advocated by missing data experts to impute the same or similar number of datasets as fractions of missing observations \(\gamma_{0} * 100\)

We start by extracting the FMI’s from our dry imputation and calculating all these numbers.

init_res$pooled$fmi
#> [1] 0.3680287 0.3633896 0.3599639 0.3256135 0.3069595 0.2380724 0.3841991
fmi <- max(summary(init_res, type = c("all"), conf.int = TRUE,
            conf.level = 0.95)["fmi"])  # minimum number of imputation

m <- fmi / 0.01  # fmi imputations

mcse <- sqrt(max(init_res$pooled$b) / m) # mcse
se <- max((init_res$pooled$ubar) / sqrt(500)) # se

What we find is that the fraction of missing information is nearly 50% which is very high and that in order to get a monte-carlo error of less than 0.01, we would need a minimum of at least 50 imputed datasets. We also calculate both the monte-carlo standard error and the standard error from the dry run and find that they are practically equivalent.

However, we must also explore other possibilities in terms of number of imputed datasets to reduce the monte-carlo errors so we run a quick function to do, checking the effects of a number of imputed datasets on the width of the confidence interval, which is directly tied to the monte-carlo error.

Indeed, Von Hippel (2018) proposed a relationship between the fraction of missing information and the number of imputations needed to achieve a targeted CI length or monte-carlo error using a quadratic formula,

\[m=1+\frac{1}{2}\left(\frac{\gamma_{0}}{\operatorname{SD}\left(\sqrt{U_{\ell}}\right) \mathrm{E}\left(\sqrt{U_{\ell}}\right)}\right)^{2}\]

where \(\mathrm{E}\left(\sqrt{U_{\ell}}\right)\) and \(\mathrm{SD}\left(\sqrt{U_{\ell}}\right)\) are the coefficient of variation (CV), summarizing the imputation variation in the SE estimates. We can construct a function to depict this relationship and calculate how many imputations we will need based on some desired error and further display the distribution of this relationship by varying these parameters.

In the function below, adopted from von Hippel, and van Buuren,¹⁵ I vary both the coefficient of variation and the \(\alpha\)-level.

#' @title How Many Imputations?
#' @description Implements two-stage "how_many_imputations" from von Hippel (2018)
#' @param model Either a `mira` object (created by running a model on a data set which was imputed via "mice")
#' or a `mipo` object (creating by runing `pool()` on a `mira` object).
#' @param cv Desired precision of standard errors. Default to .05. (I.e., if the data were re-imputed, the
#' estimated standard errors would differ by no more than this amount.)
#' @param alpha Significance level for choice of "conservative" FMI.
#' @return The number of required imputations to obtain the `cv` level of precision.
#' @export
#' @importFrom methods is
#' @importFrom stats plogis qlogis qnorm
#' @importFrom mice pool
#' @references von Hippel, Paul T. (2018)
#' \sQuote{How Many Imputations Do You Need? A Two-stage Calculation Using a Quadratic Rule.},
#' \emph{Sociological Methods & Research} p.0049124117747303.
powerimp <- function(model, cv = .05, alpha = .05) {
  if (is(model, 'mira')) {
    model <- mice::pool(model)
  }
  if (!is(model, "mipo")) {
    stop("Model must be multiply imputed.")
  }
  fmi <- max(model$pooled$fmi)
  z <- qnorm(1 - alpha/2)
  fmiu <- plogis(qlogis(fmi) + z*sqrt(2/model$m))
  ceiling(1 + 1/2*(fmiu/cv)^2)
}
ci_length <- sqrt(1 + (fmi / m)) * 1 # length of CI
  ci_width <- function() {
  cv_i <- seq(0.01, 0.20, 0.01)
  results1 <- lapply((seq_along(cv_i)/100),
                       function(i) powerimp(model = init_res,
                                            cv = i, alpha = 0.01))
  results2 <- lapply((seq_along(cv_i)/100),
                       function(i) powerimp(model = init_res,
                                            cv = i, alpha = 0.05))
  results3 <- lapply((seq_along(cv_i)/100),
                       function(i) powerimp(model = init_res,
                                            cv = i, alpha = 0.1))
  a_0.01 <- do.call(rbind, results1)
  df <- data.frame(cv_i, a_0.01)
  df$a_0.05 <- do.call(rbind, results2)
  df$a_0.1 <- do.call(rbind, results3)
  return(df)
}
some <- ci_width()

With this function, we can now graphically examine the effects of different numbers of imputations on monte-carlo errors

We can see above that as we aim for smaller monte-carlo errors, we many more imputations to achieve our desired target.

Now that I have constructed and graphed three differing relationships between monte carlo errors and number of imputations needed to achieve those errors, and varied the \(\alpha\)-levels, we can now move onto choosing a specific number of imputations to construct.

If I choose an \(\alpha\) level of 5%, which the maximum tolerable type-I error rate, and a coefficient of variation of 2.5%

powerimp(model = init_res, cv = 0.05, alpha = 0.1)
#> [1] 54

I need to construct approximately 50 imputations to achieve this coefficient of variation, which is what I will use for all the analyses from here on.

For the primary analysis, I will use the predictive-mean matching method and specify 5 donors (based on a number of simulation studies that have examined the effect of various donors specified)¹⁷, and fit a robust regression that includes all the variables within the dataset. Our imputation model will be

\[Y_{Post} = \beta_{0} + \beta_{1}X_{Grp} + \beta_{2}X_{Base} + \beta_{3}X_{Age} + \beta_{4}X_{Z} + \beta_{5}X_{R1} + \beta_{6}X_{R2} + \epsilon\]

# Impute Missing Data  -------------------------------------
form <- list(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2)
pred <- make.predictorMatrix(BP_miss)

imp1 <- mice(BP_miss, method = "midastouch", predictorMatrix = pred,
             m = 50, maxit = 5, seed = 1031, donors = 5,
             cores = 4, formulas = form, ridge = 0, 
             remove.collinear = FALSE, remove.constant = FALSE, 
             allow.na = FALSE, printFlag = FALSE, 
             visitSequence = "monotone")

head(imp1$loggedEvents, 10) 
#> NULL

Now that we have imputed our dataset, we may examine it for any issues before moving onto the next step of analyzing it.

str(complete(imp1, 10))
#> 'data.frame':    500 obs. of  7 variables:
#>  $ PostDBP    : num  134 130 146 137 134 ...
#>  $ BaselineDBP: num  121 116 117 118 117 ...
#>  $ Group      : int  0 0 1 0 0 1 0 1 0 0 ...
#>  $ Age        : num  42.3 40.9 48.6 53.2 52.3 ...
#>  $ Z          : num  0.0462 -0.0666 -0.0842 0.017 0.0036 ...
#>  $ R1         : num  1.234 -0.719 -0.802 0.874 0.881 ...
#>  $ R2         : num  1.846 0.801 -0.228 1.221 1.212 ...
plot(imp1, layout = c(1, 2), col = col, theme = aesthetics)

densityplot(imp1, plot.points = TRUE, col = col, theme = aesthetics)

bwplot(imp1, PostDBP, col = col)

stripplot(imp1, pch = c(21, 20), cex = c(1, 1.5),  col = col)[-3]

Our imputed datasets seem to look fine so far. We now fit our models to each of these datasets and combine them using Rubin’s rules.

# Primary Analysis  -------------------------------------

analysis1 <- with(imp1, rlm(PostDBP ~ Group +
                          BaselineDBP + Age + Z + R1 + R2))
analysis2 <- with(imp1, rlm(PostDBP ~ Group))

anova(analysis1, analysis2, method = "D3", use = "LR")
#>    test statistic df1      df2 dfcom p.value      riv
#>  1 ~~ 2         0   5 247.7893   493       1 171.8849
result1 <- pool(analysis1)

results1_sum <- summary(result1)

colnames(results1_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")

ztable(results1_sum)

Term	Estimate	SE	Statistic	df	P-val
(Intercept)	6.05	6.53	0.93	60.99	0.36
Group	10.57	0.26	41.35	29.53	0.00
BaselineDBP	0.96	0.05	17.53	62.94	0.00
Age	0.33	0.01	30.74	65.40	0.00
Z	-2.72	1.50	-1.81	83.50	0.07
R1	-0.36	0.09	-4.06	48.55	0.00
R2	-0.19	0.13	-1.44	89.21	0.15

While we have our primary analysis estimates, we continue to examine our imputations.

ps <- rep(rowMeans(sapply(analysis1$analyses, fitted.values)),
          imp1$m + 1)
xyplot(imp1, PostDBP ~ ps | as.factor(.imp),
       xlab = "Probability that record is incomplete",
       ylab = "PostDBP", pch = c(1, 19), col = col)[1:12]

We’ve done our primary analysis, in which we imputed the missing data under the assumption of MAR. Unfortunately, we cannot verify whether or not this assumption is true, but we can vary this missing data assumption and assume that the data are missing not at random (MNAR) \(\operatorname{Pr}\left(R=0 \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, \psi\right)\), and that in the individuals with missing data, we can assume different numbers than those with complete data. Our likelihood-ratio test also suggests that there’s very little difference between the full model and the reduced model, so we generally will go with the full model.

A \(\delta\)-Adjusted Sensitivity Analysis

Now suppose we wish to handle these missing under the assumption of MNAR, we would now be conducting a sensitivity analysis because we are still targeting the same estimand, but only varying the assumptions. The question / target remain the same. To handle missing data that we assume are MNAR, there are a number of different and complex approaches.

We start off with the \(\delta\)-adjustment, controlled multiple imputation method,¹⁸ in which we assume that the group with missing observations differs systematically from the group with complete observations by a certain quantity. We produce a range of these quantities and add or subtract them from the imputed values and then conduct our analysis on this adjusted, imputed dataset. Here I will make the assumption that the average PostDBP estimate from the missing value group differs from at least 0 mmHg to at most 30 mmHg from the group with no missing observations.

# Controlled Multiple Imputation  -------------------------------------
ini <- mice(BP_miss, method = "midastouch", maxit = 0,
            predictorMatrix = pred)

ini$method["BaselineDBP"] <- ""
ini$method["PostDBP"] <- "midastouch"
ini$method["Age"] <- ""
ini$method["R1"] <- ""
ini$method["Group"] <- ""
ini$method["R2"] <- ""
ini$method["Z"] <- ""
meth <- ini$method

delta <- c(-30, -25, -20, -15, -10,
           0, 5, 10, 15, 20, 25, 30)

imp.all <- vector("list", length(delta))
post <- ini$post
for (i in 1:length(delta)) {
  d <- delta[i]
  cmd <- paste("imp[[j]][, i] <- imp[[j]][, i] +", d)
  post["PostDBP"] <- cmd
  imp <- mice(BP_miss, method = meth,
              predictorMatrix = pred,
              m = 50, maxit = 5, donors = 5,
              cores = 4, ridge = 0, formulas = form,
              remove.collinear = FALSE,
              remove.constant = FALSE,
              allow.na = FALSE, post = post,
              seed = i * 1031, printFlag = FALSE)
  imp.all[[i]] <- imp
}

Now that we have our imputed datasets that are modified by \(\delta\)-adjustments, we can fit our models to each imputed dataset and combine the estimates using Rubin’s rules.

# Sensitivity Analysis 1  -------------------------------------

output <- sapply(imp.all,
                 function(x) pool(with(x, rlm(PostDBP ~ Group +
                                              BaselineDBP + Age + Z + R1 + R2))),
                 simplify = "array", USE.NAMES = TRUE)
a <- (as.data.frame(t(output)))$pooled
r <- array(rep(NA, length(delta) * 7),
           dim = c(length(delta), 7), dimnames = list(c(delta),
                                               c("Intercept", "Group",
                                                 "BaselineDBP", "Age",
                                                 "Z", "R1", "R2")))
for (i in 1:length(delta)) {
   r[i, ] <- cbind(a[[i]][["estimate"]])
}
r <- as.data.frame(r)
r <- data.frame(Delta = row.names(r), r, row.names = NULL)
ztable(r)

Delta	Intercept	Group	BaselineDBP	Age	Z	R1	R2
-30	35.20	-12.48	0.65	0.34	-5.17	4.17	1.87
-25	30.89	-8.37	0.69	0.34	-5.09	3.61	1.62
-20	26.92	-4.25	0.73	0.34	-4.69	3.07	1.39
-15	24.73	-0.16	0.76	0.33	-4.24	2.49	1.17
-10	17.88	3.53	0.83	0.33	-3.85	1.60	0.72
0	5.63	10.57	0.96	0.33	-2.76	-0.38	-0.19
5	0.65	13.78	1.02	0.32	-2.81	-1.43	-0.67
10	-4.14	17.29	1.08	0.31	-2.70	-2.38	-1.04
15	-7.42	21.07	1.12	0.31	-1.97	-3.25	-1.38
20	-10.83	24.78	1.16	0.30	-1.58	-4.03	-1.66
25	-10.30	28.51	1.17	0.29	-1.06	-4.82	-1.97
30	-15.18	32.73	1.22	0.29	-1.00	-5.37	-2.15

The table above gives the various \(\delta\)’s (the first column with values ranging from -30 - 30 in increments of 5) that were added to the multiply imputed datasets, and the corresponding estimates from the models fit to those datasets. So now we’ve conducted our first sensitivity analysis using the \(\delta\)-adjustment, controlled multiple imputation method. This form of sensitivity analysis is also typically what is used and preferred by regulators.¹⁸.

Regulators prefer simple methods that impute the missing outcomes under MAR, and then add an adjustment δ to the imputes, while varying δ over a plausible range and independently for each treatment group (Permutt 2016). The most interesting scenarios will be those where the difference between the δ’s correspond to the size of the treatment effect in the completers. Contours of the p-values may be plotted on a graph as a function of the δ’s to assist in a tipping-point analysis (Liublinska and Rubin 2014).

And as expected, lower \(\delta\) values gave us results that were consistent with the primary analysis, while the upper end of the extreme \(\delta\) values gave us estimates that were substantially off the mark.

Next, we quickly examine the state of a random sample of these imputations to make sure they’re reliable.

# Examine Imputations  -------------------------------------

l <- sample(1:length(delta))[1]
densityplot(imp.all[[l]], plot.points = TRUE,
            theme = aesthetics, col = col)

bwplot(imp.all[[l]], PostDBP,  col = col)

Now we move onto our next and final sensitivity analysis.

A Selection Sensitivity Analysis

To further explore the MNAR assumption, we could also use a Heckman selection model \(P(Y,R)=P(Y)P(R|Y)\), as described by van Buuren here:¹⁵

The selection model multiplies the marginal distribution \(P(Y)\) in the population with the response weights \(P(R|Y)\). Both \(P(Y)\) and \(P(R|Y)\) are unknown, and must be specified by the user. The model where \(P(Y)\) is normal and where \(P(R|Y)\) is a probit model is known as the Heckman model. This model is widely used in economics to correct for selection bias.

We can implement this in R using the miceMNAR package, which is an extension to the mice R package.

# Selection Models  -------------------------------------

JointModelEq <- generate_JointModelEq(data = BP_miss,
                                      varMNAR = "PostDBP")
JointModelEq[,"PostDBP_var_sel"] <- c(0, 1, 1, 1, 1, 1, 1)
JointModelEq[,"PostDBP_var_out"] <- c(0, 1, 1, 1, 0, 1, 1)

arg <- MNARargument(data = BP_miss, varMNAR = "PostDBP",
                    JointModelEq = JointModelEq)

imputation2 <- mice(data = arg$data_mod,
                    method = arg$method, seed = 1031,
                    predictorMatrix = arg$predictorMatrix,
                    JointModelEq = arg$JointModelEq,
                    control = arg$control, formulas = form,
                    maxit = 5, m = 50, cores = 4,
                    ridge = 0, remove.collinear = FALSE,
                    remove.constant = FALSE, allow.na = FALSE,
                    printFlag = FALSE, visitSequence =  "monotone")

head(imputation2$loggedEvents, 10)
#> NULL
imputation2$data$Group <- as.factor(imputation2$data$Group)

densityplot(imputation2, col = col,
            plot.points = TRUE,
            theme = aesthetics)

plot(imputation2, layout = c(1, 2),  col = col, theme = aesthetics)

bwplot(imputation2, PostDBP, col = col)

Now we fit our models to these new datasets and pool them using Rubin’s rules.

# Sensitivity Analysis 2  -------------------------------------

analysis3 <- with(imputation2, rlm(PostDBP ~ Group +
                            BaselineDBP + Age + Z + R1 + R2))
result3 <- pool(analysis3)

results3_sum <- summary(result3)
colnames(results3_sum) <- c("Term", "Estimate", "SE",
                            "Statistic", "df", "P-val")

ztable(results3_sum)

Term	Estimate	SE	Statistic	df	P-val
(Intercept)	8.83	5.62	1.57	126.57	0.12
Group1	9.69	0.20	49.24	51.55	0.00
BaselineDBP	0.93	0.05	19.44	127.79	0.00
Age	0.33	0.01	35.50	130.31	0.00
Z	-1.01	1.32	-0.77	215.40	0.44
R1	-0.05	0.09	-0.62	64.35	0.54
R2	-0.03	0.15	-0.23	101.44	0.82

Now we examine our imputed values.

ps2 <- rep(rowMeans(sapply(analysis3$analyses, fitted.values)),
          imputation2$m + 1)
xyplot(imputation2, PostDBP ~ ps2 | as.factor(.imp),
       xlab = "Probability that record is incomplete",
       ylab = "PostDBP", pch = c(1, 19), col = col)[1:12]

so now we have conducted our second sensitivity analysis.

The results from both the primary analysis, the first sensitivity analysis, and the second seem somewhat consistent. However, with the first sensitivity analysis where we applied \(\delta\)-adjustments, as the adjustments became larger, the coefficients changed drastically. However, such large \(\delta\)-adjustments are not realistic and those that were smaller led to coefficients that were closer to the other two analyses.

Supplementary Analyses

The following analyses are what I mostly consider to be supplementary analyses that may further investigate some violations of assumptions but often go far beyond what would be necessary for a principled sensitivity analysis that would typically accompany a primary analysis.

First, I like to check that the results from my analyses are consistent across different statistical packages. So I may check to see that the results from my primary analysis from R are also similar to the results from another statistical package like Stata.

So in Stata, I would likely run something similar to the following in order to mimic the primary analysis in R.

import delimited "https://lesslikely.com/datasets/temp.csv", numericcols(2 3 5 6 7 8)
#> . import delimited "https://lesslikely.com/datasets/temp.csv", numericcols(2 >  6 7 8)
#> (encoding automatically selected: ISO-8859-2)
#> (8 vars, 500 obs)
#> 
#> . 
#> . mi set mlong
#> 
#> . 
#> . mi register imputed baselinedbp postdbp age z group r1 r2
#> (291 m=0 obs now marked as incomplete)
#> 
#> . 
#> . mi impute chained (pmm, knn(5)) postdbp = baselinedbp age z r1 r2 group, burni
#> > n(10) add(40) rseed (1031) savetrace(trace1, replace) nomonotone
#> note: missing-value pattern is monotone.
#> 
#> Conditional models:
#>            postdbp: pmm postdbp baselinedbp age z r1 r2 group , knn(5)
#> 
#> Performing chained iterations ...
#> 
#> Multivariate imputation                     Imputations =       40
#> Chained equations                                 added =       40
#> Imputed: m=1 through m=40                       updated =        0
#> 
#> Initialization: monotone                     Iterations =      400
#>                                                 burn-in =       10
#> 
#>            postdbp: predictive mean matching
#> 
#> ------------------------------------------------------------------
#>                    |               Observations per m             
#>                    |----------------------------------------------
#>           Variable |   Complete   Incomplete   Imputed |     Total
#> -------------------+-----------------------------------+----------
#>            postdbp |        209          291       291 |       500
#> ------------------------------------------------------------------
#> (Complete + Incomplete = Total; Imputed is the minimum across m
#>  of the number of filled-in observations.)
#> 
#> . 
#> . mi estimate: rreg postdbp i.group baselinedbp age z r1 r2
#> 
#> Multiple-imputation estimates                   Imputations       =         40
#> Robust regression                               Number of obs     =        500
#>                                                 Average RVI       =     1.7108
#>                                                 Largest FMI       =     0.6989
#>                                                 Complete DF       =        493
#> DF adjustment:   Small sample                   DF:     min       =      53.41
#>                                                         avg       =      79.97
#>                                                         max       =     100.58
#> Model F test:       Equal FMI                   F(   6,  275.2)   =    1159.37
#>                                                 Prob > F          =     0.0000
#> 
#> ------------------------------------------------------------------------------
#>      postdbp | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
#> -------------+----------------------------------------------------------------
#>      1.group |   10.43052   .1577375    66.13   0.000     10.11485    10.74618
#>  baselinedbp |   .9633652   .0469246    20.53   0.000     .8701846    1.056546
#>          age |    .323474   .0090605    35.70   0.000     .3054994    .3414486
#>            z |   -2.63417   1.541137    -1.71   0.091    -5.700706    .4323655
#>           r1 |  -.3419754   .0827265    -4.13   0.000     -.507874   -.1760768
#>           r2 |  -.1667765    .140954    -1.18   0.240    -.4472669    .1137139
#>        _cons |   5.392655   5.528357     0.98   0.332    -5.585819    16.37113
#> ------------------------------------------------------------------------------

As you can see from above, I imported the missing data I generated in R into Stata, and then used it to multiply impute the datasets using the predictive mean matching method with 5 donors. I chose the exact same number of variables to include in the imputation model as I did with the primary and sensitivity analyses. Just like the primary analysis, I also used robust regression, rreg in Stata, to analyze the datasets. The numbers seem to be fairly consistent with those from the primary analysis and the some of the results from the sensitivity analyses.

I will now conduct another supplementary analysis, in which I fit a Bayesian regression model using a t-distribution so that it is analogous to the other robust regression models I have utilized so far, and I will specify a weakly informative prior.

# Bayesian Regression  -------------------------------------

brm_form <- bf(PostDBP ~ Group + BaselineDBP + 
               Age + Z + R1 + R2)

preds <-  c("Intercept", "Group", "BaselineDBP", 
            "Age", "Z", "R1", "R2")

prior1 <- c(prior(normal(0, 100), class = Intercept),
            prior(normal(0, 100), class = b),
            prior(normal(0, 100), class = sigma))

b_1 <- brm_multiple(brm_form, data = imp1, prior = prior1,
                    silent = 2, family = student(), iter = 4000, refresh = 0,
                    warmup = 1000, chains = 1, inits = "random",
                    cores = 4, thin = 2, open_progress = FALSE,
                    combine = TRUE, sample_prior = "no", 
                    seed = 1031, algorithm = "sampling", backend = "rstan",
                    control = list(max_treedepth = 10, adapt_delta = 0.8),
                    file = NULL, file_refit = "never")

b_1[["fit"]]@sim[["fnames_oi"]] <- preds 
b_1
#>  Family: student 
#>   Links: mu = identity; sigma = identity; nu = identity 
#> Formula: PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2 
#>    Data: imp1 (Number of observations: 500) 
#> Samples: 50 chains, each with iter = 4000; warmup = 1000; thin = 2;
#>          total post-warmup samples = 75000
#> 
#> Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
#> and Tail_ESS are effective sample size measures, and Rhat is the potential
#> scale reduction factor on split chains (at convergence, Rhat = 1).
ztable(posterior_summary(b_1))

	Estimate	Est.Error	Q2.5	Q97.5
Intercept	5.40	6.15	-5.35	19.04
Group	10.58	0.25	10.02	11.05
BaselineDBP	0.96	0.05	0.85	1.05
Age	0.33	0.01	0.31	0.35
Z	-2.66	1.47	-5.47	0.34
R1	-0.36	0.09	-0.51	-0.15
R2	-0.20	0.13	-0.47	0.06

We examine our model diagnostics to look for any anomalies.

Nothing seems out of the ordinary.

Indeed, it is no surprise that the Bayesian model with a weakly dispersed prior would give similar results to the base MASS::rlm() function, while specifying a regularizing prior like the horseshoe prior with three degrees of freedom, would practically shrink all the coefficients towards zero, including our covariate Z. We now look at robust methods and resampling and the results they give.

Although this is currently under the supplementary analysis section, I would very much consider this to be a sensitivity analysis. The method utilized here is known as the random indicator method, and it is an experimental method that essentially combines a selection model and a pattern-mixture model and is designed to handle data that are missing not at random. As van Buuren writes¹⁵

The random indicator method (Jolani 2012)¹⁹ is an experimental iterative method that redraws the missing data indicator under a selection model, and imputes the missing data under a pattern-mixture model, with the objective of estimating δ from the data under relaxed assumptions. Initial simulation results look promising. The algorithm is available as the ri method in mice.

The script below utilizes this method (random indicator), but bootstraps the missing dataset before the imputation procedure, and then after multiply imputing the datasets, obtains estimates using maximum likelihood multiple (MLMI) imputation, an alternative and more efficient method to the more popular posterior draw multiple imputation (PDMI). More about MLMI can be found here.²⁰

# Bootstrap Inference -------------------------------------

imps <- bootMice(BP_miss, method = "ri", maxit = 5, 
                 predictorMatrix = pred, nBoot = 100, nImp = 2,
                 seed = 1031, formulas = form,
                 ridge = 0, remove.collinear = FALSE,
                 remove.constant = FALSE, allow.na = TRUE,
                 printFlag = FALSE, visitSequence = "monotone")

analyseImp <- function(inputData) {
  return(coef(rlm(PostDBP ~ Group + BaselineDBP + Age + Z + R1 + R2,
                    data = inputData)))
}

ests <- bootImputeAnalyse(imps, analyseImp, quiet = TRUE)

ests <- as.data.frame(ests)
colnames(ests) <- c("Estimate", "Var",
                            "CI.ll", "CI.ul", "df")
rownames(ests) <- c("Intercept", "Group", "BaselineDBP",
                    "Age", "Z", "R1", "R2")

ztable(ests)

	Estimate	Var	CI.ll	CI.ul	df
Intercept	-0.61	21.72	-9.97	8.76	49.39
Group	10.99	0.04	10.57	11.41	10.02
BaselineDBP	1.01	0.00	0.93	1.09	49.81
Age	0.33	0.00	0.32	0.35	44.81
Z	-2.90	1.78	-5.57	-0.23	59.88
R1	-0.53	0.01	-0.71	-0.34	19.99
R2	-0.31	0.02	-0.57	-0.06	35.25

Full Analysis Set

After we run all those analyses, we can finally check our numbers from the full dataset with no missing observations to see how our approaches to handle the missing data have performed. We first look at the fully adjusted model fit with MASS::rlm().

# Full Analysis Set  -------------------------------------

final_full_mod <- rlm(PostDBP ~ Group + BaselineDBP +
                      Age + Z + R1 + R2, data = BP_full)
par(mfrow = c(2, 2))
plot(final_full_mod, pch = 20, 
     cex = 1.5, col = alpha("#d46c5b", 0.20))

ztable(check_collinearity(final_full_mod))

Term	VIF	SE_factor
Group	1.01	1.00
BaselineDBP	1.98	1.41
Age	1.01	1.00
Z	1.97	1.41
R1	1.00	1.00
R2	1.01	1.01

ztable(summary(final_full_mod)[["coefficients"]])

	Value	Std. Error	t value
(Intercept)	0.51	3.84	0.13
Group	9.77	0.09	106.87
BaselineDBP	1.00	0.03	30.63
Age	0.33	0.01	51.14
Z	-1.95	1.02	-1.92
R1	-0.08	0.05	-1.67
R2	-0.04	0.09	-0.44

Then we look at the unadjusted model, where the difference is the group that the participants were randomized to.

# Unadjusted Model  -------------------------------------

final_reduced_mod <- rlm(PostDBP ~ Group, data = BP_full)

ztable(summary(final_reduced_mod)[["coefficients"]])

	Value	Std. Error	t value
(Intercept)	133.58	0.20	670.32
Group	9.72	0.28	34.75

We can also compare the two to see if there are any substantial differences.

# Compare Models -------------------------------------

ztable(compare_performance(final_full_mod, final_reduced_mod))

Name	Model	AIC	BIC	RMSE	Sigma
final_full_mod	rlm	1420.38	1454.10	0.99	0.99
final_reduced_mod	rlm	2586.38	2599.03	3.19	3.20

While the results have mostly been consistent, I also wanted to run a few additional analyses of my own to see whether these effect estimates were reliable. I first started by constructing the confidence distribution to see the range of parameter values consistent with the data given the background model assumptions.^{21, 22}

library("concurve")

fit1 <- lm(PostDBP ~ Group + BaselineDBP +
                      Age + Z + R1 + R2, 
           data = BP_full)

curve1 <- curve_gen(fit1, var = "Group", method = "rlm", 
                    log = FALSE, penalty = NULL,
                    m = NULL, steps = 3000, 
                    table = TRUE)

ggcurve(curve1[[1]], type = "c", 
        levels = c(0.5, 0.75, 0.95, 0.99)) +
  theme_less() + 
  labs(title = "Confidence Curve for 'Group' Parameter",
       subtitle = "Displays interval estimates at every level",
       x = "Parameter Value (Treatment Effect)")

Lower Limit	Upper Limit	CI Width	Level (%)	CDF	P-value	S-value
9.744	9.800	0.057	25.0	0.625	0.750	0.415
9.712	9.832	0.120	50.0	0.750	0.500	1.000
9.669	9.875	0.205	75.0	0.875	0.250	2.000
9.658	9.886	0.229	80.0	0.900	0.200	2.322
9.643	9.901	0.257	85.0	0.925	0.150	2.737
9.625	9.919	0.294	90.0	0.950	0.100	3.322
9.597	9.947	0.350	95.0	0.975	0.050	4.322
9.572	9.972	0.401	97.5	0.988	0.025	5.322
9.541	10.003	0.461	99.0	0.995	0.010	6.644
Table of Statistics for Various Interval Estimate Percentiles

We can see that that the overall treatment effect is approximately 9.7-9.9, and for most of the principled methods we used to handle the missing data, we obtained similar estimates. However, we also found that the complete-case analysis using the MASS::rlm() function, gave similar estimates for the group treatment effect, as the more principled methods of handling missing data. Furthermore, although our initial dry imputation which we used to set up our main imputation later on was not meant to yield estimates that were to be taken seriously, it gave the most deviant treatment effect. Possibly suggesting that low numbers of imputations are unreliable due to the high monte-carlo error and that simply plugging data into a multiple imputation package without careful thought about the imputation model is often not a good idea.

I hope it is clear from the examples above that it is quite easy to do additional analyses to explore whether or not the results you’ve obtained are trustworthy or at least even give that impression, but doing a careful and principled one is quite difficult and requires much thought that will take into account real departures from assumptions and whether or not those will lead to substantial changes in results and conclusions/decisions.

Computing Environment

#> R version 4.1.0 (2021-05-18)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur 10.16
#> 
#> Random number generation:
#>  RNG:     L'Ecuyer-CMRG 
#>  Normal:  Inversion 
#>  Sample:  Rejection 
#>  
#> Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
#> 
#> Package version:
#>   bayesplot_1.8.1     boot_1.3-28         bootImpute_1.2.0    brms_2.15.0         car_3.0-10          corrplot_0.89      
#>   cowplot_1.1.1       future_1.21.0       ggplot2_3.3.4       ggstatsplot_0.8.0   here_1.0.1.9000     Hmisc_4.5-0        
#>   ImputeRobust_1.3-1  kableExtra_1.3.4    knitr_1.33.6        lattice_0.20-44     magrittr_2.0.1      MASS_7.3-54        
#>   mice_3.13.0         miceMNAR_1.0.2      mvtnorm_1.1-2       parallel_4.1.0      performance_0.7.2   qqplotr_0.0.5      
#>   rms_6.2-0           rstan_2.21.2        showtext_0.9-2      Statamarkdown_0.6.1 summarytools_0.9.9  tidyr_1.1.3        
#>   VIM_6.1.0           wesanderson_0.3.6

References

1. Greenland S. (2005). ‘Multiple-bias modelling for analysis of observational data’. Journal of the Royal Statistical Society Series A (Statistics in Society). 168:267–306. doi: 10.1111/j.1467-985X.2004.00349.x.

2. Greenland S, Lash TL. (2008). ‘Bias analysis’. In: Rothman KJ, Greenland S, Lash TL, editors. Modern Epidemiology. 3rd edition. Lippincott Williams & Wilkins. p. 345–380.

3. Lash TL, Fox MP, Fink AK. (2009). ‘Applying Quantitative Bias Analysis to Epidemiologic Data’. Springer New York.

4. Lash TL, Fox MP, MacLehose RF, Maldonado G, McCandless LC, Greenland S. (2014). ‘Good practices for quantitative bias analysis’. International Journal of Epidemiology. 43:1969–1985. doi: 10.1093/ije/dyu149.

5. Lash TL, Ahern TP, Collin LJ, Fox MP, MacLehose RF. (2021). ‘Bias Analysis Gone Bad’. American Journal of Epidemiology. doi: 10.1093/aje/kwab072.

6. Gustafson P. (2021). ‘Invited Commentary: Toward Better Bias Analysis’. American Journal of Epidemiology. doi: 10.1093/aje/kwab068.

7. Greenland S. (2021). ‘Dealing with the Inevitable Deficiencies of Bias Analysis and All Analyses’. American Journal of Epidemiology. doi: 10.1093/aje/kwab069.

8. Scott RaP. (2002). ‘The Multicentre Aneurysm Screening Study (MASS) into the effect of abdominal aortic aneurysm screening on mortality in men: A randomised controlled trial’. The Lancet. 360:1531–1539. doi: 10.1016/S0140-6736(02)11522-4.

9. Morris TP, Kahan BC, White IR. (2014). ‘Choosing sensitivity analyses for randomised trials: principles’. BMC Medical Research Methodology. 14:11. doi: 10.1186/1471-2288-14-11.

10. Mosteller F, Tukey JW. (1987). ‘Data analysis including statistics’. In: The Collected Works of John W. Tukey: Philosophy and Principles of Data Analysis 1965-1986. CRC Press.

11. Akacha M, Bretz F, Ohlssen D, Rosenkranz G, Schmidli H. (2017). ‘Estimands and Their Role in Clinical Trials’. Statistics in Biopharmaceutical Research. 9:268–271. doi: 10.1080/19466315.2017.1302358.

12. Permutt T. (2020). ‘Do Covariates Change the Estimand?’. Statistics in Biopharmaceutical Research. 12:45–53. doi: 10.1080/19466315.2019.1647874.

13. Mitroiu M, Oude Rengerink K, Teerenstra S, Pétavy F, Roes KCB. (2020). ‘A narrative review of estimands in drug development and regulatory evaluation: Old wine in new barrels?’. Trials. 21:671. doi: 10.1186/s13063-020-04546-1.

14. Little RJ, D’Agostino R, Cohen ML, Dickersin K, Emerson SS, Farrar JT, et al. (2012). ‘The Prevention and Treatment of Missing Data in Clinical Trials’. New England Journal of Medicine. 367:1355–1360. doi: 10.1056/NEJMsr1203730.

15. Buuren S van. (2018). ‘Flexible Imputation of Missing Data, Second Edition’. CRC Press.

16. Meng X-L, Rubin DB. (1992). ‘Performing Likelihood Ratio Tests with Multiply-Imputed Data Sets’. Biometrika. 79:103–111. doi: 10.2307/2337151.

17. Schenker N, Taylor J. (1996). ‘Partially parametric techniques for multiple imputation’. Computational Statistics & Data Analysis. 22:425–446. doi: 10.1016/0167-9473(95)00057-7.

18. Permutt T. (2016). ‘Sensitivity analysis for missing data in regulatory submissions’. Statistics in Medicine. 35:2876–2879. doi: 10.1002/sim.6753.

19. Jolani S. (2012). ‘Dual Imputation Strategies for Analyzing Incomplete Data’.

20. von Hippel PT, Bartlett J. (2019). ‘Maximum likelihood multiple imputation: Faster imputations and consistent standard errors without posterior draws’. arXiv:12100870 [stat]. https://arxiv.org/abs/1210.0870.

21. Rafi Z, Greenland S. (2020). ‘Semantic and cognitive tools to aid statistical science: Replace confidence and significance by compatibility and surprise’. BMC Medical Research Methodology. 20:244. doi: 10.1186/s12874-020-01105-9.

22. Greenland S, Rafi Z. (2020). ‘To Aid Scientific Inference, Emphasize Unconditional Descriptions of Statistics’. arXiv:190908583 [statME]. https://arxiv.org/abs/1909.08583.

To leave a comment for the author, please follow the link and comment on their blog: Statistical Science & Related Matters on Less Likely.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.