Well, maybe not, but this comes up all the time. An investigator wants to assess the effect of an intervention on an outcome. Study participants are randomized either to receive the intervention (could be a new drug, new protocol, behavioral intervention, whatever) or treatment as usual. For each participant, the outcome measure is recorded at baseline – this is the pre in pre/post analysis. The intervention is delivered (or not, in the case of the control group), some time passes, and the outcome is measured a second time. This is our post. The question is, how should we analyze this study to draw conclusions about the intervention’s effect on the outcome?
There are at least three possible ways to approach this. (1) Ignore the pre outcome measure and just compare the average post scores of the two groups. (2) Calculate a change score for each individual (\(\Delta_i = post_i - pre_i\)), and compare the average \(\Delta\)’s for each group. Or (3), use a more sophisticated regression model to estimate the intervention effect while controlling for the pre or baseline measure of the outcome. Here are three models associated with each approach (\(T_i\) is 1 if the individual \(i\) received the treatment, 0 if not, and \(\epsilon_i\) is an error term):
\[
\begin{aligned}
&(1) \ \ post_i = \beta_0 + \beta_1 T_i + \epsilon_i \\
&(2) \ \ \Delta_i = \alpha_0 + \alpha_1 T_i + \epsilon_i \\
&(3) \ \ post_i = \gamma_0 + \gamma_1 pre_i + \gamma_2 T_i + \epsilon_i
\end{aligned}
\]
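As a minimal sketch, the three models can be fit with base R rather than the simstudy machinery used elsewhere in this post. All of the parameter values below (sample size, effect size, variances) are illustrative assumptions, not the ones used in the simulations:

```r
# Simulate a simple randomized pre/post study and fit the three models.
# Parameter values are illustrative only.
set.seed(123)
n     <- 200
trt   <- rbinom(n, 1, 0.5)               # randomized treatment assignment
pre   <- rnorm(n, mean = 10, sd = 3)     # baseline score
post  <- pre + 2 * trt + rnorm(n, 0, 2)  # true treatment effect = 2
delta <- post - pre                      # change score

fit1 <- lm(post ~ trt)         # (1) post-only
fit2 <- lm(delta ~ trt)        # (2) change score
fit3 <- lm(post ~ pre + trt)   # (3) baseline-adjusted

# Compare the estimated treatment effects from the three models
c(postonly = coef(fit1)["trt"],
  change   = coef(fit2)["trt"],
  adjusted = coef(fit3)["trt"])
```

All three estimators are unbiased under randomization; the differences that matter, as the simulations below show, are in power (and, without randomization, bias).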
I’ve explored various scenarios (i.e. different data generating assumptions) to see if it matters which approach we use. (Of course it does.)
When the effect differs by baseline measurement
In a slight variation of the previous scenario, the effect of the intervention itself is now a function of the baseline score. Those who score higher will benefit less from the intervention – they simply have less room to improve. In this case, the adjusted model appears slightly inferior to the change model, while the unadjusted post-only model is still relatively low powered.
defPO <- updateDef(defPO, changevar = "eff", newformula = "1.9 - 1.9 * pre0/15")
presults[, .(postonly = mean(p1 <= 0.05), change = mean(p2 <= 0.05), adjusted = mean(p3 <= 0.025 | p3x <= 0.025))]
##    postonly change adjusted
## 1:    0.425  0.878    0.863
The adjusted model has less power than the change model here because I used a reduced \(\alpha\)-level for the hypothesis tests of the adjusted model. I am testing for an interaction first and then, if that fails, for a main effect, so I need to adjust for multiple comparisons. (I have another post that shows why this might be a good thing to do.) I used a Bonferroni adjustment, which is a relatively conservative test. I still prefer the adjusted model, because it provides more insight into the underlying process than the change model.
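The two-stage procedure can be sketched in base R as follows. The simulated data and parameter values are illustrative assumptions, not the post's simstudy setup; the point is the logic of testing the interaction at \(\alpha/2\) and falling back to the main effect at \(\alpha/2\):

```r
# Two-stage test with a Bonferroni split: alpha = 0.05 is divided
# across the interaction test and the main-effect test.
# Simulated data are illustrative only.
set.seed(42)
n    <- 200
trt  <- rbinom(n, 1, 0.5)
pre  <- rnorm(n, 10, 3)
post <- pre + (2 - 0.1 * pre) * trt + rnorm(n, 0, 2)  # effect shrinks with baseline

alpha <- 0.05 / 2                    # Bonferroni-adjusted level per test
fitX  <- lm(post ~ pre * trt)        # adjusted model with interaction
p_int <- coef(summary(fitX))["pre:trt", "Pr(>|t|)"]

if (p_int <= alpha) {
  effect_found <- TRUE               # effect varies with the baseline score
} else {
  fitM   <- lm(post ~ pre + trt)     # fall back to main-effects model
  p_main <- coef(summary(fitM))["trt", "Pr(>|t|)"]
  effect_found <- p_main <= alpha
}
```

This mirrors the `p3 <= 0.025 | p3x <= 0.025` condition used to compute the adjusted model's power above.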
Treatment assignment depends on baseline measurement
Now, slightly off-topic. So far, we’ve been talking about situations where treatment assignment is randomized. What happens in a scenario where those with higher baseline scores are more likely to receive the intervention? Well, if we don’t adjust for the baseline score, we will have unmeasured confounding. A comparison of follow-up scores in the two groups will be biased towards the intervention group if the baseline scores are correlated with follow-up scores – as we see visually with a scenario in which the effect size is set to 0. Also notice that the p-values for the unadjusted model are consistently below 0.05 – we are almost always drawing the wrong conclusion if we use this model. On the other hand, the error rate for the adjusted model is close to 0.05, as we would expect.
defPO <- updateDef(defPO, changevar = "eff", newformula = 0)
dt <- genData(1000, defPO)
dt <- trtObserve(dt, "-4.5 + 0.5 * pre0", logit.link = TRUE)
dt <- addColumns(defObs, dt)
##    postonly change adjusted
## 1:    0.872  0.095    0.046
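For a self-contained illustration of this bias, here is a base-R sketch of the same kind of scenario: higher baseline scores make treatment more likely, the true effect is zero, and only the baseline-adjusted model recovers it (parameter values are illustrative, loosely mirroring the assignment formula above):

```r
# Confounded treatment assignment with a true effect of zero.
# Parameter values are illustrative only.
set.seed(7)
n    <- 1000
pre  <- rnorm(n, 10, 3)
trt  <- rbinom(n, 1, plogis(-4.5 + 0.5 * pre))  # assignment depends on baseline
post <- pre + rnorm(n, 0, 2)                    # true treatment effect = 0

b_unadj <- coef(lm(post ~ trt))["trt"]        # biased: absorbs the baseline imbalance
b_adj   <- coef(lm(post ~ pre + trt))["trt"]  # close to the true effect of 0
```

The unadjusted estimate is pulled away from zero by the baseline difference between the groups, while the adjusted estimate stays near the truth.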
I haven’t proved anything here, but these simulations suggest that we should certainly think twice about using an unadjusted model if we happen to have baseline measurements. And it seems like you are likely to maximize power (and maybe minimize bias) if you compare follow-up scores while adjusting for baseline scores rather than analyzing change in scores by group.