Minimizing Bias in Observational Studies

November 26, 2012

(This article was first published on DiffusePrioR » R, and kindly contributed to R-bloggers)

Measuring the effect of a binary treatment on a measured outcome is one of the most common tasks in applied statistics. Examples of these applications abound, like the effect of smoking on health, or the effect of low birth weight on cognitive development. In an ideal world we would like to be able to assign one group of people to receive some form of treatment, and an identical group to not receive this treatment. In this world, the average treatment effect (ATE) is the difference in outcomes between the treatment and control groups.

However, the real world is far from ideal, and the problematic nature of measuring causal effects has (justifiably) spawned a wide literature. Many solutions have been proposed to this problem. The simplest one involves controlling for the pre-treatment characteristics defining both the treatment and control groups. For example, in the case of low birth weight, a researcher may want to adjust for variation in the parent’s socioeconomic status, as one would expect that the ‘treatment’ of low birth weight not to be randomly assigned amongst different socioeconomic strata. Once controls for various confounding variables are introduced, it is then feasible to measure the causal effect of the treatment. Methods that perform this adjustment include simple linear regression modelling, or various forms of matching estimators.

The key assumption in the above is that the researcher is able to separate out differences between the treatment and control groups based on observable characteristics (selection on observables). However, there are many cases, especially in the social sciences, where it is not unreasonable to suspect this assumption does not hold (selection based on the unobservable). In this scenario, the use of instrumental variable estimators represents a viable solution.

Unfortunately, suitable instrumental variables are commonly not available to researchers. In this instance is there anything a researcher can do? Yes, according to a recently published paper by Daniel L. Millimet and Rusty Tchernis. Millimet and Tchernis examine this problem from the point of view of a researcher trying to minimize the bias associated with selection on unobservables. Their paper demonstrates how the bias of ATE can be derived from a regression model of the probability of treatment on observable characteristics. Using this regression model, it is possible to find a bias minimizing propensity score (P*). Once this score is calculated, the researcher is able to estimate a bias reduced ATE by trimming observations outside a pre-specified neighborhood around P*.

Millimet and Tchernis propose a number of estimators that one can use to produce bias minimized ATEs. Those interested in potentially using this bias minimization strategy should refer to their paper for a more detailed examination of these estimators, and their potential uses/misuses. This paper also features a neat empirical application that suggests why their approach might be better than the more conventional methods.

In the below, I have supplied an image summarizing the output produced from a Monte Carlo exercise to highlight the efficacy of the bias corrected approach. Those interested in design of this experiment should look at the Millimet and Tchernis paper, since I have simply replicated their MC design (250 datasets with 5,000 observations in each), with the assumption that the correlation between the treatment equation error term and outcome equation is −0.6 (so a really strong correlation). Additionally, I have set the trimming parameter (theta) to 0.05, so at least 5% of the treatment and 5% of the control group are contained in the trimmed sample.

Note that full descriptions of the acronyms in this image can be found in the paper, although MB.BC and IPW.BC refer to the bias corrected measures of the ATE, and from the image it is clear to see that these estimators are much closer to the true average treatment effect of 1, albeit with a higher variance. This simulation was conducted with R using a self-written function. I have benchmarked this function against Daniel Millimet’s STATA function, and the results are identical. I hope to possibly release this function as an R package in the future, although I would be happy to supply my function’s code to anyone who is interested.

To leave a comment for the author, please follow the link and comment on their blog: DiffusePrioR » R. offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...

If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.


Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)