(This article was first published on **Revolutions**, and kindly contributed to R-bloggers)

by Joseph Rickert

We all "know" that correlation does not imply causation, and that unmeasured and unknown factors can confound a seemingly obvious inference. But who has not been tempted by the seductive quality of strong correlations?

Fortunately, it is also well known that a well-designed randomized experiment can account for the unknown confounders and permit valid causal inferences. But what can you do when it is impractical, impossible or unethical to conduct a randomized experiment? (For example, we wouldn't want to ask a randomly assigned cohort of people to go through life with less education to prove that education matters.) One way of coping with confounders when randomization is infeasible is to introduce what economists call instrumental variables. This is a devilishly clever and apparently fragile notion that takes some effort to wrap one's head around.

On Tuesday, October 20th, we at the Bay Area useR Group (BARUG) had the good fortune to have Hyunseung Kang describe the work that he and his colleagues at the Wharton School have been doing to extend the usefulness of instrumental variables. Hyunseung's talk started with elementary notions, such as explaining the effectiveness of randomized experiments, then described the essential notion of instrumental variables and developed the background necessary for understanding the new results in this area. The slides from Hyunseung's talk are available for download in two parts from the BARUG website. As with most presentations, these slides are little more than the mute residue of the talk itself. Nevertheless, Hyunseung makes such imaginative use of animation and build slides that the deck is worth working through.

The following slide from Hyunseung's presentation captures the essence of the instrumental approach.

The general idea is that one or more variables, the instruments, are added to the model for the purpose of inducing randomness into the exposure. This has to be done in a way that conforms with the three assumptions mentioned in the figure. The first assumption, A1, is that the instruments are relevant: they are associated with the exposure. The second assumption, A2, states that randomness is induced only into the exposure variables and not directly into the outcome. The third assumption, A3, is a strong one: the instruments are unrelated to any unmeasured confounders. The claim is that if these three assumptions are met, then causal effects can be estimated with coefficients for the exposure variables that are consistent and asymptotically unbiased.
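To make the three assumptions concrete, here is a minimal simulation sketch in Python using only the standard library. It is not taken from Hyunseung's slides: the data-generating model, the coefficient values, and the names `simulate`, `cov`, `ols_estimate`, and `iv_estimate` are all assumptions chosen for illustration. With a single instrument, the instrumental variable estimate reduces to a ratio of covariances, and the sketch compares it with the naive regression slope when an unmeasured confounder is present.

```python
import random


def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def simulate(n=20000, true_effect=2.0, seed=1):
    """Simulate instrument z, exposure d, and outcome y (hypothetical model)."""
    rng = random.Random(seed)
    z, d, y = [], [], []
    for _ in range(n):
        u = rng.gauss(0, 1)    # unmeasured confounder, never observed
        zi = rng.gauss(0, 1)   # instrument, drawn independently of u (A3)
        # Exposure depends on the instrument (A1) and on the confounder.
        di = 1.0 * zi + 1.0 * u + rng.gauss(0, 1)
        # Outcome depends on exposure and confounder, with no direct z term (A2).
        yi = true_effect * di + 2.0 * u + rng.gauss(0, 1)
        z.append(zi)
        d.append(di)
        y.append(yi)
    return z, d, y


z, d, y = simulate()

# Naive slope of y on d: absorbs the confounder's effect and is biased.
ols_estimate = cov(d, y) / cov(d, d)

# Single-instrument IV estimate: only the variation in d driven by z is used.
iv_estimate = cov(z, y) / cov(z, d)

print(f"true effect: 2.0  OLS: {ols_estimate:.3f}  IV: {iv_estimate:.3f}")
```

In this toy setup the naive slope should land noticeably above the true effect of 2.0, because the confounder pushes exposure and outcome in the same direction, while the ratio estimate should land close to 2.0. Adding a direct `zi` term to the outcome line, or letting `zi` depend on `u`, is a quick way to see how violating A2 or A3 breaks the estimate.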

In the education example developed by Hyunseung, the instrumental variables are the subject's proximity to two-year and four-year colleges. Here is where the "rubber meets the road", so to speak. Assessing the relevance of the instrumental variables and interpreting their effects are subject to the kinds of difficulties described by Andrew Gelman in his post of a few years back.
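One common way to get a first read on relevance is to regress the exposure on the instrument and look at the F-statistic for the instrument's coefficient. The sketch below is an illustration under assumed conditions, not something from the talk: the simulated data, the `strength` parameter, and the function names `first_stage_f` and `simulate_exposure` are hypothetical, and it uses simulated numbers rather than the college proximity data.

```python
import random


def first_stage_f(z, d):
    """F-statistic for the slope in a simple regression of exposure d on instrument z."""
    n = len(z)
    mean_z = sum(z) / n
    mean_d = sum(d) / n
    szz = sum((a - mean_z) ** 2 for a in z)
    szd = sum((a - mean_z) * (b - mean_d) for a, b in zip(z, d))
    slope = szd / szz
    intercept = mean_d - slope * mean_z
    # Residual variance of the first-stage regression.
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(z, d))
    sigma2 = rss / (n - 2)
    # With one regressor, F equals the squared t-statistic of the slope.
    return slope ** 2 * szz / sigma2


def simulate_exposure(strength, n=2000, seed=7):
    """Hypothetical data: exposure = strength * instrument + confounder + noise."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    d = [strength * zi + rng.gauss(0, 1) + rng.gauss(0, 1) for zi in z]
    return z, d


f_strong = first_stage_f(*simulate_exposure(strength=1.0))
f_weak = first_stage_f(*simulate_exposure(strength=0.02))

print(f"first-stage F, strong instrument: {f_strong:.1f}")
print(f"first-stage F, weak instrument:   {f_weak:.1f}")
```

A widely quoted rule of thumb treats a first-stage F below about 10 as a sign of a weak instrument. The statistic only speaks to relevance, though. It says nothing about whether the instrument affects the outcome solely through the exposure or whether it is unrelated to unmeasured confounders.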

In the second part of his presentation, Hyunseung presents new work: (1) two methods that provide robust confidence intervals when assumption A1 is violated, (2) a method for implementing a sensitivity analysis to assess the sensitivity of an instrumental variable model to violations of assumptions A2 and A3, and (3) the R package ivmodel that ties it all together.

To delve even deeper into this topic, have a look at the paper: Instrumental Variables Estimation With Some Invalid Instruments and its Application to Mendelian Randomization.
