Empirical Economics with R (Part D): Instrumental Variable Estimation and Potential Outcomes
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.
Chapter 5 of my course Empirical Economics with R covers instrumental variable (IV) estimation.
While being one of the most popular methods in academic economic papers for estimating causal effects (see e.g. the statistics here), I was not sure whether to introduce IV estimation in this Bachelor level course. My hesitation was due to the fact that despite peer reviews instrument exogeneity seems debatable in some studies and I feared worse if today’s students would indeed start using IV as practitioners in their jobs. But then this quote from the textbook of Matt Taddy (former econometrics professor who became a vice president at Amazon) changed my mind:
As a final point on the importance of IV models and analysis, note that when you are on the inside of a firm especially on the inside of a modern technology firm explicitly randomized instruments are everywhere.
So my goal for this chapter was to find an interesting, reproducible application that uses an explicitly randomized instrument. And I found the great study “Private and Public Provision of Counseling to Job Seekers: Evidence from a Large Controlled Experiment.” by Behaghel et al. (2014)
In 2007 and 2008 a large scale randomized experiment was conducted in France to compare the success of 3 different counseling programs for job seekers:

An intensive counseling program by private firms.

A new intensive counseling program by the public employment agency.

The standard job counseling by the public employment agency.
In the private and intensive public program one case worker did assist at most 40 job seekers, while in the standard job counseling a case worker assists on average 120 job seekers.
The chapter first only compares the intensive public counseling with the standard job counseling. A sample of job seekers were randomly assigned to a treatment group that got the option for intensive counseling and a control group who only got the standard counseling. However, only 31.7% of the job seekers in the treatment group accepted the intensive public counseling, the remaining 68.3% refused and got, like the control group members, the standard counseling.
Take a look at the following three regressions:
dat = readRDS("C:/lehre/empecon/slides/jc_small.Rds") reg.itt = lm(job_6m ~ treat_option, data=dat) reg.ols = lm(job_6m ~ treated, data=dat) library(AER) reg.iv = ivreg(job_6m ~ treated  treat_option, data=dat) library(stargazer) stargazer(reg.itt, reg.ols, reg.iv,column.labels = c("OLSITT","OLS","IV"), model.names=FALSE, type="html",keep.stat = c("n"))
Dependent variable:  
job_6m  
OLSITT  OLS  IV  
(1)  (2)  (3)  
treat_option  0.032^{***}  
(0.010)  
treated  0.079^{***}  0.102^{***}  
(0.014)  (0.031)  
Constant  0.201^{***}  0.204^{***}  0.201^{***} 
(0.006)  (0.005)  (0.006)  
Observations  7,224  7,224  7,224 
Note:  ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01 
The first OLS regression regresses an indicator whether within 6 months a job has been found on a dummy variable treat_option
that is 1 if and only if the subject was in the treatment group and thus got the option for intensive counseling. The estimator of 3.2 percentage points is called the intentiontotreat effect. Yet, given that a large fraction of subjects in the treatment group refuses the intensive counseling, the intentiontotreat effect clearly underestimates the causal effect to receive the intensive job counseling.
The second regression uses the dummy variable treated
that is 1 only for a subject that is in the treatment group and accepted the intensive counseling. Yet, the estimate, i.e. a 7.9 percentage points increase of the probability to find a job, is most likely biased. That is because the decision to accept or reject the intensive counseling is most likely related with job searchers’ characteristics that influence their baseline probabilities to find a job.
The third regression is an instrumental variable regression that uses treat_option
as instrument for treated
. If treat_option
is perfectly randomized, the IV estimator consistently estimates the causal effect of the intensive counseling on the probability to find a job within 6 months. Given the baseline probability of roughly 20.1% to find a job within 6 months, the IV estimate of 10.2 percentage points means that the intensive public counseling seems to increase the probability to find a job by a staggering 50 percent. The intuition and required assumptions for the IV estimator are explained in the course.
The course introduces the IV estimator with the classic approach that assumes a homogeneous treatment effect among all subjects (or that all heterogeneity can be explained by exogenous explanatory variables). Then the course also briefly introduce the modern potential outcomes framework that starts with the assumption that the intensive counselling has heterogeneous effects across subjects. In our setting it seems quite likely that those subjects who refused the intensive counseling would have gotten a systematically different (probably lower) effect from the treatment than those subjects who accepted it. But which average treatment effect does our IV estimator estimate? The average treatment effect over all subjects, the average treatment for those subjects that are willing to accept the treatment, or some other average? The potential outcome yields an answer to this question. If you want to know it, just take a look at the course…
Rbloggers.com offers daily email updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/datascience job.
Want to share your content on Rbloggers? click here if you have a blog, or here if you don't.