[This article was first published on Economics and R - R posts, and kindly contributed to R-bloggers.]

Chapter 4 of my course Empirical Economics with R covers a popular strategy to estimate causal effects: difference-in-differences estimation. Inspired by Matt Taddy’s textbook example, the main application is to estimate the causal effects of search engine marketing on revenues based on an experiment conducted by eBay and studied by Blake, Nosko and Tadelis (2015).

The crucial data is illustrated in the following graph (for privacy reasons, only a scaled and shifted version of eBay's original data has been made publicly available):

In the experiment, designated market areas (DMAs) were assigned to either the treatment or the control group, using imperfect randomization. For the DMAs in the treatment group, search engine marketing was turned off in June and July.

In an ideal randomized experiment, we could just compare the average daily revenues of the treatment group (blue) with those of the control group (red) during the experimental period. This would suggest that turning off search engine marketing reduces average daily revenue per DMA by 28 thousand USD (`100.7-128.7 = -28`).

Yet, looking at the graph, we see that already before the experiment started the DMAs in the treatment group had on average 26.6 thousand USD lower daily revenues than those of the control group (`105.8-132.4 = -26.6`). In general, such imbalances may be due to imperfect randomization or small sample sizes. The difference-in-differences (DiD) estimator is simply given by `-28-(-26.6) = -1.4`, i.e. it corrects the outcome difference between treatment and control group during the experiment with the outcome difference before the experiment started. This suggests a much smaller effect of search engine marketing on (short-term) revenues.
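The arithmetic above can be retraced in a few lines of R. This is only a minimal sketch: the four group means are the numbers quoted in the text, while the data frame `means` and the helper `get_mean()` are hypothetical names introduced for illustration.

```r
# Average daily revenue per DMA (thousand USD), as quoted in the text above
means <- data.frame(
  group   = c("treatment", "treatment", "control", "control"),
  period  = c("before", "during", "before", "during"),
  revenue = c(105.8, 100.7, 132.4, 128.7)
)

# Hypothetical helper: look up the mean for one group and period
get_mean <- function(g, p) {
  means$revenue[means$group == g & means$period == p]
}

# Treatment minus control, during and before the experiment
diff_during <- get_mean("treatment", "during") - get_mean("control", "during")
diff_before <- get_mean("treatment", "before") - get_mean("control", "before")

# DiD: difference during the experiment, corrected by the pre-existing difference
did <- diff_during - diff_before

# Equivalent view: change over time in treatment minus change over time in control
did_alt <- (get_mean("treatment", "during") - get_mean("treatment", "before")) -
  (get_mean("control", "during") - get_mean("control", "before"))

print(round(c(diff_during = diff_during, diff_before = diff_before,
              did = did, did_alt = did_alt), 1))
```

Both ways of taking the two differences give the same estimate of about -1.4 thousand USD.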

The DiD estimator is quite intuitive and popular in empirical economic studies. In practice, one typically performs DiD estimation via a linear regression with appropriate fixed effects. You can work through these regressions yourself in the RTutor problem set for chapter 4. In addition to the search engine marketing application, the exercises also replicate the main insights of a seminal difference-in-differences study by Card and Krueger (1994), which uses a natural experiment to estimate the causal effects of minimum wages on employment.
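To give a flavor of what such a regression can look like, here is a minimal sketch on simulated data. It is not the eBay data and not the code from the problem set: the panel dimensions, the variable names (`dma`, `day`, `treat`, `post`, `revenue`) and the assumed true effect of -1.4 are all made up for illustration.

```r
set.seed(1)

# Simulated balanced panel: 40 DMAs observed on 20 days (assumed sizes)
n_dma <- 40
n_day <- 20
panel <- expand.grid(dma = 1:n_dma, day = 1:n_day)

# First half of the DMAs is treated; treatment starts in the second half of the days
panel$treat <- as.integer(panel$dma <= n_dma / 2)
panel$post  <- as.integer(panel$day > n_day / 2)

# DMA-specific revenue levels (treated DMAs are lower on average) and common daily shocks
dma_effect <- rnorm(n_dma, mean = 130, sd = 15) - 26 * as.integer(1:n_dma <= n_dma / 2)
day_effect <- rnorm(n_day, sd = 3)

# Assumed true causal effect of turning off search engine marketing
true_effect <- -1.4

panel$revenue <- dma_effect[panel$dma] + day_effect[panel$day] +
  true_effect * panel$treat * panel$post +
  rnorm(nrow(panel), sd = 1)

# DiD regression: treatment-period dummy plus DMA and day fixed effects
panel$treat_post <- panel$treat * panel$post
fit <- lm(revenue ~ treat_post + factor(dma) + factor(day), data = panel)

# The coefficient on treat_post is the DiD estimate
did_hat <- unname(coef(fit)["treat_post"])
print(did_hat)
```

The DMA fixed effects absorb the pre-existing level differences between treatment and control group, and the day fixed effects absorb shocks common to all DMAs, so `did_hat` should come out close to the assumed effect. With many fixed effects, a dedicated package such as `fixest` is usually faster than `lm` with `factor()` terms.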

You can find all material in the course’s GitHub repository. Take a look at the setup instructions if you want to solve the RTutor problem sets on your own computer.

Note that some very interesting recent research articles, such as de Chaisemartin and D’Haultfoeuille (2020), illustrate potential biases in DiD estimation via fixed effects if treatment effects are heterogeneous and not all treatments start at the same time. Methods proposed to address these problems are implemented, for example, in the R packages `did` and `DIDmultiplegt`.
