What if..? Program Evaluation in the Labor Market.

[This article was first published on Boiled Data, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
Start

Linear regression and her bigger brother BART: Treatment effects in the labor market

1 Oct 2022

#Preliminaries:
knitr::opts_chunk$set(message=FALSE, warning=FALSE, eval = TRUE) #set eval = TRUE when run first

rm(list=ls())

library(tidyverse)
library(haven)
library(bartCause)
library(kableExtra)
library(tableone)
library(gridExtra)
library(grid)

theme_set(theme_light())

Introduction

Evaluation of the effectiveness of labor market programs is of great importance in order to learn how to design good programs and provide support efficiently. Consequently, there is a huge literature on the evaluation of labor market trainings and we add to this by allowing for heterogeneous treatment effects. We will evaluate the effectiveness of a federally-funded labor treatment program (LaLonde, 1986) which was implemented in the mid-1970s (US) with the objective of providing work experience. The male subsample we use consists of experimental treatment units and non-experimental comparison units (Dehejia & Wahba, 1999). Let us assess whether the treatment had an impact on earnings given available pre-treatment variables.

The Data

#download the data:
#lalonde <- haven::read_dta("http://www.nber.org/~rdehejia/data/nsw.dta") 
#saveRDS(lalonde, file="lalonde1986.rds")
lalonde_in <- readRDS("lalonde1986.rds") 

lalonde <- lalonde_in %>%
            mutate(race=as_factor(case_when(black==1 ~ 'black',
                                  hispanic==1 ~ 'hispanic',
                                  TRUE ~ 'other'))) %>%
           select(-c(black, hispanic, data_id)) %>%
           relocate(re78, .before = 1)


allvars <- colnames(lalonde)
allvars <- allvars[allvars!='treat'] 
catvars <- c("race", "married", "nodegree")


#Descriptive Statistics:
tab1 <- CreateTableOne(vars = allvars, strata="treat", data = lalonde, factorVars = catvars) 

kableone(tab1, caption = "Table 1: Descriptive Statistics", col.names=c("Treat=0", "Treat=1", "p", ""))  %>%
  remove_column(5) %>%
  kable_classic(full_width = F)
Table 1: Descriptive Statistics
Treat=0 Treat=1 p
n 425 297
re78 (mean (SD)) 5090.05 (5718.09) 5976.35 (6923.80) 0.061
age (mean (SD)) 24.45 (6.59) 24.63 (6.69) 0.721
education (mean (SD)) 10.19 (1.62) 10.38 (1.82) 0.136
married = 1 (%) 67 (15.8) 50 (16.8) 0.778
nodegree = 1 (%) 346 (81.4) 217 (73.1) 0.010
re75 (mean (SD)) 3026.68 (5201.25) 3066.10 (4874.89) 0.918
race (%) 0.567
black 340 (80.0) 238 (80.1)
hispanic 48 (11.3) 28 ( 9.4)
other 37 ( 8.7) 31 (10.4)

Table 1 shows that the mean of our variable of interest, the outcome re78 (income in 1978) is higher in our treatment group. Despite the observational character of our data, we see that the sample is relatively homogeneous concerning the pre-treatment variables: only the indicator variable nodegree, on whether a person has a high school diploma, seems to vary between treatment and control group significantly.

Strategy & Empirical Analysis

In the potential outcome framework (Rubin, 2005) the average treatment effect (ATE) can be written as the expectation \(E(\delta)=E[y^1-y^0]\), where \(y^1\) and \(y^0\) are potential outcomes of income in 1978. Under conditional independence assumption \((y^1,y^0)\bot T|X\) these outcomes are independent of treatment \(T\) given covariates \(X\).

We use 2 methods in order to estimate the average treatment effect (ATE). Firstly, we apply the well known linear regression for estimation and risk adjustment (eg. Angrist & Pischke, 2009; Pearl, 2013). Secondly, we consider Bayesian Additive Regression Trees (Chipman et al., 2010; J. L. Hill, 2011), a machine learning model recently becoming popular as it allows to detect relevant interactions in the data generating process and can be used to estimate heterogeneous treatment effects as well (Carnegie et al., 2019; J. Hill et al., 2020).

#Linear regression

lm <- lm(re78 ~ treat + re75 + age + education + race + married + nodegree, data=lalonde)

ate_lm <- as.data.frame(coef(lm)) %>%
        bind_cols(confint(lm)) %>%
        rownames_to_column() %>%
        filter(rowname=='treat') %>%
        rename(estimate=`coef(lm)`, ci.lower=`2.5 %`, ci.upper=`97.5 %`) %>%
        mutate(model='LinReg') %>%
        relocate(model, .before=1) %>%
        select(-rowname)
  
#BART

bc_fit <- bartc(lalonde[,'re78'], lalonde[,'treat'], lalonde[,3:8], n.samples = 1000L, estimand='ate',
                method.rsp='bart', method.trt='glm', verbose = FALSE) 


ate_bart <- summary(bc_fit)$estimates %>%
            select(-sd) %>%
            mutate(model='BART')  %>%
            relocate(model, .before=1)

#Prepare ATE-Output
ate <- ate_lm %>%
      bind_rows(ate_bart)

rownames(ate) <- NULL

kable(
  ate,
  digits = 2,
  caption = "Table 2: Treatment effect on income"
) %>%
  kable_classic(full_width = F) %>% 
  footnote(general = "ci: 95 % confidence intervals for linear regression, \n credible intervals for BART ")
Table 2: Treatment effect on income
model estimate ci.lower ci.upper
LinReg 806.51 -112.09 1725.11
BART 803.05 -308.61 1914.71
Note:
credible intervals for BART

We recognize in Table 2 that the treatment effect of BART is very similar to linear regression. The ATE-point estimates are both positive, suggesting that program participation increases average income by about 800 USD. However, both 95% confidence-, credible intervals show that there is a lot of variability in the data – an effect of zero cannot be rejected.

In addition to one point estimate we often would like to have information on treatment effects for specific subgroups. This could help us to redesign the program or allow for more effective enrollment. Obviously, these questions can be of interest also in other areas – imagine online marketing or personalized medicine for example. Thankfully, BART can also provide treatment effects for each observation in the data, which are called the individual conditional average treatment effects (iCATE): \(\tau(X_i)=E[y^1_i-y^0_i|X_i]\). Let us have a look at the distribution of individual CATE in Figure 1.

#Extract individual treatment effects
i_cate <- fitted(bc_fit, type="icate", sample = c("all"))


data_cate <- lalonde %>%
  bind_cols(as.data.frame(i_cate))
  

#Histogram CATE
data_cate %>%
  ggplot(aes(x=i_cate)) +
  geom_histogram(bins=20, color='black', fill='grey') + 
  theme_minimal() + labs(y = 'Number of people', x='individual CATE', title='Figure 1: Distribution of iCATE')

Interestingly, although the peak of the distribution is around 800 similar to ATE from above, the treatment effects vary a lot. There are even individuals for which a negative effect can be seen. In order to better understand the data generating process, we want to check next whether the individual treatment effects vary with explanatory variables, that could moderate the treatment effect. Figure 2 visualizes the treatment effects dependent on numeric variables age, income and years of education before treatment.

#Scatterplots:

#age
p2 <- data_cate %>%
  mutate(treat=as_factor(treat)) %>%
  ggplot(aes(y=i_cate, x=age)) +
  geom_point(aes(color=treat)) +
  scale_colour_manual(values=c("blue", "orange")) +
  geom_smooth(color='black') +
  theme_minimal() 


#income
p3 <- data_cate %>%
   mutate(treat=as_factor(treat)) %>%
  ggplot(aes(y=i_cate, x=re75)) +
  geom_point(aes(color=treat)) +
  scale_colour_manual(values=c("blue", "orange")) +
  geom_smooth(color='black') +
  theme_minimal() 


#educ
p4 <- data_cate %>%
   mutate(treat=as_factor(treat)) %>%
  ggplot(aes(y=i_cate, x=education)) +
  geom_point(aes(color=treat)) +
  geom_smooth(aes(y=i_cate, x=education)) +
  scale_colour_manual(values=c("blue", "orange")) +
  geom_smooth(color='black') +
  scale_x_continuous(breaks = rep(1:8)*2) +
  theme_minimal() 

#combine  plots next to each other
title1=textGrob("Figure 2: CATE by numeric variables", gp=gpar(fontface="bold"))
grid.arrange(p2,  p3, p4, ncol=2, top=title1) 

We see that program effectiveness increased with age until a threshold of 30 years. The program seems to have been most effective for people between 5 and 10K USD whereas the relationship between years of education and iCATE is U-shaped. The scatterplots allow to visually distinguish iCATE by treatment status – a very nice feature as discussed above. Next let us assess the interaction of our treatment with the factor variables.

#Boxplots:
#race
p5 <- data_cate %>%
  ggplot(aes(y=i_cate, x=race)) +
  geom_boxplot()  +
  theme_minimal() 

#married 
p6 <- data_cate %>%
  mutate(married=as.factor(married)) %>%
  ggplot(aes(y=i_cate, x=married)) +
  geom_boxplot() +
  theme_minimal() 

p7 <- data_cate %>%
  mutate(nodegree=as.factor(nodegree)) %>%
  ggplot(aes(y=i_cate, x=nodegree)) +
  geom_boxplot() +
  theme_minimal() 

title2=textGrob("Figure 2: CATE by factor variables", gp=gpar(fontface="bold"))
grid.arrange(p5,  p6, p7, ncol=2, top=title2) 

Figure 3 reveals that black colored benefited most from the program whereas being hispanic is associated with negative CATE. Married people benefited highly from program participation as well.

Discussion

We analysed the impact of a labor market program by using linear regression and BART, a tree based nonparametric Bayesian regression approach which very successfully recovers the underlying data generating process in the presence of interactions. We have found some evidence for effect moderation of the treatment, especially being married and black color are associated with a positive treatment effect. The relation between pre-treatment income and iCATE is nonlinear. Marital status and income were found to generate interaction effects by Rolling (2014) as well.

Let us recall the assumption that all relevant control variables are accounted for in the analysis (ignorability). This does not necessary hold true and is also emphasized by the seminal study of LaLonde (1986). See for example VanderWeele & Shpitser (2011) for a discussion of confounder selection. Furthermore, when estimating ATE we mainly focused on modeling the outcome equation of our treatment, hence we implicitly assume no selection into treatment. Luckily, other treatment effects (ATT, ATC) can be generated by re-weighting towards the population of interest (eg. Morgan & Todd, 2008), both in case of linear regression and BART.

References

Angrist, J. D., & Pischke, J.-S. (2009). Mostly harmless econometrics: An empiricist’s companion. Princeton university press.
Carnegie, N., Dorie, V., & Hill, J. L. (2019). Examining treatment effect heterogeneity using BART. Observational Studies, 5(2), 52–70.
Chipman, H. A., George, E. I., & McCulloch, R. E. (2010). BART: Bayesian additive regression trees. The Annals of Applied Statistics, 4(1), 266–298.
Dehejia, R. H., & Wahba, S. (1999). Causal effects in nonexperimental studies: Reevaluating the evaluation of training programs. Journal of the American Statistical Association, 94(448), 1053–1062.
Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1), 217–240.
Hill, J., Linero, A., & Murray, J. (2020). Bayesian additive regression trees: A review and look forward. Annual Review of Statistics and Its Application, 7(1).
LaLonde, R. J. (1986). Evaluating the econometric evaluations of training programs with experimental data. The American Economic Review, 604–620.
Morgan, S. L., & Todd, J. J. (2008). 6. A diagnostic routine for the detection of consequential heterogeneity of causal effects. Sociological Methodology, 38(1), 231–282.
Pearl, J. (2013). Linear models: A useful “microscope” for causal analysis. Journal of Causal Inference, 1(1), 155–170.
Rolling, C. A. (2014). Estimation of conditional average treatment effects [PhD thesis]. University of Minnesota.
Rubin, D. B. (2005). Causal inference using potential outcomes: Design, modeling, decisions. Journal of the American Statistical Association, 100(469), 322–331.
VanderWeele, T. J., & Shpitser, I. (2011). A new criterion for confounder selection. Biometrics, 67(4), 1406–1413.
Wickham, H., Miller, E., & Smith, D. (2022). Haven: Import and export ’SPSS’, ’stata’ and ’SAS’ files. https://CRAN.R-project.org/package=haven
Zhu, H. (2021). kableExtra: Construct complex table with ’kable’ and pipe syntax. https://CRAN.R-project.org/package=kableExtra
To leave a comment for the author, please follow the link and comment on their blog: Boiled Data.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)