# Counterfactual estimation on nonstationary data, be careful!!!

*This article was first published on **R – insightR** and kindly contributed to R-bloggers.*


**By Gabriel Vasconcelos**

In a recent paper that can be downloaded here, Carvalho, Masini and Medeiros show that estimating counterfactuals in a non-stationary framework (when I say non-stationary I mean integrated) is a tricky task. It is intuitive that the models will not work properly in the absence of cointegration (the spurious case), but what the authors show is that even with cointegration, the average treatment effect (ATE) converges to a non-standard distribution. As a result, standard tests on the ATE will identify treatment effects in several cases where there is no effect at all.

For those who are not familiar with counterfactual models: normally, these models have one treated unit (we want to know the effects of the treatment) and several untreated units that we call peers. These units may be cities, countries, companies, etc. Assuming that only one unit was treated and that there is no contamination, we can use information from the peers to project the behaviour of the treated unit as if it had not been treated, which is the counterfactual. The models should also be able to identify when the treatment had no effect on the treated unit.
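The projection idea can be sketched in a few lines of base R. This is only an illustrative regression-based counterfactual on stationary toy data, not the ArCo estimator itself; the weights `0.5, 0.3, 0.2`, the noise level and the treatment date are arbitrary choices for the sketch:

```r
# Sketch of a regression-based counterfactual (illustrative, not ArCo):
# fit the treated unit on its peers in the pre-treatment window,
# then project the post-treatment path from the peers.
set.seed(1)
T <- 100
t0 <- 51                                         # hypothetical treatment date
peers <- matrix(rnorm(T * 3), T, 3)              # stationary peers (toy data)
y1 <- peers %*% c(0.5, 0.3, 0.2) + rnorm(T, sd = 0.1)  # treated unit, no treatment
pre <- 1:(t0 - 1)
post <- t0:T
fit <- lm(y1[pre] ~ peers[pre, ])                # estimated on pre-treatment data only
cf <- cbind(1, peers[post, ]) %*% coef(fit)      # counterfactual for the post period
ate <- mean(y1[post] - cf)                       # estimated average treatment effect
```

Since no treatment was applied, `ate` should be close to zero here; the whole point of the paper is that this logic breaks down when the series are integrated.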

## Simulation tests

Here I am going to generate cointegrated data with no treatment to show the behaviour of the Synthetic Control and the ArCo (other techniques will share the same features). The data is very simple and is based on the example from the mentioned article. We are going to generate 100 observations of 10 time-series: 9 random walks and one series that is the sum of these random walks plus an error, which is the treated unit. However, I will not include any treatment, and I want the models to show that there was no treatment. The hypothesis to be tested is that at t0 = 51 we had some treatment in the treated unit.

```r
library(ArCo)
library(glmnet)
library(Synth)

T = 100      # Number of observations
n = 10       # Number of time series
set.seed(1)  # Seed for replication

# Generate random walk peers
peers = matrix(0, T, n - 1)
for(i in 2:T){
  peers[i, ] = peers[i - 1, ] + rnorm(n - 1)
}
# Generate the treated unit
y1 = rowSums(peers) + rnorm(T)
# Put all the TS together
Y = cbind(y1, peers)

# Plot all TS. The black line is the "treated unit"
matplot(Y, type = "l")
```

First, I am going to estimate the ArCo. The `$delta` element shows the average treatment effect with a 95% confidence interval. As you can see, the test indicated that the treatment was statistically significant (zero is very far from the confidence interval). The plot also shows a dislocated counterfactual (blue line).

```r
# Estimate the ArCo on the non-stationary data
arco = fitArCo(list(Y), cv.glmnet, predict, treated.unity = 1, t0 = 51)
arco$delta

##                  LB     delta        UB
## Variable1 -6.157713 -5.236705 -4.315697

plot(arco)
```

Now I am going to estimate the Synthetic Control. The Synth package does not have an implementation of a formal test. However, we can get a good idea from the counterfactual plot, which also indicates that the treatment was effective (even though there was no treatment).

```r
# Estimate the Synthetic Control on the nonstationary data
# Put the data in the Synth package notation
sY = as.vector(Y)
timeid = rep(1:T, ncol(Y))
unitid = sort(rep(1:ncol(Y), T))
synthdata = data.frame(time = timeid, unit = unitid, Y = sY)
synthdataprep <- dataprep(
  foo = synthdata,
  predictors = c("Y"),
  predictors.op = "mean",
  dependent = "Y",
  unit.variable = "unit",
  time.variable = "time",
  treatment.identifier = 1,
  controls.identifier = c(2:10),
  time.predictors.prior = c(1:50),
  time.optimize.ssr = c(1:50),
  time.plot = 1:100
)
SC = synth(synthdataprep)
path.plot(SC, synthdataprep)
abline(v = 50, col = "blue", lty = 2)
```

As you can see, both the ArCo and the Synthetic Control wrongly rejected the null hypothesis of no treatment. The best way to solve the problem is to take the first difference of all non-stationary time-series. However, there is an important thing to point out here. Imagine the random walk $y_t = y_{t-1} + \varepsilon_t$.

Suppose we have a treatment at $t_0$ that simply makes $y_t = c + y_{t-1} + \varepsilon_t$ only for $t = t_0$. For any $t \neq t_0$ we will have $\Delta y_t = \varepsilon_t$, and for $t = t_0$ we will have $\Delta y_{t_0} = c + \varepsilon_{t_0}$. Therefore, if we take the first difference, the treatment will have an impact only at $t_0$, which makes it impossible for any counterfactual model to identify an intervention. In this example, we would need an intervention that changed the drift of the random walk in a way that for all $t \geq t_0$ we have:

$$y_t = d + y_{t-1} + \varepsilon_t$$

This is better illustrated with an example. The plot below shows the difference between the two treatments. The red line is the case we cannot identify if we take the first difference. The blue line changes the drift of the random walk, and therefore we can identify it. **With the tools I am using here, it is impossible to tell whether we are on the black line or on the red line.**

```r
y = rep(0, T)
yt1 = yt2 = y
d = 1
c = 10
for(i in 2:T){
  e = rnorm(1)
  y[i] = y[i - 1] + e
  if(i == 51){
    yt1[i] = c + yt1[i - 1] + e
  }else{
    yt1[i] = yt1[i - 1] + e
  }
  if(i >= 51){
    yt2[i] = d + yt2[i - 1] + e
  }else{
    yt2[i] = yt2[i - 1] + e
  }
}
plot(yt2, type = "l", col = 4, ylab = "")
lines(yt1, col = 2)
lines(y, col = 1)
legend("topleft",
       legend = c("untreated", "treated by a constant in t0", "treated in the drift"),
       col = c(1, 2, 4), bty = "n", seg.len = 0.5, cex = 0.8, lty = 1)
```
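The same point can be checked numerically. This is a small self-contained sketch (it rebuilds the three series deterministically rather than reusing the loop above): differencing a one-off level shift leaves a discrepancy at a single date, while differencing a drift change leaves a discrepancy at every date from the intervention onwards.

```r
# Differencing the two treatment types: one-off level shift vs. drift change.
set.seed(1)
T <- 100; t0 <- 51; d <- 1; cc <- 10
e <- rnorm(T)
y <- cumsum(e)                                # untreated random walk
yt1 <- y; yt1[t0:T] <- yt1[t0:T] + cc         # level shift of cc at t0
yt2 <- y + d * pmax(seq_len(T) - t0 + 1, 0)   # drift d from t0 onwards

# Where do the first differences deviate from the untreated series?
which(abs(diff(yt1) - diff(y)) > 1e-8)  # a single date: the step into t0
which(abs(diff(yt2) - diff(y)) > 1e-8)  # every date from t0 on
```

The level shift survives differencing at exactly one observation, which no counterfactual model can distinguish from a single large shock; the drift change survives at every post-treatment observation and remains detectable.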

Now let’s get back to the first example. The results clearly indicated an intervention where there was none. Now I will show what happens if we take the first difference.

```r
# Take the first difference
dY = diff(Y, 1)
# Estimate the ArCo
darco = fitArCo(list(dY), cv.glmnet, predict, treated.unity = 1, t0 = 50)
darco$delta

##                   LB      delta        UB
## Variable1 -0.6892162 0.02737802 0.7439722

plot(darco)
```

```r
# Adjust the data to the Synth notation and estimate the model
dsY = as.vector(dY)
timeid = rep(2:T, ncol(Y))
unitid = sort(rep(1:ncol(Y), T - 1))
dsynthdata = data.frame(time = timeid, unit = unitid, Y = dsY)
dsynthdataprep <- dataprep(
  foo = dsynthdata,
  predictors = c("Y"),
  predictors.op = "mean",
  dependent = "Y",
  unit.variable = "unit",
  time.variable = "time",
  treatment.identifier = 1,
  controls.identifier = c(2:10),
  time.predictors.prior = c(2:50),
  time.optimize.ssr = c(2:50),
  time.plot = 2:100
)
dSC = synth(dsynthdataprep)

## X1, X0, Z1, Z0 all come directly from dataprep object.
##
## ****************
##  optimization over w weights: computing synthtic control unit
## ****************
##
## MSPE (LOSS V): 8.176593
##
## solution.v:
##  1
##
## solution.w:
##  2.9e-09 1.4e-08 3.7e-09 0.9999999 7.4e-09 2.9e-09 1.16e-08 6.9e-09 4e-09

path.plot(dSC, dsynthdataprep)
abline(v = 50, col = "blue", lty = 2)
```

As you can see, the `$delta` in the ArCo shows that the treatment effect is statistically zero, which is true. The Synthetic Control plot also shows that there was no difference caused by the intervention, even though the model adjustment was not very good.

Finally, note that we have two scenarios. If there is no intervention and we do not take the first difference, we will probably wrongly reject the null hypothesis of no intervention. However, if we are in the "red line" case of the plot and we do take the first difference, we will fail to reject the null when it may be false, depending on the type of variable we are dealing with.
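Given these two failure modes, it is worth testing for a unit root before deciding between levels and differences. Below is a minimal base-R sketch of a plain (non-augmented) Dickey–Fuller regression; in practice you would use a proper implementation such as `tseries::adf.test`, which handles lag augmentation and reports p-values against the correct critical values.

```r
# Plain Dickey-Fuller t-statistic: regress the first difference on the
# lagged level; a strongly negative statistic speaks against a unit root.
set.seed(1)
rw <- cumsum(rnorm(200))   # random walk: has a unit root

df_stat <- function(y) {
  dy <- diff(y)
  ylag <- y[-length(y)]
  fit <- lm(dy ~ ylag)
  summary(fit)$coefficients["ylag", "t value"]
}

df_stat(rw)        # near zero: cannot reject the unit root
df_stat(diff(rw))  # strongly negative: the differenced series looks stationary
```

Running this check on each series before fitting the ArCo or the Synthetic Control tells you whether you are in the dangerous integrated case discussed above.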

## References

Carvalho, C., Masini, R. & Medeiros, M. C. (2018). "ArCo: An artificial counterfactual approach for high-dimensional panel time-series data". *Journal of Econometrics*.
