Introducing the ArCo package

[This article was first published on R – insightR, and kindly contributed to R-bloggers.]

By Gabriel Vasconcelos

What is the ArCo?

We recently launched the R package ArCo, an implementation of the Artificial Counterfactual method proposed by Carvalho, Masini and Medeiros (2016). This post reviews some of its features and shows how simple it is to estimate “what would have happened” if something “had not happened”. Counterfactuals are useful when separate control and treatment groups are not feasible.

For example, if a new tax were introduced in some region and we wanted to know whether it had an impact on economic activity, we could use the ArCo to estimate what would have happened in a world where this tax never existed. Moreover, the ArCo lets us statistically test the tax effects and see whether they were significant. Last but not least, if we expect the tax to show its effects only a few months after it was introduced, the ArCo allows us to estimate the month in which these effects started to show.

How does it work? Following the same example, the ArCo uses a first-step estimation, on data from before the tax, to model the economic activity in the region of interest in terms of the economic activity (and other variables) in other regions. In the second step, the ArCo projects what would have happened in our region of interest given what happened in the other regions, which were not affected by the new tax.
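The two steps can be sketched in base R with simulated data. This is only a toy illustration under our own assumptions (three untreated regions, a linear model, a known effect of 2 added after a hypothetical t0); it is not the package's internal code.

```r
# Toy two-step counterfactual with simulated data (not the ArCo internals)
set.seed(1)
n  <- 100                      # number of time periods
t0 <- 61                       # hypothetical first treated period
X  <- matrix(rnorm(n * 3), n, 3)
colnames(X) <- c("r2", "r3", "r4")   # three untreated regions
y  <- drop(1 + X %*% c(0.5, -0.3, 0.2) + rnorm(n, sd = 0.1))
y[t0:n] <- y[t0:n] + 2         # known treatment effect of 2 from t0 on

pre  <- 1:(t0 - 1)
post <- t0:n

# Step 1: model the treated region with the untreated ones, pre-treatment only
model <- lm(y ~ ., data = data.frame(y = y[pre], X[pre, ]))

# Step 2: project the treated region after t0 from the untreated regions
cf <- predict(model, newdata = data.frame(X[post, ]))

# Average gap between observed and counterfactual values after t0
delta <- mean(y[post] - cf)
delta
```

Because the simulated effect is 2, delta should land close to 2. With real data the true effect is unknown, which is why the package also reports confidence bounds and a p-value.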

Application

The example presented below is a simplification of the application made by Carvalho, Masini and Medeiros (2016) with a dataset supplied by the authors.

In October 2007, the government of the Brazilian state of São Paulo launched a tax rebate program called Nota Fiscal Paulista (NFP), which aimed to reduce tax evasion. The program gives consumers an incentive to ask for an electronic receipt, which entitles them to participate in lotteries and to receive a tax rebate.

The reduced tax evasion increased costs for sellers, and those with some market power could simply pass the new tax costs on to consumers. This would have an impact on inflation, which is precisely what we want to test with the ArCo. The first sector targeted by the NFP was restaurants, so we are going to use the food away from home (FAH) component of inflation.

Data Structure

The NFP data is available in the ArCo package, which can be installed in R with install.packages("ArCo"). Let’s begin by loading the package and the data.

library(ArCo)
data("inflationNFP")
lapply(inflationNFP,dim)

## $inflationFAH
## [1] 56  9
## 
## $GDP
## [1] 56  9

The inflationNFP dataset is a list with two matrices. The first matrix contains the FAH component of inflation for 9 Brazilian metropolitan regions (São Paulo is the first), and the second contains the GDP for the same metropolitan regions. The data goes from January 1995 to September 2009. The NFP was introduced in the 34th observation.

This list structure is precisely how the ArCo package must receive the data. Each matrix in the list is a variable, and each column in the matrices is a unit (here, a metropolitan region). The minimum requirement to estimate the ArCo is a list with at least one matrix containing a few units. If your data has a panel structure, you should use the panel_to_ArCo_list function, which will set up the data in lists for you.
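To make the list-of-matrices layout concrete, here is a small base-R sketch that builds it by hand from a long-format panel. The data frame, its column names and the to_matrix helper are all hypothetical and assume a balanced panel; in practice panel_to_ArCo_list is meant to do this job.

```r
# Hypothetical long-format panel: one row per (time, region) pair
set.seed(42)
panel <- expand.grid(time = 1:6, region = c("A", "B", "C"),
                     stringsAsFactors = FALSE)
panel$inflation <- rnorm(nrow(panel))
panel$gdp       <- rnorm(nrow(panel))

# Illustrative helper: reshape one variable into a time-by-unit matrix.
# Assumes a balanced panel (every region observed in every period).
to_matrix <- function(df, variable) {
  df <- df[order(df$region, df$time), ]
  regions <- unique(df$region)
  m <- matrix(df[[variable]], ncol = length(regions))
  colnames(m) <- regions
  m
}

# One matrix per variable, one column per unit
arco_data <- list(inflation = to_matrix(panel, "inflation"),
                  gdp       = to_matrix(panel, "gdp"))
lapply(arco_data, dim)
```

Each element of arco_data has one row per period and one column per region, which mirrors the shape of inflationNFP shown above.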

Estimation and Prediction Functions

The user is free to choose which model will be used to estimate the ArCo. All you need to do is specify an estimation function and a prediction function. The estimation function must have two arguments: X (the independent variables) and y (the dependent variable). The prediction function also has two arguments: model (the output from the estimation function) and newdata (the data used for the prediction). The two functions for linear regression are presented below:

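# Estimation function: ordinary least squares of y on the columns of X
fn=function(X,y){
  return(lm(y~X))
}
# Prediction function: prepend a column of ones for the intercept and
# multiply by the estimated coefficients
p.fn=function(model,newdata){
  b=coef(model)
  return(cbind(1,newdata) %*% b)
}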
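Before handing such a pair to the package, it can help to try it on toy data. The sketch below repeats the linear-regression pair so that it runs on its own; the simulated X and y and the 30/10 split are our own assumptions.

```r
# Same linear-regression pair as above, repeated so this snippet runs alone
fn <- function(X, y) lm(y ~ X)
p.fn <- function(model, newdata) cbind(1, newdata) %*% coef(model)

# Simulated data: 40 periods, 2 explanatory columns
set.seed(123)
X <- matrix(rnorm(40 * 2), 40, 2)
y <- drop(0.5 + X %*% c(1, -2) + rnorm(40, sd = 0.05))

# Estimate on the first 30 rows, predict the last 10
model <- fn(X[1:30, ], y[1:30])
pred  <- p.fn(model, X[31:40, ])

dim(pred)                    # one prediction per row of newdata
max(abs(pred - y[31:40]))    # largest out-of-sample error
```

The point of the check is the contract: fn takes a matrix and a vector and returns a model, and p.fn returns one prediction per row of newdata.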

Fitting the ArCo

We are now ready to estimate the ArCo. The main function of the package is called fitArCo. It estimates the ArCo with the user-specified functions and calculates the most important statistics.

t0=34
ArCoNFP=fitArCo(data=inflationNFP,fn=fn,p.fn=p.fn,treated.unity=1,t0=t0,VCOV.type = "nw")
ArCoNFP$delta

##                        LB       delta         UB
## inflationFAH  0.055308578 0.451825488 0.84834240
## GDP          -0.003091461 0.003722729 0.01053692

ArCoNFP$p.value

##            [,1]
## [1,] 0.06838351

plot(ArCoNFP,plot=1,display.fitted = TRUE)


The $delta table shows that the estimated effect of the NFP on FAH inflation was positive and that its confidence interval (0.055 to 0.848) does not include zero, as we first argued. We can recover the price index from the inflation series in order to see the cumulative effects on prices caused by the NFP. This is done in the code below:

# FAH inflation for São Paulo (first column)
FAHsp=inflationNFP$inflationFAH[,1]
# Observed price index
real=cumprod(1+FAHsp/100)
# Counterfactual index: observed inflation before t0, ArCo counterfactual from t0 on
cf=cumprod(1+c(FAHsp[1:(t0-1)],ArCoNFP$cf[,1])/100)
# Index implied by the fitted values
fitted=cumprod(1+fitted(ArCoNFP)[,1]/100)

plot(real,type="l",ylab="Y1",xlab="Time")
lines(c(rep(NA,t0-1),tail(cf,length(real)-t0+1)),col=4)
lines(fitted,col=2)
abline(v=t0,col=4,lty=2)
legend("topleft",legend=c("Observed","Fitted","Counterfactual"),col=c(1,2,4),lty=1,lwd=1,cex=1,seg.len = 1,bty="n")
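The cumprod step is easier to follow with a handful of made-up numbers. In the sketch below the two inflation series are hypothetical rates in percent, chosen so that they differ only in the last two periods.

```r
# Hypothetical inflation rates in percent (not taken from the NFP data)
observed       <- c(0.5, 0.6, 0.9, 1.0)
counterfactual <- c(0.5, 0.6, 0.4, 0.5)   # differs only in the last two periods

# Compound the rates into price indices
index_obs <- cumprod(1 + observed / 100)
index_cf  <- cumprod(1 + counterfactual / 100)

# Cumulative price gap at the end of the sample, in percent
gap <- (tail(index_obs, 1) / tail(index_cf, 1) - 1) * 100
gap
```

The two indices coincide while the rates are equal and drift apart afterwards, which is the same pattern the plot of the NFP data is meant to reveal.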


Some Observations

The way the functions fn and p.fn should be supplied to fitArCo is directly compatible with many R packages. For example, if you want to estimate the ArCo using cross-validated LASSO, you can install the package glmnet and use the functions cv.glmnet and predict. The package randomForest is also compatible.
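As a rough sketch of the LASSO case, the pair below wraps cv.glmnet and its predict method. The names fn_lasso and p.fn_lasso, the choice of s = "lambda.min" and the simulated data are our own assumptions; the toy check only runs if glmnet is installed.

```r
# LASSO estimation/prediction pair (requires the glmnet package)
fn_lasso <- function(X, y) glmnet::cv.glmnet(X, y)
p.fn_lasso <- function(model, newdata) {
  # Predict at the penalty that minimises the cross-validation error
  predict(model, newx = newdata, s = "lambda.min")
}

# Toy check on simulated data, skipped when glmnet is not available
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(7)
  X <- matrix(rnorm(60 * 5), 60, 5)
  y <- drop(X %*% c(1, 0, 0, -1, 0) + rnorm(60, sd = 0.1))
  model <- fn_lasso(X[1:50, ], y[1:50])
  pred  <- p.fn_lasso(model, X[51:60, ])
  print(dim(pred))
}

# With the ArCo package loaded, the pair would be passed like the linear one:
# fitArCo(data = inflationNFP, fn = fn_lasso, p.fn = p.fn_lasso,
#         treated.unity = 1, t0 = 34)
```

Cross-validation splits the pre-treatment sample into folds, so results can vary slightly between runs unless a seed is set.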

References

[1] Y. R. Fonseca, M. C. Medeiros, R. Masini, and G. F. R. Vasconcelos. ArCo: Artificial Counterfactual Package, 2017. R package version 0.1-1. URL https://CRAN.R-project.org/package=ArCo

[2] C. Carvalho, R. Masini, and M. C. Medeiros. ARCO: An Artificial Counterfactual Approach For High-Dimensional Panel Time-Series Data, August 15, 2016. Available at SSRN: https://ssrn.com/abstract=2823687 or http://dx.doi.org/10.2139/ssrn.2823687
