Fama-MacBeth and Cluster-Robust (by Firm and Time) Standard Errors in R

June 2, 2012
By

(This article was first published on landroni » R, and kindly contributed to R-bloggers)

Ever wondered how to estimate Fama-MacBeth or cluster-robust standard errors in R? It can actually be very easy.

First, for some background information read Kevin Goulding’s blog post, Mitchell Petersen’s programming advice, Mahmood Arai’s paper/note and code (there is an earlier version of the code with some more comments in it). For more formal references you may want to look into Thompson (2011, JFE) and Petersen (2008, WP). Both papers focus on estimating robust SE using Stata.

After extensively discussing this with Giovanni Millo, co-author of 'plm', it turns out that released R packages ('plm', 'lmtest', 'sandwich') can readily estimate clustered SEs. The results are not exactly the same as the Stata output, since in 'plm' the options 'HC0' through 'HC4' for 'vcovHC()' do not use the exact same weighting (by a function of sample size) that Stata uses for small-sample correction. But the results are sensibly similar when using 'HC1'.

It should be easy to (almost exactly) replicate M. Petersen’s benchmark results using the following code.

Import M. Petersen’s test data.

require(foreign)
require(plm)
require(lmtest)
test <- read.dta("http://www.kellogg.northwestern.edu/faculty/petersen/htm/papers/se/test_data.dta")

Estimate linear model using OLS. The second call estimates the Fama-MacBeth regression.

fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))
fpmg <- pmg(y~x, test, index=c("year","firmid")) ##Fama-MacBeth

Define a function that would estimate robust SE with double-clustering.

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
    vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) - 
        vcovHC(x, method="white1", ...)
}

Estimate OLS standard errors, White standard errors, standard errors clustered by group, by time, and by group and time. Compare the R output with M. Petersen’s benchmark results from Stata.

> ##OLS, White and clustering
> coeftest(fpm)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028359  1.0466   0.2954    
x           1.034833   0.028583 36.2041   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, method="white1", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.028361  1.0465   0.2954    
x           1.034833   0.028395 36.4440   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.066952  0.4433   0.6576    
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.022189  1.3376   0.1811    
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.029680   0.064580  0.4596   0.6458    
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As Giovanni interestingly pointed out to me (in a privately circulated draft paper), it seems that the Fama-MacBeth estimator is nothing more than what econometricians call the Mean Groups estimator, and 'plm' can readily estimate this. You only need to swap the 'group' and 'time' indices. (See pmg() call above.)

> ##Fama-MacBeth
> coeftest(fpmg)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 0.031278   0.023356  1.3392   0.1806    
x           1.035586   0.033342 31.0599   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 

As a last remark, it may be a good idea to introduce a type='HC5', implementing the exact Stata small-sample correction procedure, to allow users to benchmark R output against Stata results.


To leave a comment for the author, please follow the link and comment on his blog: landroni » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.