
Ever wondered how to estimate Fama-MacBeth or cluster-robust standard errors in R? It can actually be very easy.

First, for some background, read Kevin Goulding’s blog post, Mitchell Petersen’s programming advice, and Mahmood Arai’s paper/note and code (there is an earlier version of the code with some more comments in it). For more formal references, you may want to look into Thompson (2011, JFE) and Petersen (2008, WP). Both papers focus on estimating robust SEs using Stata.

After extensive discussion with Giovanni Millo, co-author of 'plm', it turns out that released R packages ('plm', 'lmtest', 'sandwich') can readily estimate clustered SEs. The results do not match the Stata output exactly, because the options 'HC0' through 'HC4' for 'vcovHC()' in 'plm' do not apply the same small-sample correction (a weighting by a function of sample size) that Stata uses. The results are nevertheless reasonably close when using 'HC1'.
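To get a feel for how small that gap can be, here is a minimal sketch of the two scaling factors in base R. It assumes that 'HC1' scales the variance by N/(N-K) and that Stata's clustered correction is G/(G-1) * (N-1)/(N-K); both formulas are my reading of the respective documentation, not something verified in this post, and the panel dimensions below are hypothetical.

```r
# Hypothetical panel dimensions (assumptions; adjust to your own data)
n_obs      <- 5000  # total observations N
n_clusters <- 500   # number of clusters G (e.g. firms)
n_coef     <- 2     # estimated coefficients K (intercept + slope)

# Assumed HC1-style variance factor: N / (N - K)
hc1_factor <- n_obs / (n_obs - n_coef)

# Assumed Stata-style clustered variance factor: G / (G - 1) * (N - 1) / (N - K)
stata_factor <- n_clusters / (n_clusters - 1) * (n_obs - 1) / (n_obs - n_coef)

# Implied ratio of standard errors (square root of the variance ratio)
se_ratio <- sqrt(stata_factor / hc1_factor)
print(c(hc1 = hc1_factor, stata = stata_factor, se_ratio = se_ratio))
```

Under these assumptions the ratio depends only on G and N, so with many clusters the two corrections should give nearly identical standard errors.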

It should be easy to (almost exactly) replicate M. Petersen’s benchmark results using the following code.

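Load the required packages and import M. Petersen’s test data.

require(foreign)  # read.dta() for Stata files
require(plm)      # plm(), pmg(), vcovHC()
require(lmtest)   # coeftest()
## Assumption: "test_data.dta" is a local copy of M. Petersen's test data; adjust the path as needed
test <- read.dta("test_data.dta")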


Estimate linear model using OLS. The second call estimates the Fama-MacBeth regression.

## Pooled OLS; index is c(group, time)
fpm <- plm(y ~ x, test, model='pooling', index=c('firmid', 'year'))
## Fama-MacBeth via the Mean Groups estimator; note the swapped indices
fpmg <- pmg(y ~ x, test, index=c('year', 'firmid'))


Define a function that estimates robust SEs with double clustering.

##Double-clustering formula (Thompson, 2011)
vcovDC <- function(x, ...){
  vcovHC(x, cluster="group", ...) + vcovHC(x, cluster="time", ...) -
    vcovHC(x, method="white1", ...)
}
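To make the formula less of a black box, here is a rough base R sketch of the same idea: the covariance clustered by firm, plus the covariance clustered by time, minus the plain White covariance. It runs on simulated data (not Petersen's test file), applies no small-sample correction, and the helper name cluster_vcov() is made up for illustration; it is not part of 'plm' or 'sandwich'.

```r
set.seed(1)

# Simulated balanced panel (hypothetical data with firm and year effects)
n_firms <- 50; n_years <- 10
d <- expand.grid(firmid = 1:n_firms, year = 1:n_years)
d$x <- rnorm(nrow(d))
d$y <- 1 + d$x + rnorm(n_firms)[d$firmid] + rnorm(n_years)[d$year] + rnorm(nrow(d))

fit <- lm(y ~ x, data = d)
X <- model.matrix(fit)
u <- residuals(fit)
bread <- solve(crossprod(X))  # (X'X)^{-1}

# Sandwich estimator with scores summed within each cluster
# (hypothetical helper, no small-sample correction)
cluster_vcov <- function(cluster) {
  scores <- rowsum(X * u, group = cluster)  # one row of summed scores per cluster
  bread %*% crossprod(scores) %*% bread
}

v_firm  <- cluster_vcov(d$firmid)
v_time  <- cluster_vcov(d$year)
v_white <- cluster_vcov(seq_len(nrow(d)))  # every observation is its own cluster

# Double clustering: firm + time - White
v_double <- v_firm + v_time - v_white
sqrt(diag(v_double))
```

Note that the difference of matrices is not guaranteed to be positive definite in every sample, so it is worth checking the diagonal before taking square roots.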


Estimate OLS standard errors, White standard errors, standard errors clustered by group, by time, and by group and time. Compare the R output with M. Petersen’s benchmark results from Stata.

> ##OLS, White and clustering
> coeftest(fpm)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.028359  1.0466   0.2954
x           1.034833   0.028583 36.2041   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> coeftest(fpm, vcov=function(x) vcovHC(x, method="white1", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.028361  1.0465   0.2954
x           1.034833   0.028395 36.4440   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="group", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.066952  0.4433   0.6576
x           1.034833   0.050550 20.4714   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> coeftest(fpm, vcov=function(x) vcovHC(x, cluster="time", type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.022189  1.3376   0.1811
x           1.034833   0.031679 32.6666   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

> coeftest(fpm, vcov=function(x) vcovDC(x, type="HC1"))

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.029680   0.064580  0.4596   0.6458
x           1.034833   0.052465 19.7243   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1



As Giovanni pointed out to me (in a privately circulated draft paper), the Fama-MacBeth estimator appears to be nothing more than what econometricians call the Mean Groups estimator, which 'plm' can readily estimate. You only need to swap the 'group' and 'time' indices (see the pmg() call above).

> ##Fama-MacBeth
> coeftest(fpmg)

t test of coefficients:

            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.031278   0.023356  1.3392   0.1806
x           1.035586   0.033342 31.0599   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
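For intuition, the Fama-MacBeth procedure can be sketched by hand in base R: run one cross-sectional regression per year, average the coefficients, and take the standard error of that average across years. This is a minimal illustration on simulated data (the column names simply mirror the ones used above), and I have not checked that it reproduces pmg() digit for digit.

```r
set.seed(42)

# Simulated panel (hypothetical data, true slope of 1)
n_firms <- 100; n_years <- 10
d <- expand.grid(firmid = 1:n_firms, year = 1:n_years)
d$x <- rnorm(nrow(d))
d$y <- 0.5 + 1.0 * d$x + rnorm(nrow(d))

# Step 1: one cross-sectional OLS regression per year
by_year <- split(d, d$year)
coefs <- t(sapply(by_year, function(dt) coef(lm(y ~ x, data = dt))))

# Step 2: average the per-year coefficients
fm_est <- colMeans(coefs)

# Step 3: standard error of that average across the T periods
fm_se <- apply(coefs, 2, sd) / sqrt(nrow(coefs))

cbind(Estimate = fm_est, `Std. Error` = fm_se, `t value` = fm_est / fm_se)
```

Seen this way, each year plays the role of a "group" in the Mean Groups estimator, which is why swapping the indices in the pmg() call does the job.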


As a last remark, it may be a good idea to introduce a type='HC5', implementing the exact Stata small-sample correction procedure, to allow users to benchmark R output against Stata results.  