
## Quickstart

The goal of this exercise is to replicate some of the results in Petersen (2009) using the R programming language.

```r
# load required packages
library(lfe)
```

## Estimating Standard Errors with a Firm Effect: OLS and Rogers Standard Errors

In Petersen (2009), the author simulates a panel data set and then estimates the slope coefficient and its standard error. Repeating this many times reveals both the true standard error (the standard deviation of the slope estimates across simulations) and the average estimated standard error. In the first version of the simulation, the author includes a fixed firm effect, but no time effect, in both the independent variable and the residual. Across simulations, the standard deviations of the independent variable and the residual are held constant at one and two, respectively. Because the regressor's variance is 1 and the residual's variance is 2² = 4, this produces an R² of 1 / (1 + 4) = 20 percent, which is not unusual for empirical finance regressions. What changes across simulations is the fraction of the independent variable's variance that is due to the firm effect, which ranges from zero to seventy-five percent in twenty-five percent increments. The same is done for the residual. This makes it possible to show how the magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual. The R function to run the simulation is provided below.
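Before the full simulation function, a small base-R sketch can make the variance decomposition concrete. It builds the regressor the same way the simulation does, for an illustrative firm share of 25% (an arbitrary choice), and checks three things: the total variance is about one, two observations of the same firm are correlated at roughly the firm share, and with standard deviations of one and two the population R² is 1 / (1 + 4) = 0.2. The larger number of firms and the seed are assumptions made only so that the sample moments land close to their targets.

```r
# illustrative settings (assumed, not taken from the post)
set.seed(1)
n.firms <- 5000   # more firms than the post's 500, so sample moments are precise
n.year  <- 10
firm_x  <- 0.25   # assumed share of the regressor's variance due to the firm effect

# firm-year component plus a firm component repeated across that firm's years
x <- rnorm(n.firms * n.year, sd = sqrt(1 - firm_x)) +
  rep(rnorm(n.firms, sd = sqrt(firm_x)), each = n.year)

# total variance should be close to (1 - firm_x) + firm_x = 1
var(x)

# one row per firm, so columns 1 and 2 hold the same firm's first two years;
# their correlation should be close to firm_x
x_by_firm <- matrix(x, nrow = n.firms, byrow = TRUE)
cor(x_by_firm[, 1], x_by_firm[, 2])

# population R^2 of y = 1 * x + r with sd.x = 1 and sd.r = 2
sd.x <- 1
sd.r <- 2
r2 <- sd.x^2 / (sd.x^2 + sd.r^2)
r2   # 0.2
```

The within-firm correlation is exactly what makes the usual OLS standard error misleading: observations of the same firm are not independent draws.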

```r
simulate <- function(n.year  = 10,   # number of years in the panel
                     n.firms = 500,  # number of firms in the panel
                     n.iter  = 5000, # number of simulation iterations
                     sd.x    = 1,    # standard deviation of the independent variable
                     sd.r    = 2,    # standard deviation of the error term
                     firm_x  = 0,    # share of the independent variable's variance due to the firm effect
                     firm_r  = 0) {  # share of the residual variance due to the firm effect

  # set the RNG seed for reproducibility
  set.seed(123)

  # total number of observations in the panel
  n.obs <- n.year * n.firms

  # run the simulations; each iteration returns the coefficient estimate,
  # its OLS standard error and its cluster-robust (Rogers) standard error
  b <- sapply(1:n.iter, function(i) {

    # firm ID of each observation: every firm is repeated once per year
    firm_id <- rep(1:n.firms, each = n.year)

    # standardized regressor: firm-year component plus firm component,
    # so its total variance is (1 - firm_x) + firm_x = 1
    x <- rnorm(n.obs, mean = 0, sd = sqrt(1 - firm_x)) +                # firm-year component
      rep(rnorm(n.firms, mean = 0, sd = sqrt(firm_x)), each = n.year)   # firm effect

    # standardized error term, built the same way with firm share firm_r
    r <- rnorm(n.obs, mean = 0, sd = sqrt(1 - firm_r)) +                # firm-year component
      rep(rnorm(n.firms, mean = 0, sd = sqrt(firm_r)), each = n.year)   # firm effect

    # scale the regressor and the error term to their target standard deviations
    x <- sd.x * x
    r <- sd.r * r

    # response variable; the true slope coefficient is 1
    y <- 1 * x + r

    # standard OLS
    m1 <- felm(y ~ x)

    # OLS with standard errors clustered by firm
    m2 <- felm(y ~ x | 0 | 0 | firm_id)

    # extract the coefficient-table row for x
    c1 <- coef(summary(m1))["x", ]
    c2 <- coef(summary(m2))["x", ]

    # return the estimate, the OLS s.e. and the clustered s.e.
    c(c1["Estimate"], c1["Std. Error"], c2["Cluster s.e."])
  })

  # average each statistic across iterations and add the standard deviation
  # of the estimates, which is the true standard error
  res <- c(rowMeans(b), "Sample Std. Error" = sd(b["Estimate", ]))

  return(res)
}
```
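The fourth part of the `felm()` formula (`| firm_id`) is what switches on clustering. To see what a Rogers (cluster-robust) standard error computes, the base-R sketch below builds the sandwich estimator by hand for a single simulated panel, using `lm()` in place of `felm()`, and sets it beside the conventional OLS standard error. The helper name `cluster_se()` is hypothetical, and the CR1-style finite-sample factor G/(G - 1) × (N - 1)/(N - K) is an assumption of this sketch; `lfe` may apply a slightly different small-sample adjustment, so the numbers need not match `felm()` exactly.

```r
# Hypothetical helper (not part of lfe): Rogers / cluster-robust standard errors
# from the sandwich formula (X'X)^-1 [sum_g X_g' u_g u_g' X_g] (X'X)^-1
cluster_se <- function(X, u, cluster) {
  N <- nrow(X)
  K <- ncol(X)
  G <- length(unique(cluster))
  bread  <- solve(crossprod(X))              # (X'X)^-1
  scores <- rowsum(X * u, group = cluster)   # G x K matrix, row g is X_g' u_g
  meat   <- crossprod(scores)                # sum over firms of the score outer products
  dof    <- G / (G - 1) * (N - 1) / (N - K)  # assumed CR1-style small-sample factor
  sqrt(diag(dof * bread %*% meat %*% bread))
}

# one panel drawn like simulate(firm_x = 0.25, firm_r = 0.50)
set.seed(42)
n.firms <- 500
n.year  <- 10
n.obs   <- n.firms * n.year
firm_id <- rep(1:n.firms, each = n.year)
x <- rnorm(n.obs, sd = sqrt(0.75)) + rep(rnorm(n.firms, sd = sqrt(0.25)), each = n.year)
r <- 2 * (rnorm(n.obs, sd = sqrt(0.50)) + rep(rnorm(n.firms, sd = sqrt(0.50)), each = n.year))
y <- x + r

fit    <- lm(y ~ x)
X      <- model.matrix(fit)
se_ols <- summary(fit)$coefficients["x", "Std. Error"]
se_cl  <- unname(cluster_se(X, residuals(fit), firm_id)[2])  # slope is the 2nd column of X
c(OLS = se_ols, Rogers = se_cl)
```

With this design the OLS figure should come out near 0.028 and the clustered figure well above it, in the neighbourhood of the 0.041 reported in the example below. Because this is a single draw rather than an average over 5000 iterations, expect some sampling noise.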

We can now easily reproduce Petersen's results. The following example simulates the model in which 25% of the independent variable's variance and 50% of the residual variance are due to a firm-specific effect.

```r
simulate(firm_x = 0.25, firm_r = 0.50)
##          Estimate        Std. Error      Cluster s.e. Sample Std. Error
##        0.99900551        0.02826734        0.04111877        0.04151040
```

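In this example the clustered standard error (0.041) is close to the true standard error, i.e. the standard deviation of the estimates across simulations (0.042), whereas the OLS standard error (0.028) understates it by roughly a third. Running the simulation for several combinations of the shares of the independent variable's variance and of the residual variance that are due to a firm-specific effect produces the following results, which are consistent with the original paper.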

Table: Estimating Standard Errors with a Firm Effect, OLS and Rogers Standard Errors. The magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual.

Rows give the source of residual volatility (share of the residual variance due to the firm effect); columns give the source of independent variable volatility (share of X's variance due to the firm effect).

| Residual firm share | Statistic | X firm share 0% | X firm share 25% | X firm share 50% | X firm share 75% |
|---|---|---|---|---|---|
| 0% | Avg(B) | 1.000 | 1.000 | 1.000 | 1.000 |
| | Std(B) | 0.028 | 0.029 | 0.029 | 0.029 |
| | Avg(SE.OLS) | 0.028 | 0.028 | 0.028 | 0.028 |
| | Avg(SE.R) | 0.028 | 0.028 | 0.028 | 0.028 |
| 25% | Avg(B) | 0.999 | 0.999 | 0.999 | 0.999 |
| | Std(B) | 0.028 | 0.036 | 0.042 | 0.047 |
| | Avg(SE.OLS) | 0.028 | 0.028 | 0.028 | 0.028 |
| | Avg(SE.R) | 0.028 | 0.035 | 0.041 | 0.046 |
| 50% | Avg(B) | 0.999 | 0.999 | 0.999 | 0.999 |
| | Std(B) | 0.028 | 0.042 | 0.051 | 0.060 |
| | Avg(SE.OLS) | 0.028 | 0.028 | 0.028 | 0.028 |
| | Avg(SE.R) | 0.028 | 0.041 | 0.051 | 0.059 |
| 75% | Avg(B) | 1.000 | 0.999 | 0.999 | 0.999 |
| | Std(B) | 0.029 | 0.047 | 0.060 | 0.070 |
| | Avg(SE.OLS) | 0.028 | 0.028 | 0.028 | 0.028 |
| | Avg(SE.R) | 0.028 | 0.046 | 0.059 | 0.069 |

† The table contains estimates of the coefficient and standard errors based on 5000 simulations of a panel data set (10 years per firm and 500 firms). The true slope coefficient is 1, the standard deviation of the independent variable is 1 and the standard deviation of the error term is 2. The fraction of the residual variance that is due to a firm-specific component varies across the rows of the table, from 0% (no firm effect) to 75%. The fraction of the independent variable's variance that is due to a firm-specific component varies across the columns, also from 0% (no firm effect) to 75%. Each cell contains four entries: the average slope coefficient estimated by OLS, Avg(B); the standard deviation of this estimate across simulations, Std(B), which is the true standard error of the estimated coefficient; the average OLS standard error, Avg(SE.OLS); and the average Rogers (clustered) standard error, Avg(SE.R), which accounts for possible clustering at the firm level (i.e. for possible correlation between observations of the same firm in different years).
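As a cross-check on the table, the true standard error in this design can also be approximated in closed form. The OLS formula treats all N = n.firms × n.year observations as independent, giving sqrt(σ_r² / (N σ_x²)). Within a firm, however, every pair of different years adds a covariance term ρ_x σ_x² × ρ_r σ_r², where ρ_x and ρ_r are the firm shares of the two variances, so the large-sample variance of the slope is inflated by the factor 1 + (T - 1) ρ_x ρ_r with T = n.year. The base-R sketch below evaluates this for every cell of the table; the function name `analytic_se()` is hypothetical, and the formula assumes a balanced panel with independent regressor and residual, as in the simulation above.

```r
# Hypothetical helper: large-sample true SE of the OLS slope in the simulated panel
analytic_se <- function(firm_x, firm_r, n.year = 10, n.firms = 500,
                        sd.x = 1, sd.r = 2) {
  n.obs  <- n.year * n.firms
  se_ols <- sqrt(sd.r^2 / (n.obs * sd.x^2))          # ignores the firm effect
  se_ols * sqrt(1 + (n.year - 1) * firm_x * firm_r)  # inflation from within-firm correlation
}

shares <- c(0, 0.25, 0.50, 0.75)
# rows: residual firm share, columns: independent-variable firm share (as in the table)
true_se <- outer(shares, shares, function(r, x) analytic_se(firm_x = x, firm_r = r))
dimnames(true_se) <- list(residual = paste0(shares * 100, "%"),
                          x        = paste0(shares * 100, "%"))
round(true_se, 3)
```

Rounded to three decimals, these values agree with the Avg(SE.R) entries of the table to within 0.001 and sit within about 0.001 of Std(B). The first row and first column equal the OLS value of about 0.028, because the inflation factor is 1 whenever either firm share is zero, which is why OLS standard errors are only biased when both the regressor and the residual have a firm effect.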