
Paper Replication Petersen 2009

[This article was first published on R Tutorials, and kindly contributed to R-bloggers].

Quickstart

The goal of this exercise is to replicate some of the results in Petersen (2009) using the R programming language.

# load required packages
library(lfe)

Estimating Standard Errors with a Firm Effect: OLS and Rogers Standard Errors

In Petersen (2009), the author simulates a panel data set and then estimates the slope coefficient and its standard error. Repeating this many times reveals both the true standard error (the standard deviation of the slope estimates across simulations) and the average estimated standard error. In the first version of the simulation, both the independent variable and the residual contain a firm effect (a firm-specific component that is constant over time) but no time effect. Across simulations, the standard deviations of the independent variable and the residual are held constant at one and two, respectively. With a true slope of one, this produces an R² of 1/(1 + 4) = 20 percent, which is not unusual for empirical finance regressions.

What changes across simulations is the fraction of the independent variable's variance that is due to the firm effect, which ranges from zero to seventy-five percent in twenty-five percent increments. The same is done for the residual. This shows how the magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual. The R function simulate() runs this simulation.
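The settings above pin down the variance decomposition used in the code. Writing ρ_x for firm_x, ρ_r for firm_r, σ_x for sd.x and σ_r for sd.r, and using η, μ, ν, γ (notation introduced here, not from the post) for independent standard normal draws, the simulated variables for firm i in year t are:

```latex
\begin{aligned}
x_{it} &= \sigma_x\left(\sqrt{1-\rho_x}\,\eta_{it} + \sqrt{\rho_x}\,\mu_i\right), \qquad
r_{it} = \sigma_r\left(\sqrt{1-\rho_r}\,\nu_{it} + \sqrt{\rho_r}\,\gamma_i\right), \qquad
y_{it} = x_{it} + r_{it}, \\
\operatorname{Var}(x_{it}) &= \sigma_x^2\left[(1-\rho_x) + \rho_x\right] = \sigma_x^2, \qquad
\operatorname{Corr}(x_{it}, x_{is}) = \rho_x \quad (t \neq s), \\
R^2 &= \frac{\sigma_x^2}{\sigma_x^2 + \sigma_r^2} = \frac{1}{1 + 4} = 0.2 .
\end{aligned}
```

So firm_x and firm_r are exactly the within-firm correlations of the regressor and the residual, while the total variances, and hence the R², stay fixed as those shares change.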

simulate <- 
  function(n.year = 10,   # number of years in the panel
           n.firms = 500, # number of firms in the panel
           n.iter = 5000, # number of iterations to run the simulation
           sd.x = 1,      # standard deviation of the independent variable
           sd.r = 2,      # standard deviation of the error term
           firm_x = 0,    # % of independent variable variance which is due to firm fixed effect 
           firm_r = 0     # % of residual variance which is due to firm fixed effect 
  ) {
  
  # set the RNG seed so that results are reproducible
  set.seed(123)
  
  # total number of observations in the panel
  n.obs <- n.year*n.firms
  
  # run the simulations; each iteration returns the slope estimate, its OLS
  # standard error and its firm-clustered (Rogers) standard error
  b <- sapply(1:n.iter, function(i){
    
    # replicate firm ID for each year
    firm_id <- rep(1:n.firms, each = n.year)
    
    # unit-variance regressor: a firm-year component plus a firm component
    # that is drawn once per firm and repeated across all of its years
    x <-
      rnorm(n.obs, mean = 0, sd = sqrt(1-firm_x)) +                     # firm-year component
      rep( rnorm(n.firms, mean = 0, sd = sqrt(firm_x)), each = n.year ) # firm component
    
    # unit-variance error term, built the same way
    r <-
      rnorm(n.obs, mean = 0, sd = sqrt(1-firm_r)) +                     # firm-year component
      rep( rnorm(n.firms, mean = 0, sd = sqrt(firm_r)), each = n.year ) # firm component
    
    # scale regressor by its standard deviation
    x <- sd.x * x
    
    # scale error term by its standard deviation
    r <- sd.r * r
    
    # response variable
    y <- 1*x + r
    
    # standard OLS
    m1 <- felm(y ~ x) 
    
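    # OLS with firm-clustered (Rogers) standard errors; the felm formula parts
    # are response ~ regressors | fixed effects | IV | cluster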
    m2 <- felm(y ~ x | 0 | 0 | firm_id) 
    
    # extract the slope row (named 'x') from each coefficient table
    c1 <- coef(summary(m1))['x', ]
    c2 <- coef(summary(m2))['x', ]
    
    # return
    return( c(c1['Estimate'], c1['Std. Error'], c2['Cluster s.e.']) )
    
  })
  
  # average each statistic across iterations and append the standard deviation
  # of the slope estimates, i.e. the true standard error
  res <- c(apply(b, 1, mean), 'Sample Std. Error' = sd(b['Estimate',]))
  
  # return
  return(res)
  
}
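Inside simulate(), felm() computes the Rogers standard error through the fourth part of its formula. As a rough illustration of what that involves, here is a minimal base-R sketch of the firm-clustered sandwich estimator. The function name cluster_se() is hypothetical, and the small-sample factor G/(G-1) · (N-1)/(N-K) is an assumption (a common Stata-style choice); lfe's exact adjustment may differ slightly, so the numbers need not match felm() to the last digit.

```r
# Hypothetical helper: firm-clustered (Rogers) standard error of the slope in
# y = a + b * x + e, computed by hand with the sandwich formula.
# Assumption: Stata-style small-sample factor G/(G-1) * (N-1)/(N-K).
cluster_se <- function(y, x, cluster) {
  X <- cbind(1, x)                     # design matrix: intercept and regressor
  u <- lm.fit(X, y)$residuals          # OLS residuals
  bread <- solve(crossprod(X))         # (X'X)^{-1}
  scores <- rowsum(X * u, cluster)     # per-cluster sums of x_i * u_i (G x K)
  meat <- crossprod(scores)            # sum over clusters of s_g s_g'
  N <- nrow(X); K <- ncol(X); G <- nrow(scores)
  adj <- G / (G - 1) * (N - 1) / (N - K)
  V <- adj * bread %*% meat %*% bread  # clustered covariance matrix
  unname(sqrt(diag(V))[2])             # standard error of the slope
}

# One simulated panel with a 75% firm effect in both x and the residual
set.seed(42)
n.firms <- 200; n.year <- 10; n.obs <- n.firms * n.year
firm_id <- rep(seq_len(n.firms), each = n.year)
x <- rnorm(n.obs, sd = sqrt(0.25)) +
  rep(rnorm(n.firms, sd = sqrt(0.75)), each = n.year)
r <- 2 * (rnorm(n.obs, sd = sqrt(0.25)) +
  rep(rnorm(n.firms, sd = sqrt(0.75)), each = n.year))
y <- x + r

ols_se <- coef(summary(lm(y ~ x)))["x", "Std. Error"]
rogers_se <- cluster_se(y, x, firm_id)
c(OLS = ols_se, Rogers = rogers_se)
```

With a strong firm effect in both variables, the clustered standard error should come out well above the OLS one, mirroring the bottom-right cell of the table further down.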

We can now easily reproduce Petersen's results. The following example simulates the model in which 25% of the independent variable's variance and 50% of the residual variance are due to a firm-specific effect.

simulate(firm_x = 0.25, firm_r = 0.50)
##          Estimate        Std. Error      Cluster s.e. Sample Std. Error 
##        0.99900551        0.02826734        0.04111877        0.04151040
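The gap between the OLS and clustered standard errors in this output can be roughly anticipated. For a balanced panel with T years per firm, the usual Moulton-type approximation says the true variance of the OLS slope is about the OLS formula times 1 + (T − 1)·ρ_x·ρ_r, where ρ_x and ρ_r are the firm-effect shares of the regressor's and the residual's variance (firm_x and firm_r). The helper se_inflation() below is a hypothetical name for this back-of-the-envelope check, not part of lfe or the original post.

```r
# Hypothetical helper: approximate ratio of the true standard error of the OLS
# slope to the OLS standard error, for a balanced panel with n.year periods per
# firm and firm-effect shares rho_x (regressor) and rho_r (residual).
se_inflation <- function(n.year, rho_x, rho_r) {
  sqrt(1 + (n.year - 1) * rho_x * rho_r)
}

# Setting used in simulate(firm_x = 0.25, firm_r = 0.50): 10 years per firm
se_inflation(10, 0.25, 0.50)               # sqrt(2.125), about 1.46

# Scale the average OLS standard error reported above
0.02826734 * se_inflation(10, 0.25, 0.50)  # about 0.041
```

That is close to both the average clustered standard error (0.0411) and the standard deviation of the estimates (0.0415). When either share is zero the factor is 1, which matches the first row and first column of the table below, where the OLS standard errors are essentially unbiased.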

Running the simulation for several combinations of the shares of the independent variable's variance and of the residual variance that are due to a firm-specific effect produces the following results, in line with the original paper.

Estimating Standard Errors with a Firm Effect: OLS and Rogers Standard Errors. The magnitude of the bias in the OLS standard errors varies with the strength of the firm effect in both the independent variable and the residual.

                                            Source of Independent Variable Volatility
Source of Residual Volatility  Statistic           0%      25%      50%      75%
0%                             Avg(B)           1.000    1.000    1.000    1.000
                               Std(B)           0.028    0.029    0.029    0.029
                               Avg(SE.OLS)      0.028    0.028    0.028    0.028
                               Avg(SE.R)        0.028    0.028    0.028    0.028
25%                            Avg(B)           0.999    0.999    0.999    0.999
                               Std(B)           0.028    0.036    0.042    0.047
                               Avg(SE.OLS)      0.028    0.028    0.028    0.028
                               Avg(SE.R)        0.028    0.035    0.041    0.046
50%                            Avg(B)           0.999    0.999    0.999    0.999
                               Std(B)           0.028    0.042    0.051    0.060
                               Avg(SE.OLS)      0.028    0.028    0.028    0.028
                               Avg(SE.R)        0.028    0.041    0.051    0.059
75%                            Avg(B)           1.000    0.999    0.999    0.999
                               Std(B)           0.029    0.047    0.060    0.070
                               Avg(SE.OLS)      0.028    0.028    0.028    0.028
                               Avg(SE.R)        0.028    0.046    0.059    0.069
† The table contains estimates of the coefficient and standard errors based on 5,000 simulations of a panel data set (10 years per firm and 500 firms). The true slope coefficient is 1, the standard deviation of the independent variable is 1, and the standard deviation of the error term is 2. The fraction of the residual variance that is due to a firm-specific component varies across the rows of the table, from 0% (no firm effect) to 75%. The fraction of the independent variable's variance that is due to a firm-specific component varies across the columns, also from 0% (no firm effect) to 75%. Each cell reports four statistics. Avg(B) is the average slope coefficient estimated by OLS, and Std(B) is the standard deviation of these estimates across simulations, i.e. the true standard error of the estimated coefficient. Avg(SE.OLS) is the average OLS estimated standard error of the coefficient. Avg(SE.R) is the average Rogers (clustered) standard error, which accounts for possible clustering at the firm level, i.e. for possible correlation between observations of the same firm in different years.
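A grid like the one in this table can be produced with a small driver. The sketch below uses a hypothetical helper, run_grid(), that calls a simulation function such as simulate() once for every combination of firm-effect shares and stacks the returned statistics into a data frame; any extra arguments (for example n.iter) are forwarded unchanged through mapply's MoreArgs.

```r
# Hypothetical helper (not from the original post): run a simulation function
# for every combination of firm-effect shares in x and in the residual.
# sim_fun must accept firm_x and firm_r and return a named numeric vector.
run_grid <- function(sim_fun, shares = c(0, 0.25, 0.50, 0.75), ...) {
  # all (firm_x, firm_r) combinations; 16 with the default shares
  grid <- expand.grid(firm_x = shares, firm_r = shares)
  # one call per combination; the result has one column per combination
  stats <- mapply(sim_fun,
                  firm_x = grid$firm_x, firm_r = grid$firm_r,
                  MoreArgs = list(...))
  # transpose so each combination becomes one row next to its shares
  cbind(grid, t(stats))
}
```

For example, run_grid(simulate, n.iter = 500) gives a quicker but noisier version of the table. With the default n.iter = 5000 it fits 16 × 5,000 × 2 = 160,000 regressions, so expect it to take a while.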

To leave a comment for the author, please follow the link and comment on their blog: R Tutorials.
