# Unobserved Effects With Panel Data

December 4, 2013
By

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

It is common for researchers to be concerned about unobserved effects being correlated with observed explanatory variables.

For instance, if we were curious about the effect of meditation on emotional stability we may be concerned that there might be some unobserved factor such as personal genetics that might  predict both likelihood to meditate and emotional stability.

In order to remove this potentially biasing effect we could think about taking measurements over multiple periods for the same individual inquiring about frequency of meditation and emotional stability.

If we observe that within the same individual, removing the time constant effects which (presumably) genetics is a component of that there is still a relationship between meditation and emotional stability, then we may feel on firmer ground as to our hypothesis that mediation may lead to more emotional stability.

In order to accomplish the goal of estimating this relationship we may experiment with a “fixed effects” model defined as:

$$y_{it}=x_{it}\beta + a_i+u_{it}$$

In this typical linear model with panel data, there is no problem including an arbitrary number of dummy variables.  Let’s see this in action.

nperson <- 300 # Number of personsnobs <- 3      # Number of observations per person # In order for unobserved person effects to be a problem they must be# correlated with the explanatory variable. # Let's say: x = x.base + fe fe.sd <- 1 # Spefify the standard deviation of the fixed effedx.sd  <- 1 # Specify the base standard deviation of x beta <- 2 # First generate our data using the time constant effectsconstantdata <- data.frame(id=1:nperson, fe=rnorm(nperson)) # We expand our data by nobsfulldata <- constantdata[rep(1:nperson, each=nobs),] # Add a time index, first define a group apply function# that applies by group index.gapply <- function(x, group, fun) {  returner <- numeric(length(group))  for (i in unique(group))     returner[i==group] <- get("fun")(x[i==group])  returner} # Using the generalized apply function coded abovefulldata$t <- gapply(rep(1,length(fulldata$id)),                          group=fulldata$id, fun=cumsum) # Or a more simplified functionindexer <- function(group) { returner <- numeric(length(group)) for (i in unique(group)) returner[i==group] <- 1:sum(i==group) returner} # Is a special case of gapplyfulldata$t <- indexer(fulldata$id) # Now we are ready to caculate the time variant xsfulldata$x <- fulldata$fe + rnorm(nobs*nperson) # And our unobservable errorfulldata$u <- rnorm(nobs*nperson) # Finally we are ready to simulate our y variablesfulldata$y <- .5*fulldata$x + .5*fulldata$fe + fulldata$u # First lets see how our standard linear model performs:summary(lm(y~x, data=fulldata)) # Adding a dummy variable removes the biassummary(lm(y~x+factor(id), data=fulldata)) # The same result can be taken by removing the mean from# both the explanatory variables and the dependent variables.# Why is that?

Think of the problem as:
$$y_{it}=x_{it}\beta + a_i+u_{it}$$
So $$y_{it}-mean_t(y_i)=(x_{it}-mean_t(x_{i}))\beta + a_i-mean_t(a_i)+u_{it}-mean(u_i)$$

Because the unobservable effect is constant over time it drops out.  And so long as their was a term controlling for the average unobservable effect (the dummy variables) then the average per person unobserved error must by definition be equal to zero.

thus: $$y_{it}-mean_t(y_i)=(x_{it}-mean_t(x_{i}))\beta + u_{it}$$

fulldata$ydemean <- fulldata$y-ave(fulldata$y, group=fulldata$id)fulldata$xdemean <- fulldata$x-ave(fulldata$x, group=fulldata$id) summary(lm(ydemean~xdemean-1, data=fulldata)) # We can also accomplish this by adding the Chamberlain device to the # that regression is the total or mean of the explanatory variables at# the level of each individual.fulldata$xmean <- ave(fulldata$x, group=fulldata$id) fulldata$xsum <- gapply(fulldata$x, group=fulldata$id, fun=sum) # This is a little trickier to figure out how it accomplishes the task# of differencing out the unobserved effect.   # This is how I think of it. The unobserved individual effect must be # correlatedwith the explanatory variable in aggrogate to be a problem. # However, that correlation can only be on the individual level since# by definition the "fixed effect" is constant on the individual level.# Thus by creating a new variable which is the average or total for# each individual, we are allocating to that variable any variation# which correlates with the explanatory variable.  summary(lm(y~x+xmean, data=fulldata))summary(lm(y~x+xsum, data=fulldata))

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.