Fixed vs. random effects for browsing data – a simulation

[This article was first published on BernR, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.


When you work with trace data — data that emerge when people interact with technology — you will notice that such data often have properties that open up questions about statistical modelling. I currently work with browsing records, obtained at several times from the same users (i.e., a panel data set). A first typical characteristic of such data: Browsing behaviors are skewed. If you’re interested in people reading politically extreme web sites, you will find a few people doing it a lot, and most people doing little to none.

A second characteristic relates to the panel nature of the data. If you’re looking at people’s visits to online shops, they do not change their habits much over time — at least these within-person differences are not as great as the differences across people. Panel data lend itself to hierarchical modelling, for example with (1) fixed effects (FE) or (2) random-effects multilevel modelling (RE). For commonalities and differences between these two modelling approaches, see for example Gelman and Hill (2006).[1] The amount of within-person variation relative to between-person variation has important implications for these two approaches.

Below, I simulate the performance of FE vs. RE models, with these data characteristics in mind. I am mainly interested in the statistical power of each model, although my code can be easily adapted to examine, for example, bias and the RSME. The code builds on two sources: Conceptually, on the paper “Should I Use Fixed or Random Effects?” by Clark and Linzer (2015)[2]. In terms of code architecture, on the R package simhelpers, and its documentation.[3]

I consider a model with independent variable xᵢ and dependent variable yᵢ, for which we have i = 1, … m observations grouped in j = 1, … n units. Importantly, we model theunit effect (or unit intercept) aⱼ, which intuitively can be understood as explaining everything about y that cannot be explained by xᵢ at the unit level. The model formally: yᵢ = aⱼ+ bxᵢ + eᵢ. To understand how the unit effects are estimated differently by FE and RE, I again recommend Gelman and Hill (2007). One key difference between RE and FE is that for the former to be unbiased, it requires a zero correlation between unit effects aⱼ and xᵢ.

Let’s start with the required packages.


In the simulation, I assume a small true effect b that is kept constant across simulations. I vary the following four parameters: (1) the number of units (500, 1000 or 2000); (2) the number of observations per unit (2, 3 or 5); (3) the true correlation between unit effects aⱼ and xᵢ (0 or 0.8); (4) the within-unit standard deviation in xᵢ (between 0.1 and 1, while the standard deviation of xᵢ between units is fixed at 1).

design_factors <- list(
  n = c(500, 1000, 2000),
  n_perunit = c(2, 3, 5),
  truecor = c(0, 0.8),
  b_1 = c(0.2),
  x_sd = seq(0.1, 1, 0.1))

I start by defining a data-generating function. It first creates a unit-level average of xᵢ, as well as the unit effects aⱼ. To account for the skewness of browsing data, I draw the unit-level averages of xᵢ from a skewed normal distribution (note that the values of 3 for skew and 33 for kurtosis are motivated by actual browsing data). I then create values of yᵢ according to the model outlined above, adding the normally distributed error eᵢ.

generate_dat <- function(n, n_perunit, truecor, b_1, x_sd) {
 # Generate unit effects and unit-level means
 v <- unonr(n, c(0, 0),
            matrix(c(1, truecor, truecor, 1), 2, 2),
            skewness = c(0, 3), kurtosis = c(0, 33))
 b_0 <- v[, 1]
 xmean <- v[, 2]
 # Generate observations per unit
 dat <- NULL
 for (unit in 1:n) {
   x <- rnorm(n_perunit, xmean[unit], x_sd)
   y <- b_0[unit] + b_1*x + rnorm(n_perunit, 0, 1)
   dat <- rbind(dat, cbind(x, y, unit = unit))

Next, I define a function that estimates the models , and one that pulls the estimate, standard error and p-value for b. Note that I also estimate a standard “pooled” model for reference.

pull_stats <- function(model, method) {
 estimate <- broomExtra::tidy(model) %>% 
   filter(term == “x”) %>% pull(estimate)
 p.value <- broomExtra::tidy(model) %>% 
   filter(term == “x”) %>% pull(p.value)
 std.error <- broomExtra::tidy(model) %>% 
   filter(term == “x”) %>% pull(std.error)
 return(tibble(method = method, 
               est = estimate, 
               se = std.error, 
               p = p.value))
estimate <- function(dat){
# Pooled model
 reg_pool <- lm(y ~ x, data = dat)
 # Random-effects model
 reg_fe <- plm(y ~ x, data = dat, index = “unit”, model = “within”) 
 # Fixed-effects model
 reg_re <- lmerTest::lmer(y ~ x + (1|unit), data = dat)
 results <- bind_rows(
   pull_stats(reg_pool, "pooled"),
   pull_stats(reg_fe, “FE”),
   pull_stats(reg_re, “RE”))

Next, the function driving the simulation:

run_sim <- function(iterations, n, n_perunit, truecor, b_1, x_sd, seed = NULL) {
  if (!is.null(seed)) set.seed(seed)
  results <- rerun(iterations, {  
    dat <- generate_dat(n, n_perunit, truecor, b_1, x_sd)
    estimate(dat)}) %>% bind_rows()

At last, run the simulation:

params <-
  cross_df(design_factors) %>%
      iterations = 1000,
      seed = round(runif(1) * 2³⁰) + 1:n()
 results <-
   params %>%
       res = pmap(., .f = run_sim)
     ) %>%
   unnest(cols = c(res))

Now you can inspect and plot the results.

power <- results %>%
  group_by(method, n, n_perunit, truecor, x_sd, b_1) %>% 
  do(calc_rejection(., p_values = p))
power %>% %>%
  filter(truecor == 0) %>%
  rename(“Number of units” = n,
         “Observations per unit” = n_perunit) %>%
  ggplot(aes(x = x_sd, y = rej_rate, 
  group = method, color = method)) +
  geom_smooth(se = FALSE) +
  scale_y_continuous(name = “Rejection rate”) +
  scale_x_continuous(name = “Within-unit SD”) +
  scale_color_discrete(name = “Method”) +
  theme_light() + 
  theme(legend.position = “bottom”) +
  facet_wrap(~ `Number of units` + `Observations per unit`, 
             labeller = label_both)

The plot below shows that, as intuition would tell us, when the within-unit variation is small, FE models have a hard time detecting a true effect, especially when the number of units, and the number of observations per unit, are small.

[1] Gelman, A., & Hill, J. (2006). Data analysis using regression and multilevel/hierarchical models. Cambridge university press.
[2] Clark, T. S., & Linzer, D. A. (2015). Should I use fixed or random effects?. Political science research and methods, 3(2), 399-408.
[3] and
To leave a comment for the author, please follow the link and comment on their blog: BernR. offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)