Consider two time series $X_t$ and $Y_t$ related by:

$$Y_t = q + m X_t + \varepsilon_t \tag{1}$$

where the noise term $\varepsilon_t$ follows a stationary AR(1) process:

$$\varepsilon_t = \rho\, \varepsilon_{t-1} + \eta_t, \qquad \eta_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, \sigma^2). \tag{2}$$

By iteration of Equation 2, we see that

$$\varepsilon_t = \sum_{k \geq 0} \rho^k\, \eta_{t-k},$$

so that noise terms at different times are correlated. This does not mean that, given observational data $(x_t, y_t)$, an ordinary least squares (OLS) fit of $y$ on $x$ is meaningless [1]; it does mean, as shown below, that the uncertainty estimates that come bundled with it cannot be taken at face value.
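As a quick numerical check of this correlation structure (a sketch of mine, not part of the original derivation), one can simulate a long AR(1) series with stats::arima.sim() and compare its empirical autocorrelations with the geometric decay $\rho^k$ implied by the iteration above:

```r
# Empirical vs. theoretical autocorrelations of the AR(1) noise of Equation 2.
set.seed(840)
rho <- 1 / sqrt(2)
eps <- arima.sim(list(order = c(1, 0, 0), ar = rho), n = 1e5)
emp <- drop(acf(eps, lag.max = 5, plot = FALSE)$acf)  # lags 0 to 5
cbind(lag = 0:5, empirical = emp, theoretical = rho ^ (0:5))
```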
It is fairly easy to work out the consequences of autocorrelation. Suppose, more generally, that the error term $\varepsilon$ has a generic variance-covariance matrix $\Sigma_{ts} = \text{Cov}(\varepsilon_t, \varepsilon_s)$, not necessarily proportional to the identity. The OLS estimate of the coefficient vector is [2]:

$$\hat{\beta} = (X^\top X)^{-1} X^\top Y,$$

which is unbiased since $\mathbb{E}(\varepsilon \mid X) = 0$. Similarly, the variance-covariance matrix of the OLS estimates is given by the sandwich formula:

$$\text{Var}(\hat{\beta} \mid X) = (X^\top X)^{-1}\, X^\top \Sigma X\, (X^\top X)^{-1},$$

which reduces to the familiar $\sigma^2 (X^\top X)^{-1}$ only when $\Sigma = \sigma^2 I$. The model-trusting estimate $\hat{\sigma}^2 (X^\top X)^{-1}$ returned by standard software assumes the latter form, and can be off by an arbitrarily large factor when the noise is correlated [3].

Even though the variance estimators are themselves biased, the biases could still vanish in the asymptotic limit. This is the case for heteroskedasticity and autocorrelation consistent (HAC) estimators, which remain consistent as long as the noise autocovariance decays sufficiently fast with the time lag [4]. For the model-trusting estimator, in contrast, the bias is of the same order as the estimation target itself, so that the resulting variance estimates do not become any more reliable as the sample size grows [5].
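To make these formulas concrete, here is a small numerical check (again my own sketch, not from the original post): for a fixed design matrix $X$ and the exact AR(1) covariance $\Sigma$ of footnote 4, we can evaluate the sandwich formula directly and compare it with the naive $\sigma_\varepsilon^2 (X^\top X)^{-1}$:

```r
# Exact vcov of OLS coefficients under AR(1) noise vs. the model-trusting formula.
set.seed(840)
N <- 100
rho <- 1 / sqrt(2)
X <- cbind(1, rnorm(N))  # design matrix: intercept plus one i.i.d. covariate
Sigma <- rho ^ abs(outer(1:N, 1:N, "-")) / (1 - rho ^ 2)  # Cov(eps_t, eps_s), sigma^2 = 1
XtX_inv <- solve(crossprod(X))
sandwich_vcov <- XtX_inv %*% t(X) %*% Sigma %*% X %*% XtX_inv
naive_vcov <- (1 / (1 - rho ^ 2)) * XtX_inv  # Var(eps_t) * (X'X)^-1
naive_vcov / sandwich_vcov  # ratios below 1 signal underestimation
```

Note that with an i.i.d. covariate, as here, it is mainly the intercept's variance that gets underestimated; in the simulation below the covariate is itself autocorrelated, which biases the slope's variance as well.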
Illustration
The (foldable) block below defines helpers to simulate the results of linear regression on data generated according to a generic process of the form $y = f(x) + \varepsilon(x)$, specified by the marginal distribution of $x$ (rx), the regression function $f$, and the noise generator (reps).
```r
library(dplyr)
library(ggplot2)

# Build a data-generating process as a closure of class "rxy".
rxy_fun <- function(rx, f, reps) {
  res <- function(n) {
    x <- rx(n)           # X has marginal distribution 'rx'
    y <- f(x) + reps(x)  # Y has conditional mean 'f(x)' and noise 'reps(x)'
    return(tibble(x = x, y = y))
  }
  return(structure(res, class = "rxy"))
}

# Scatterplot of one simulated dataset, with the OLS fit overlaid.
plot.rxy <- function(x, N = 1000, seed = 840) {
  set.seed(seed)
  ggplot(data = x(N), aes(x = x, y = y)) +
    geom_point(alpha = 0.3) +
    geom_smooth(method = "lm", se = FALSE)
}

# Fit B regressions on independent datasets of size N, storing coefficient
# estimates, vcov estimates (via the 'vcov' argument), and noise variances.
lmsim <- function(rxy, N = 100, vcov = stats::vcov, B = 1e3, seed = 840) {
  set.seed(seed)
  res <- list(
    coef = matrix(nrow = B, ncol = 2),
    vcov = vector("list", B),
    sigma2 = numeric(B)
  )
  colnames(res$coef) <- c("(Intercept)", "x")
  class(res) <- "lmsim"
  for (b in 1:B) {
    .fit <- lm(y ~ ., data = rxy(N))
    res$coef[b, ] <- coef(.fit)  # Store intercept and slope in B x 2 matrix
    res$vcov[[b]] <- vcov(.fit)  # Store vcov estimates in length B list.
    res$sigma2[[b]] <- sigma(.fit) ^ 2
  }
  return(res)
}

# Compare the average estimated vcov with the empirical vcov of the estimates.
print.lmsim <- function(x, ...) {
  cat("Simulation results:\n\n")
  cat("* Model-trusting noise variance:\n  ")
  print(mean(x$sigma2))
  cat("* Model-trusting vcov of coefficient estimates:\n")
  print(avg_est_vcov <- Reduce("+", x$vcov) / length(x$vcov))
  cat("\n* Simulation-based vcov of coefficient estimates:\n")
  print(emp_vcov <- cov(x$coef))
  cat("\n* Ratio (Model-trusting / Simulation):\n")
  print(avg_est_vcov / emp_vcov)
  return(invisible(x))
}
```
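As a quick usage example (mine, not from the original post), here is a baseline with i.i.d. Gaussian noise, for which the model-trusting and simulation-based estimates should essentially agree:

```r
# Baseline check: with i.i.d. noise, the 'Ratio' entries printed by
# lmsim() should all be close to 1.
rxy_iid <- rxy_fun(
  rx = rnorm,                   # X ~ N(0, 1), i.i.d.
  f = \(x) 1 + x,               # E(Y | X) = 1 + X
  reps = \(x) rnorm(length(x))  # i.i.d. noise, no autocorrelation
)
lmsim(rxy_iid, N = 100)
```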
We simulate linear regression on data generated according to Equations 1 and 2, with $q = m = 1$ and $\rho = 1/\sqrt{2}$, taking for $X_t$ an autocorrelated AR(1) process as well (with coefficient $0.4$ and unit mean):
```r
rxy_01 <- rxy_fun(
  rx = \(n) 1 + arima.sim(list(order = c(1, 0, 0), ar = 0.4), n = n),
  f = \(x) 1 + x,
  reps = \(x) arima.sim(
    list(order = c(1, 0, 0), ar = 1 / sqrt(2)),
    n = length(x)
  )
)

plot(rxy_01)
```
From the simulation below, we see that with $N = 100$ the model-trusting variance-covariance estimates produced by stats::vcov() severely underestimate the actual sampling variability of the coefficients: the ratio of model-trusting to simulation-based variances ranges from about $0.23$ for the intercept to $0.56$ for the slope (see the output of lmsim() below).
```r
lmsim(rxy_01, N = 100)
```

```
Simulation results:

* Model-trusting noise variance:
  [1] 1.870606
* Model-trusting vcov of coefficient estimates:
            (Intercept)           x
(Intercept)  0.03583159 -0.01663739
x           -0.01663739  0.01659708

* Simulation-based vcov of coefficient estimates:
            (Intercept)           x
(Intercept)  0.15486131 -0.02665435
x           -0.02665435  0.02978162

* Ratio (Model-trusting / Simulation):
            (Intercept)         x
(Intercept)   0.2313786 0.6241905
x             0.6241905 0.5572928
```
To correctly estimate the variance-covariance matrix of the coefficient estimates, we can use a heteroskedasticity and autocorrelation consistent (HAC) estimator, such as the one implemented by sandwich::vcovHAC() [6]. It turns out that, even with a relatively simple example like the present one, the sample size required for the HAC estimator's bias to die out is unreasonably large (see below). With such large samples, one can probably obtain much better results by setting aside some data for model building, and performing inference on the remaining data with a proper time-series model.
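For a single fitted model, this simply means replacing stats::vcov() with sandwich::vcovHAC() when extracting standard errors; a minimal sketch (mine, using the helpers defined above):

```r
# Model-trusting vs. HAC standard errors on one simulated dataset.
set.seed(840)
fit <- lm(y ~ x, data = rxy_01(100))
sqrt(diag(stats::vcov(fit)))        # model-trusting standard errors
sqrt(diag(sandwich::vcovHAC(fit)))  # HAC standard errors (larger, as expected here)
```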
```r
lmsim(rxy_01, vcov = sandwich::vcovHAC, N = 100)
```

```
Simulation results:

* Model-trusting noise variance:
  [1] 1.870606
* Model-trusting vcov of coefficient estimates:
            (Intercept)           x
(Intercept)  0.08787795 -0.02323242
x           -0.02323242  0.02339146

* Simulation-based vcov of coefficient estimates:
            (Intercept)           x
(Intercept)  0.15486131 -0.02665435
x           -0.02665435  0.02978162

* Ratio (Model-trusting / Simulation):
            (Intercept)         x
(Intercept)   0.5674623 0.8716182
x             0.8716182 0.7854329
```
```r
lmsim(rxy_01, vcov = sandwich::vcovHAC, N = 500)
```

```
Simulation results:

* Model-trusting noise variance:
  [1] 1.974131
* Model-trusting vcov of coefficient estimates:
             (Intercept)            x
(Intercept)  0.023032270 -0.005723968
x           -0.005723968  0.005684149

* Simulation-based vcov of coefficient estimates:
             (Intercept)            x
(Intercept)  0.029600757 -0.005993161
x           -0.005993161  0.006152216

* Ratio (Model-trusting / Simulation):
            (Intercept)         x
(Intercept)   0.7780973 0.9550834
x             0.9550834 0.9239189
```
```r
lmsim(rxy_01, vcov = sandwich::vcovHAC, N = 1000)
```

```
Simulation results:

* Model-trusting noise variance:
  [1] 1.98033
* Model-trusting vcov of coefficient estimates:
             (Intercept)            x
(Intercept)  0.011878079 -0.002771484
x           -0.002771484  0.002849089

* Simulation-based vcov of coefficient estimates:
             (Intercept)            x
(Intercept)  0.015085291 -0.002844716
x           -0.002844716  0.002855522

* Ratio (Model-trusting / Simulation):
            (Intercept)         x
(Intercept)   0.7873948 0.9742566
x             0.9742566 0.9977471
```
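As for the "proper time-series model" mentioned above, one natural candidate (my suggestion; the post does not pursue this) is generalized least squares with an explicit AR(1) correlation structure, as implemented by nlme::gls():

```r
# GLS models the AR(1) noise directly, rather than correcting
# OLS standard errors after the fact.
library(nlme)
set.seed(840)
fit_gls <- gls(y ~ x, data = rxy_01(500), correlation = corAR1(form = ~ 1))
summary(fit_gls)$tTable  # estimates with AR(1)-aware standard errors
```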
Footnotes
1. For the linear model assumptions to hold, the $(x_t, y_t)$ pairs should come from independent realizations of the same time series, which is of course not the type of data we are usually presented with.↩︎
2. As usual, we stack observations vertically in the $Y$ and $X$ matrices.↩︎
3. For an extreme case, suppose that $X_t \equiv 1$ (no covariate except for an intercept term), and let the noise term be $\varepsilon_t = (u + v_t)/\sqrt{2}$, where $u$ and $v_t$ are independent $z$-scores. One can easily see that, in this setting, $\text{Var}(\hat{q}) = \frac{1}{2} + \frac{1}{2N}$ and $\mathbb{E}\left(\hat{\sigma}^2 / N\right) \approx \frac{1}{2N}$, so that the model-trusting variance estimate misses most of the actual uncertainty.↩︎
4. For instance, for the AR(1) noise of Equation 2, we have $\Sigma_{ts} = \frac{\sigma^2}{1 - \rho^2}\, \rho^{|t - s|}$.↩︎
5. The difference $\mathbb{E}\left(\hat{\sigma}^2 (X^\top X)^{-1}\right) - \text{Var}(\hat{\beta})$ decays as $\frac{1}{N}$, which is of the same order as the estimation target $\text{Var}(\hat{\beta})$. Not sure I'm using standard terminology here.↩︎
6. Disclaimer: I haven't read any theory about the HAC estimator, so I may be misusing it here, but I would have expected it to work relatively well on such an "easy" example. For illustrations on how to use sandwich estimators for first- and second-order linear model misspecification, you can read this post of mine.↩︎
Citation
```bibtex
@online{gherardi2023,
  author = {Gherardi, Valerio},
  title = {Linear Regression with Autocorrelated Noise},
  date = {2023-05-25},
  url = {https://vgherard.github.io/posts/2023-05-20-linear-regression-with-autocorrelated-noise/},
  langid = {en}
}
```