Linear regression with random error giving EXACT predefined parameter estimates

[This article was first published on Rmazing, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting a linear model #1 Y_i = \beta_0 + \beta_1X_i + \varepsilon_i to the x,y data.
2) Correcting y by \beta_1: Y_i = Y_i \cdot (\mathrm{b}/\beta_1).
3) Refitting linear model #2: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i.
4) Correcting y by \beta_0: Y_i = Y_i + (\mathrm{a} - \beta_0).
5) Refitting linear model #3: Y_i = \beta_0 + \beta_1X_i + \varepsilon_i, which is the final model with parameter estimates a and b.

Below is the code:
exactLM <- function( x = 1:100, ## predictor values b = 0.01, ## slope a = 3, ## intercept error = NULL, ## homoscedastic error n = 1, ## number of replicates weights = NULL, ## possible weights, i.e. when heteroscedastic plot = TRUE, ## plot data and regression ... ## other parameters to 'lm' ) { if (is.null(error)) stop("Please define some error!") ## create x and y-values x <- rep(x, n) y <- a + b * x if (!is.null(error) & length(error) != length(x)) stop("'x' and 'error' must be of same length!") if (!is.null(weights) & length(weights) != length(x)) stop("'x' and 'weights' must be of same length!") ## add error y <- y + error ## create linear model #1 LM1 <- lm(y ~ x, weights = weights, ...) COEF1 <- coef(LM1) ## correct slope and create linear model #2 y <- y * (b/COEF1[2]) LM2 <- lm(y ~ x, weights = weights, ...) COEF2 <- coef(LM2) ## correct intercept and create linear model #3 y <- y + (a - COEF2[1]) LM3 <- lm(y ~ x, weights = weights, ...) ## plot data and regression plot(x, y, pch = 16) abline(LM3, lwd = 2, col = "darkred") return(list(model = LM3, x = x, y = y)) }

Here are some applications using replicates and weighted fitting:
############ Examples ################# ## n = 1 exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2)) ## n = 3 exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3) ## weighted by exact 1/var x <- 1:100 error <- rnorm(100, 0, 0.1 * x) weights <- 1/(0.1 * x)^2 exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights) ## weighted by empirical 1/var x <- rep(1:100, 3) error <- rnorm(300, 0, 0.1 * x) weights <- rep(1/(tapply(error, x, mean)^2), 3) exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights)

I am curious on comments concerning simplification and more importantly, application (other than cheating data…)!

Cheers,
Andrej


Filed under: Uncategorized

To leave a comment for the author, please follow the link and comment on their blog: Rmazing.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)