# Linear regression with random error giving EXACT predefined parameter estimates

January 26, 2016
By

(This article was first published on Rmazing, and kindly contributed to R-bloggers)

When simulating linear models based on some defined slope/intercept and added gaussian noise, the parameter estimates vary after least-squares fitting. Here is some code I developed that does a double transform of these models as to obtain a fitted model with EXACT defined parameter estimates a (intercept) and b (slope).

It does so by:
1) Fitting a linear model #1 $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$ to the x,y data.
2) Correcting y by $\beta_1$: $Y_i = Y_i \cdot (\mathrm{b}/\beta_1)$.
3) Refitting linear model #2: $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$.
4) Correcting y by $\beta_0$: $Y_i = Y_i + (\mathrm{a} - \beta_0)$.
5) Refitting linear model #3: $Y_i = \beta_0 + \beta_1X_i + \varepsilon_i$, which is the final model with parameter estimates a and b.

Below is the code:
 exactLM <- function( x = 1:100, ## predictor values b = 0.01, ## slope a = 3, ## intercept error = NULL, ## homoscedastic error n = 1, ## number of replicates weights = NULL, ## possible weights, i.e. when heteroscedastic plot = TRUE, ## plot data and regression ... ## other parameters to 'lm' ) { if (is.null(error)) stop("Please define some error!")

 ## create x and y-values x <- rep(x, n) y <- a + b * x if (!is.null(error) & length(error) != length(x)) stop("'x' and 'error' must be of same length!") if (!is.null(weights) & length(weights) != length(x)) stop("'x' and 'weights' must be of same length!") ## add error y <- y + error ## create linear model #1 LM1 <- lm(y ~ x, weights = weights, ...) COEF1 <- coef(LM1) ## correct slope and create linear model #2 y <- y * (b/COEF1[2]) LM2 <- lm(y ~ x, weights = weights, ...) COEF2 <- coef(LM2) ## correct intercept and create linear model #3 y <- y + (a - COEF2[1]) LM3 <- lm(y ~ x, weights = weights, ...) ## plot data and regression plot(x, y, pch = 16) abline(LM3, lwd = 2, col = "darkred") 

 return(list(model = LM3, x = x, y = y)) } 

Here are some applications using replicates and weighted fitting:
 ############ Examples ################# ## n = 1 exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(100, 0, 2))

 ## n = 3 exactLM(x = 1:100, a = 0.5, b = 0.2, error = rnorm(300, 0, 2), n = 3) ## weighted by exact 1/var x <- 1:100 error <- rnorm(100, 0, 0.1 * x) weights <- 1/(0.1 * x)^2 exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights) 

## weighted by empirical 1/var x <- rep(1:100, 3) error <- rnorm(300, 0, 0.1 * x) weights <- rep(1/(tapply(error, x, mean)^2), 3) exactLM(x = x, a = 0.5, b = 0.2, error = error, weights = weights) 

I am curious on comments concerning simplification and more importantly, application (other than cheating data…)!

Cheers,
Andrej

Filed under: Uncategorized

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...