The Intercept Fallacy

[This article was first published on R – Win Vector LLC, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

A common misunderstanding of linear regression and logistic regression is the belief that the intercept encodes the unconditional mean or the training data prevalence.

It is easy to see that this is not the case. Consider the following example in R.

library(wrapr)

We set up our example data.

# build our example data
# modeling y as a function of x1 and x2 (plus intercept)

d <- wrapr::build_frame(
  "x1"  , "x2", "y" |
    0   , 0   , 0   |
    0   , 0   , 0   |
    0   , 1   , 1   |
    1   , 0   , 0   |
    1   , 0   , 0   |
    1   , 0   , 1   |
    1   , 1   , 0   )

knitr::kable(d)
| x1| x2|  y|
|--:|--:|--:|
|  0|  0|  0|
|  0|  0|  0|
|  0|  1|  1|
|  1|  0|  0|
|  1|  0|  0|
|  1|  0|  1|
|  1|  1|  0|

And let’s fit a logistic regression.

m <- glm(
  y ~ x1 + x2,
  data = d,
  family = binomial())

m$coefficients

## (Intercept)          x1          x2 
##  -1.2055937  -0.3129307   1.3620590

The probability encoded by the intercept term is the model's prediction for a row with x1 = 0 and x2 = 0, which we can compute as follows.

pred <- predict(
  m, 
  newdata = data.frame(x1 = 0, x2 = 0), 
  type = 'response')

pred

##         1 
## 0.2304816
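The same number can be read straight off the coefficients. The intercept lives on the log-odds (link) scale, so applying the inverse logit (base R's `plogis()`) to it should reproduce the prediction above. The sketch below is a minimal check under one assumption: it rebuilds `d` with base `data.frame()` instead of `wrapr::build_frame()` only so that it runs standalone. The names `p_intercept` and `p_predict` are illustrative.

```r
# Same rows as the wrapr::build_frame() call above, rebuilt with base R
# (assumption: done only so this sketch runs without wrapr)
d <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 1, 1),
  x2 = c(0, 0, 1, 0, 0, 0, 1),
  y  = c(0, 0, 1, 0, 0, 1, 0))

m <- glm(y ~ x1 + x2, data = d, family = binomial())

# The intercept is on the log-odds scale; plogis() is the inverse logit
p_intercept <- plogis(coef(m)[["(Intercept)"]])

# Prediction for a row with every explanatory variable at zero
p_predict <- predict(
  m,
  newdata = data.frame(x1 = 0, x2 = 0),
  type = "response")[[1]]

print(c(from_intercept = p_intercept, from_predict = p_predict))
```

The two values should agree with each other and with the 0.2304816 reported above. The intercept is a prediction at x1 = x2 = 0, not a summary of the whole training set.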

Notice that the prediction 0.2304816 is neither the training prevalence of the outcome y (2/7 = 0.2857143) nor the observed y-rate for the rows where x1 = 0 and x2 = 0 (which is 0).
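Both comparison values are easy to recompute. The sketch below rebuilds `d` with base R (the same standalone assumption as before) and also checks where the prevalence *does* show up. For a maximum-likelihood logistic regression that includes an intercept, the average of the fitted probabilities on the training rows matches the training prevalence. It then fits an ordinary linear model with `lm()` on the same rows to show that the linear-regression intercept is likewise not the mean of y. Variable names such as `rate_00` and `lm_intercept` are illustrative.

```r
# Example data rebuilt with base R (same rows as above)
d <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 1, 1),
  x2 = c(0, 0, 1, 0, 0, 0, 1),
  y  = c(0, 0, 1, 0, 0, 1, 0))

m <- glm(y ~ x1 + x2, data = d, family = binomial())

# Training prevalence of y: 2 positive rows out of 7
prevalence <- mean(d$y)

# Observed y-rate among the rows with x1 == 0 and x2 == 0
rate_00 <- mean(d$y[d$x1 == 0 & d$x2 == 0])

# Average fitted probability over the training rows
mean_fitted <- mean(fitted(m))

# Ordinary linear regression on the same data, for comparison
lin <- lm(y ~ x1 + x2, data = d)
lm_intercept <- coef(lin)[["(Intercept)"]]

print(c(prevalence = prevalence, rate_00 = rate_00,
        mean_fitted = mean_fitted, lm_intercept = lm_intercept))
```

Solving the normal equations by hand gives the linear-model intercept as 4/17 (about 0.235), which is again neither 2/7 nor 0. In both models the prevalence is recovered by averaging predictions, not by reading off the intercept.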

The non-intercept coefficients do have an interpretation: each is the change in log-odds (equivalently, the log of an odds ratio) implied by a unit increase in the given variable, assuming all other variables are held constant, which may not be a property of the data!
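This interpretation can be checked numerically. The sketch below uses the same base-R rebuild of `d` (same assumption as before). It predicts on the link scale for two rows that differ only in x2, with x1 held fixed at 1. The difference in log-odds should equal the `x2` coefficient, and the odds ratio computed from the two predicted probabilities should equal `exp()` of that coefficient. Holding x1 at 1 is an arbitrary choice; in this main-effects model any fixed x1 gives the same difference.

```r
# Example data rebuilt with base R (same rows as above)
d <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 1, 1),
  x2 = c(0, 0, 1, 0, 0, 0, 1),
  y  = c(0, 0, 1, 0, 0, 1, 0))

m <- glm(y ~ x1 + x2, data = d, family = binomial())

# Two rows that differ only in x2 (x1 held fixed at 1)
nd <- data.frame(x1 = c(1, 1), x2 = c(0, 1))

# Change in predicted log-odds from a one-unit increase in x2
log_odds <- predict(m, newdata = nd, type = "link")
delta_log_odds <- log_odds[[2]] - log_odds[[1]]

# Odds ratio built from the predicted probabilities
p <- predict(m, newdata = nd, type = "response")
odds <- p / (1 - p)
odds_ratio <- odds[[2]] / odds[[1]]

print(c(delta_log_odds = delta_log_odds, coef_x2 = coef(m)[["x2"]],
        odds_ratio = odds_ratio, exp_coef_x2 = exp(coef(m)[["x2"]])))
```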

To leave a comment for the author, please follow the link and comment on their blog: R – Win Vector LLC.
