[This article was first published on R on datascienceblog.net: R for Data Science, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Interpreting generalized linear models (GLM) obtained through glm is similar to interpreting conventional linear models. Here, we will discuss the differences that need to be considered.

## Basics of GLMs

GLMs enable the use of linear models in cases where the response variable has an error distribution that is non-normal. Each distribution is associated with a specific canonical link function. A link function $$g(x)$$ fulfills $$X \beta = g(\mu)$$. For example, for a Poisson distribution, the canonical link function is $$g(\mu) = \text{ln}(\mu)$$. Estimates on the original scale can be obtained by taking the inverse of the link function, in this case, the exponential function: $$\mu = \exp(X \beta)$$.

## Data preparation

We will take 70% of the airquality samples for training and 30% for testing:

data(airquality)
ozone <- subset(na.omit(airquality),
select = c("Ozone", "Solar.R", "Wind", "Temp"))
set.seed(123)
N.train <- ceiling(0.7 * nrow(ozone))
N.test <- nrow(ozone) - N.train
trainset <- sample(seq_len(nrow(ozone)), N.train)
testset <- setdiff(seq_len(nrow(ozone)), trainset)

## Training a GLM

For investigating the characteristics of GLMs, we will train a model, which assumes that errors are Poisson distributed.

By specifying family = "poisson", glm automatically selects the appropriate canonical link function, which is the logarithm. More information on possible families and their canonical link functions can be obtained via ?family.

model.pois <- glm(Ozone ~ Solar.R + Temp + Wind, data = ozone,
family = "poisson", subset = trainset)
summary(model.pois)
##
## Call:
## glm(formula = Ozone ~ Solar.R + Temp + Wind, family = "poisson",
##     data = ozone, subset = trainset)
##
## Deviance Residuals:
##     Min       1Q   Median       3Q      Max
## -5.3137  -2.0913  -0.6163   1.7141   6.0361
##
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)
## (Intercept)  0.343052   0.228505   1.501    0.133
## Solar.R      0.001887   0.000251   7.519 5.52e-14 ***
## Temp         0.045357   0.002491  18.210  < 2e-16 ***
## Wind        -0.072509   0.006728 -10.777  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
##     Null deviance: 1464.87  on 77  degrees of freedom
## Residual deviance:  453.25  on 74  degrees of freedom
## AIC: 863.48
##
## Number of Fisher Scoring iterations: 4

In terms of the GLM summary output, there are the following differences to the output obtained from the lm summary function:

• Deviance (deviance of residuals / null deviance / residual deviance)
• Other outputs: dispersion parameter, AIC, Fisher Scoring iterations

Moreover, the prediction function of GLMs is also a bit different. We will start with investigating the deviance.

## Deviance residuals

We already know residuals from the lm function. But what are deviance residuals? In ordinary least-squares, the residual associated with the $$i$$-th observation is defined as

$r_i = y_i - \hat{f}(x_i)$

where $$\hat{f}(x) = \beta_0 + x^T \beta$$ is the prediction function of the fitted model.

For GLMs, there are several ways for specifying residuals. To understand deviance residuals, it is worthwhile to look at the other types of residuals first. For this, we define a few variables first:

expected <- ozone$Ozone[trainset] g <- family(model.pois)$linkfun # log function
g.inv <- family(model.pois)$linkinv # exp function estimates.log <- model.pois$linear.predictors # estimates on log scale
estimates <- fitted(model.pois) # estimates on response scale (exponentiated)
all.equal(g.inv(estimates.log), estimates)
##  TRUE

We will cover four types of residuals: response residuals, working residuals, Pearson residuals, and, deviance residuals. There is also another type of residual called partial residual, which is formed by determining residuals from models where individual features are excluded. This residual is not discussed here.

### Response residuals

For type = "response", the conventional residual on the response level is computed, that is, $r_i = y_i - \hat{f}(x_i)\,.$ This means that the fitted residuals are transformed by taking the inverse of the link function:

# type = "response"
res.response1 <- residuals(model.pois, type = "response")
res.response2 <- expected - estimates
all.equal(res.response1, res.response2)
##  TRUE

### Working residuals

For type = "working", the residuals are normalized by the estimates $$\hat{f}(x_i)$$:

$r_i = \frac{y_i - \hat{f}(x_i)}{\hat{f}(x_i)}\,.$

# type = "working"
res.working1 <- residuals(model.pois, type="working")
res.working2 <- (expected - estimates) / estimates
all.equal(res.working1, res.working2)
##  TRUE

### Pearson residuals

For type = "pearson", the Pearson residuals are computed. They are obtained by normalizing the residuals by the square root of the estimate:

$r_i = \frac{y_i - \hat{f}(x_i)}{\sqrt{\hat{f}(x_i)}}\,.$

# type = "pearson"
res.pearson1 <- residuals(model.pois, type="pearson")
res.pearson2 <- (expected - estimates) / sqrt(estimates)
all.equal(res.pearson1, res.pearson2)
##  TRUE

### Deviance residuals

Deviance residuals are defined by the deviance. The deviance of a model is given by

${D(y,{\hat {\mu }})=2{\Big (}\log {\big (}p(y\mid {\hat {\theta }}_{s}){\big )}-\log {\big (}p(y\mid {\hat {\theta }}_{0}){\big )}{\Big )}.\,}$

where

• $$y$$ is the outcome
• $$\hat{\mu}$$ is the estimate of the model
• $$\hat{\theta}_s$$ and $$\hat{\theta}_0$$ are the parameters of the fitted saturated and proposed models, respectively. A saturated model has as many parameters as it has training points, that is, $$p = n$$. Thus, it has a perfect fit. The proposed model can be the any other model.
• $$p(y | \theta)$$ is the likelihood of data given the model

The deviance indicates the extent to which the likelihood of the saturated model exceeds the likelihood of the proposed model. If the proposed model has a good fit, the deviance will be small. If the proposed model has a bad fit, the deviance will be high. For example, for the Poisson model, the deviance is

$D = 2 \cdot \sum_{i = 1}^n y_i \cdot \log \left(\frac{y_i}{\hat{\mu}_i}\right) − (y_i − \hat{\mu}_i)\,.$

In R, the deviance residuals represent the contributions of individual samples to the deviance $$D$$. More specifically, they are defined as the signed square roots of the unit deviances. Thus, the deviance residuals are analogous to the conventional residuals: when they are squared, we obtain the sum of squares that we use for assessing the fit of the model. However, while the sum of squares is the residual sum of squares for linear models, for GLMs, this is the deviance.

How does such a deviance look like in practice? For example, for the Poisson distribution, the deviance residuals are defined as:

$r_i = \text{sgn}(y - \hat{\mu}_i) \cdot \sqrt{2 \cdot y_i \cdot \log \left(\frac{y_i}{\hat{\mu}_i}\right) − (y_i − \hat{\mu}_i)}\,.$

Let us verify this in R:

# type = "deviance"
res.dev1 <- residuals(model.pois, type = "deviance")
res.dev2 <- residuals(model.pois)
poisson.dev <- function (y, mu)
# unit deviance
2 * (y * log(ifelse(y == 0, 1, y/mu)) - (y - mu))
res.dev3 <- sqrt(poisson.dev(expected, estimates)) *
ifelse(expected > estimates, 1, -1)
all.equal(res.dev1, res.dev2, res.dev3)
##  TRUE

Note that, for ordinary least-squares models, the deviance residual is identical to the conventional residual.

### Deviance residuals in practice

We can obtain the deviance residuals of our model using the residuals function:

summary(residuals(model.pois))
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
## -5.3137 -2.0913 -0.6163 -0.1642  1.7141  6.0361

Since the median deviance residual is close to zero, this means that our model is not biased in one direction (i.e. the out come is neither over- nor underestimated).

## Null and residual deviance

Since we have already introduced the deviance, understanding the null and residual deviance is not a challenge anymore. Let us repeat the definition of the deviance once again:

${D(y,{\hat {\mu }})=2{\Big (}\log {\big (}p(y\mid {\hat {\theta }}_{s}){\big )}-\log {\big (}p(y\mid {\hat {\theta }}_{0}){\big )}{\Big )}.\,}$

The null and residual deviance differ in $$\theta_0$$:

• Null deviance: $$\theta_0$$ refers to the null model (i.e. an intercept-only model)
• Residual deviance: $$\theta_0$$ refers to the trained model

How can we interpret these two quantities?

• Null deviance: A low null deviance implies that the data can be modeled well merely using the intercept. If the null deviance is low, you should consider using few features for modeling the data.
• Residual deviance: A low residual deviance implies that the model you have trained is appropriate. Congratulations!

### Null deviance and residual deviance in practice

Let us investigate the null and residual deviance of our model:

paste0(c("Null deviance: ", "Residual deviance: "),
ilink <- family(model.pois)$linkinv # inverse link function: exp for Poisson all.equal(ilink(pred.l), pred.r) ##  TRUE all.equal(pred.l, link(pred.r)) ##  TRUE There is also the type = "terms" setting but this one is rarely used an also available in predict.lm. ### Obtaining confidence intervals The predict function of GLMs does not support the output of confidence intervals via interval = "confidence" as for predict.lm. We can still obtain confidence intervals for predictions by accessing the standard errors of the fit by predicting with se.fit = TRUE: predict.confidence <- function(object, newdata, level = 0.95, ...) { if (!is(object, "glm")) { stop("Model should be a glm") } if (!is(newdata, "data.frame")) { stop("Plase input a data frame for newdata") } if (!is.numeric(level) | level < 0 | level > 1) { stop("level should be numeric and between 0 and 1") } ilink <- family(object)$linkinv
ci.factor <- qnorm(1 - (1 - level)/2)
# calculate CIs:
fit <- predict(object, newdata = newdata, level = level,
type = "link", se.fit = TRUE, ...)
lwr <- ilink(fit$fit - ci.factor * fit$se.fit)
upr <- ilink(fit$fit + ci.factor * fit$se.fit)
df <- data.frame("fit" = ilink(fit\$fit), "lwr" = lwr, "upr" = upr)
return(df)
}

Using this function, we get the following confidence intervals for the Poisson model:

conf.df <- predict.confidence(model.pois, ozone[testset,])
##        fit      lwr       upr
## 8 9.073256 8.141969 10.111065
## 9 5.409964 4.619623  6.335518

Using the confidence data, we can create a function for plotting the confidence of the estimates in relation to individual features:

plot.confidence <- function(df, feature) {
library(ggplot2)
p <- ggplot(df, aes_string(x = feature,
y = "fit")) +
geom_line(colour = "blue") +
geom_point() +
geom_ribbon(aes(ymin = lwr, ymax = upr),
alpha = 0.5)
return(p)
}
plot.confidence.features <- function(data, features) {
plots <- list()
for (feature in features) {
p <- plot.confidence(data, feature)
plots[[feature]] <- p
}
library(gridExtra)
#grid.arrange(plots[], plots[], plots[])
do.call(grid.arrange, plots)
}

Using these functions, we can generate the following plot:

data <- cbind(ozone[testset,], conf.df)
plot.confidence.features(data, colnames(ozone)) ## Where to go from here?

Having covered the fundamentals of GLMs, you may want to dive deeper into their practical application by taking a look at this post where I investigate different types of GLMs for improving the prediction of ozone levels.