📊 Multinomial regression in R

January 23, 2018

(This article was first published on Iegor Rudnytskyi, and kindly contributed to R-bloggers)

In my current project on long-term care, at some point we were required to use a regression model with a multinomial response. I was surprised that, in contrast to the well-covered binomial GLM for the binary-response case, the multinomial case is poorly described. Surely, there are half a dozen packages overlapping each other, but there is no sound tutorial or vignette. Hopefully, this post improves the current state.

We can distinguish two types of multinomial responses, namely nominal and ordinal. A nominal response variable takes a value from a predefined finite set, and these values are not ordered: for instance, a variable color can be green, blue, or red. In machine learning this problem is often referred to as classification. In contrast to the nominal case, for an ordinal response variable the set of values has a relative ordering: for example, a variable size can be small < middle < large. Furthermore, depending on the link function, we can have logit or probit models.

Nominal response models

According to Agresti (2002), the problem can be formulated by two similar approaches: through baseline-category logits or as a multivariate GLM. In general, the two approaches are equivalent, with identical maximum-likelihood estimates; the only thing that differs is the formula representation.

Baseline-category logits (multinomial logit model)

The baseline-category logit model is implemented as a function in three distinct packages, namely nnet::multinom() (referred to as a log-linear model), mlogit::mlogit(), and mnlogit::mnlogit() (which claims to be a more efficient implementation than mlogit; see the comparison of the performances of these packages).

Let $p_j = \mathbb{P}(Y = j \mid \boldsymbol{x})$ be the probability that the dependent variable $Y$ takes value $j$ given a vector of explanatory variables' values $\boldsymbol{x}$. In total there are $J$ categories, and obviously, due to the second axiom of probability, $\sum_j p_j = 1$. We fix a baseline category at level $J$ (or at any other level), and the model is as follows:

$$\log \frac{p_j}{p_J} = \alpha_j + \boldsymbol{\beta}_j' \boldsymbol{x}, \quad j = 1, \dots, J - 1,$$
describing the effects of the explanatory variables $\boldsymbol{x}$ on the log odds between level $j$ and the baseline level. Of course, using these $J-1$ equations and the second axiom, it is possible to come back to the probabilities (which is a nice exercise, by the way):

$$p_j = \frac{\exp(\alpha_j + \boldsymbol{\beta}_j' \boldsymbol{x})}{1 + \sum_{k=1}^{J-1} \exp(\alpha_k + \boldsymbol{\beta}_k' \boldsymbol{x})}, \quad j = 1, \dots, J-1, \qquad p_J = \frac{1}{1 + \sum_{k=1}^{J-1} \exp(\alpha_k + \boldsymbol{\beta}_k' \boldsymbol{x})}.$$
For each group $j$ the set of parameters $\alpha_j$ and $\boldsymbol{\beta}_j$ is distinct. Let's now estimate $\alpha_j, \boldsymbol{\beta}_j$, $j = 1, \dots, J - 1$, with the different packages and make sure that the estimates are identical. I use the marital.nz data from the VGAM package.
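Before fitting anything, the back-transformation from logits to probabilities can be sketched in base R. The coefficients below are made up for illustration, not estimates from any real fit:

```r
# Back-transform baseline-category logits to probabilities
# (hypothetical coefficients for J = 3 categories; baseline is category 3)
alpha <- c(0.5, -1.0)                 # intercepts for j = 1, 2
beta  <- c(0.02, -0.03)               # slopes for j = 1, 2
x     <- 40                           # one explanatory value

eta <- alpha + beta * x               # the J - 1 linear predictors
p   <- c(exp(eta), 1) / (1 + sum(exp(eta)))  # baseline contributes exp(0) = 1
p                                     # three probabilities
sum(p)                                # 1, as the second axiom requires
```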

# install.packages("VGAM")
library(VGAM)
data(marital.nz)
head(marital.nz)
#   age ethnicity            mstatus
# 1  29  European             Single
# 2  55  European  Married/Partnered
# 3  44  European  Married/Partnered
# 4  53  European Divorced/Separated
# 5  45  European  Married/Partnered
# 7  30  European             Single
unique(marital.nz$mstatus)
# [1] Single             Married/Partnered  Divorced/Separated Widowed           
# Levels: Divorced/Separated Married/Partnered Single Widowed

The data contains “marital data mainly from a large NZ company collected in the early 1990s”. The dependent variable mstatus has four unordered classes: Divorced/Separated, Married/Partnered, Single, and Widowed. We use age as the only explanatory variable.

  • Package nnet
library(nnet)
fit_nnet <- multinom(mstatus ~ age, marital.nz)
coef(fit_nnet)
#                   (Intercept)          age
# Married/Partnered    2.778686 -0.003538729
# Single               6.368064 -0.152745520
# Widowed             -6.753123  0.099333903
  • Package mlogit
library(mlogit)
fit_mlogit <- mlogit(mstatus ~ 0 | age, data = marital.nz, shape = "wide")
matrix(fit_mlogit$coefficients, ncol = 2)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560
  • Package mnlogit
library(mnlogit)
marital.nz_long <- mlogit.data(data = marital.nz, choice = "mstatus")
fit_mnlogit <- mnlogit(mstatus ~ 1 | age | 1, marital.nz_long)
matrix(fit_mnlogit$coefficients, ncol = 2, byrow = TRUE)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560

Even though the latter package is very efficient and customizable, there are several points I am not a big fan of. First, mnlogit works only with data in long format, instead of the wide format common and familiar in regression; that is why we had to use mlogit.data() to convert the data. Second, the formula syntax is rather confusing despite its customizability. Of course, the list of packages is not exhaustive; others exist, e.g. brglm2.
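To make the wide/long distinction concrete, here is a base-R sketch of what such a conversion does. The toy data and column names are made up and do not reproduce the exact mlogit.data() output format:

```r
# Wide format: one row per observation, the chosen alternative in one column
wide <- data.frame(id = 1:2, age = c(29, 55),
                   mstatus = c("Single", "Married/Partnered"))

# Long format: one row per observation-alternative pair,
# with a logical column marking the chosen alternative
alts <- c("Divorced/Separated", "Married/Partnered", "Single", "Widowed")
long <- do.call(rbind, lapply(seq_len(nrow(wide)), function(i) {
  data.frame(id     = wide$id[i],
             age    = wide$age[i],
             alt    = alts,
             choice = wide$mstatus[i] == alts)
}))
nrow(long)        # 2 observations x 4 alternatives = 8 rows
sum(long$choice)  # exactly one TRUE per observation
```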

Multinomial logit model as multivariate GLM

For this model, instead of treating the response variable as a scalar, we set it to be a vector of $J-1$ elements (the $J$-th is redundant). Then $\boldsymbol{y}_i = (y_{i,1}, \dots, y_{i, J-1})'$ and $\boldsymbol{\mu}_i = (p_{i,1}, \dots, p_{i, J-1})'$. Therefore,

$$\boldsymbol{g}(\boldsymbol{\mu}_i) = \boldsymbol{X}_i \boldsymbol{\beta},$$

where $\boldsymbol{g}$ is a vector of link functions.

The package VGAM deals exactly with such multivariate GLMs and GAMs. Let's compute the estimates for this model, which should coincide with the previously calculated ones:

fit_vgam <- vglm(mstatus ~ age, multinomial(refLevel = 1), 
                 data = marital.nz)
matrix(fit_vgam@coefficients, ncol = 2)
#           [,1]         [,2]
# [1,]  2.778666 -0.003538297
# [2,]  6.368056 -0.152745424
# [3,] -6.753157  0.099334560
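As a sanity check, the estimates from nnet and VGAM indeed agree to roughly four decimal places; the numbers below are simply copied from the outputs above:

```r
# Coefficients copied from the nnet and VGAM outputs above
nnet_coefs <- c(2.778686, 6.368064, -6.753123,
                -0.003538729, -0.152745520, 0.099333903)
vglm_coefs <- c(2.778666, 6.368056, -6.753157,
                -0.003538297, -0.152745424, 0.099334560)
max(abs(nnet_coefs - vglm_coefs))  # on the order of 1e-5
```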

Ordinal response model: proportional odds model

For an ordinal response variable the model is slightly different. Let $Y$ be a categorical response variable with $J$ ordered categories, $1 < 2 < \dots < J$, and define the cumulative probabilities $\mathbb{P}(Y \leq j) = p_1 + \dots + p_j$, $j = 1, \dots, J$.

Then the cumulative logits are:

$$\mathrm{logit}\, \mathbb{P}(Y \leq j) = \log \frac{\mathbb{P}(Y \leq j)}{1 - \mathbb{P}(Y \leq j)} = \log \frac{p_1 + \dots + p_j}{p_{j+1} + \dots + p_J}, \quad j = 1, \dots, J - 1.$$
Let's now link the cumulative logits to the explanatory variables $\boldsymbol{x}$:

$$\mathrm{logit}\, \mathbb{P}(Y \leq j) = \alpha_j + \boldsymbol{\beta}' \boldsymbol{x}, \quad j = 1, \dots, J - 1.$$
Note that $\boldsymbol{\beta}$ is the same for each logit. The intercepts, however, can differ and are necessarily non-decreasing in $j$.

The model got its name from its property: the odds ratio of $Y \leq j$ between two values of the explanatory variables does not depend on $j$,

$$\frac{\mathrm{odds}(Y \leq j \mid \boldsymbol{x}_1)}{\mathrm{odds}(Y \leq j \mid \boldsymbol{x}_2)} = \exp\!\left(\boldsymbol{\beta}' (\boldsymbol{x}_1 - \boldsymbol{x}_2)\right), \quad j = 1, \dots, J - 1.$$
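The proportional-odds property (the odds ratio of $Y \leq j$ between two covariate values does not depend on $j$) is easy to verify numerically in base R; the parameters below are made up:

```r
# Proportional odds: the odds ratio between two covariate values
# is the same for every j (hypothetical parameters, J = 4)
alpha <- c(-2, -0.5, 1)     # non-decreasing intercepts
beta  <- -0.06              # common slope
x1 <- 40; x2 <- 50

cum_logit <- function(x) alpha + beta * x    # logit P(Y <= j | x)
odds_ratio <- exp(cum_logit(x1)) / exp(cum_logit(x2))
odds_ratio                  # identical for all three j
exp(beta * (x1 - x2))       # the common value, exp(beta' (x1 - x2))
```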
Again, there are at least four packages that fit the proportional odds model. Let's quickly compare their estimates using the Italian household data for 2006, dataset ecb06.it from the VGAMdata package. We try to explain the ordinal variable education, which has 8 levels, by the numeric variable age.

# install.packages("VGAMdata")
library(VGAMdata)
data(ecb06.it)
# str(ecb06.it)
head(ecb06.it[, c("age", "education")])
#    age     education
# 1   58    highschool
# 4   81 primaryschool
# 5   52    highschool
# 9   67  middleschool
# 12  56  middleschool
# 16  72 primaryschool
  • Package MASS

Perhaps the most famous function is MASS::polr.

library(MASS)
fit_polr <- polr(formula = education ~ age, data = ecb06.it)
summary(fit_polr)$coefficients[, 1, drop = FALSE]
#                                  Value
# age                        -0.06417893
# none|primaryschool         -6.95688936
# primaryschool|middleschool -4.51869196
# middleschool|profschool    -3.06471919
# profschool|highschool      -2.73295822
# highschool|bachelors       -0.96907401
# bachelors|masters          -0.89517059
# masters|higherdegree        2.42815131
  • Package VGAM
fit_vglm <- vglm(formula = education ~ age, family = propodds, data = ecb06.it)
as.matrix(fit_vglm@coefficients)
#                      [,1]
# (Intercept):1  6.95576156
# (Intercept):2  4.51825182
# (Intercept):3  3.06430069
# (Intercept):4  2.73254206
# (Intercept):5  0.96867493
# (Intercept):6  0.89470432
# (Intercept):7 -2.42867591
# age           -0.06417086
  • Package ordinal
library(ordinal)
fit_clm <- clm(formula = education ~ age, data = ecb06.it)
as.matrix(coef(fit_clm))
#                                  [,1]
# none|primaryschool         -6.9557784
# primaryschool|middleschool -4.5182645
# middleschool|profschool    -3.0643131
# profschool|highschool      -2.7325541
# highschool|bachelors       -0.9686858
# bachelors|masters          -0.8947152
# masters|higherdegree        2.4286635
# age                        -0.0641711

A nice thing about this package is that it allows different link functions, namely "logit", "probit", "cloglog", "loglog", and "cauchit". To my regret, I know only "logit" and "probit" from this list.
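For reference, the five links correspond to simple quantile-style transformations of a probability $p$, which can be written out in base R. This is one common convention; exact sign conventions may differ between packages:

```r
# The five link functions, as maps from probability p to the linear-predictor scale
links <- list(
  logit   = qlogis,                          # log(p / (1 - p))
  probit  = qnorm,                           # inverse standard normal CDF
  cloglog = function(p) log(-log(1 - p)),    # complementary log-log
  loglog  = function(p) -log(-log(p)),       # log-log
  cauchit = qcauchy                          # inverse Cauchy CDF
)
sapply(links, function(f) f(0.5))  # the symmetric links give 0 at p = 0.5
```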

  • Package rms
library(rms)
fit_lrm <- lrm(formula = education ~ age, data = ecb06.it)
as.matrix(coef(fit_lrm))
#                        [,1]
# y>=primaryschool  6.9557784
# y>=middleschool   4.5182645
# y>=profschool     3.0643131
# y>=highschool     2.7325541
# y>=bachelors      0.9686858
# y>=masters        0.8947152
# y>=higherdegree  -2.4286635
# age              -0.0641711

This function was rather unstable: adding more explanatory variables threw an error a couple of times.

The coefficients are consistent (the differences in signs are explained by some packages modelling $\mathbb{P}(Y \leq j)$ and others $\mathbb{P}(Y \geq j)$), which is good.
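Numerically, the intercepts from MASS::polr and VGAM line up once the signs are flipped; the values below are copied from the outputs above:

```r
# Thresholds from MASS::polr (modelling P(Y <= j))
polr_cuts <- c(-6.95688936, -4.51869196, -3.06471919, -2.73295822,
               -0.96907401, -0.89517059,  2.42815131)
# Intercepts from VGAM::vglm with family = propodds (modelling P(Y >= j + 1))
vglm_int  <- c(6.95576156, 4.51825182, 3.06430069, 2.73254206,
               0.96867493, 0.89470432, -2.42867591)
max(abs(-polr_cuts - vglm_int))  # ~1e-3: same model, opposite sign convention
```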

Perhaps now you have the question: which package to use? Well, I do not know; just choose one and stick to it. I will probably use VGAM, as it covers various models and seems nicely documented.


  • Agresti, A. (2002). Categorical Data Analysis, Second Edition. Wiley.
  • STAT 504: Analysis of Discrete Data, Penn State University online course notes.
