Comparing mnlogit and mlogit for discrete choice models

[This article was first published on eKonometrics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Earlier this weekend (Dec. 7, 2013), mnlogit was released on CRAN by Wang Zhiyu and Asad Hasan ([email protected]) claiming that mnlogit uses “parallel C++ library to achieve fast computation of Hessian matrices”.

Here is a comparison of mnlogit with mlogit by Yves Croissant whose package seems to be the inspiration for mnlogit.

I will estimate the same model using the same data set and will compare the two packages for execution speed, specification flexibility, and ease of use.

Data set

I use the Fish data set to estimate mnlogit and mlogit. mnlogit defines the data set as follows:

A data frame containing :

  • mode – The choice set: beach, pier, boat, and charter
  • price – price for a mode for an individual
  • catch – fish catch rate for a mode for an individual
  • income – monthly income of the individual decision-maker
  • chid – decision maker ID

The authors mention that they have sourced the data from R package mlogit by Yves Croissant, which lists the source as:

  • Herriges, J. A. and C. L. Kling (1999) “Nonlinear Income Effects in Random Utility Models”, Review of Economics and Statistics, 81, 62-72.

Estimation with mnlogit

library(mnlogit)


## Warning: package 'mnlogit' was built under R version 3.0.2


## Package: mnlogit Version: 1.0 Multinomial Logit Choice Models. Scientific
## Computing Group, Sentrana Inc, 2013.


data(Fish, package = "mnlogit")
fm <- formula(mode ~ 1 | income | price + catch)
summary(mnlogit(fm, Fish, "alt"))


## 
## Call:
## mnlogit(formula = fm, data = Fish, choiceVar = "alt")
## 
## Frequencies of alternatives in input data:
##   beach    boat charter    pier 
##   0.113   0.354   0.382   0.151 
## 
## Number of observations in training data = 1182
## Number of alternatives = 4
## Intercept turned: ON.
## Number of parameters in model = 14
##   # individual specific variables = 2
##   # choice specific coeff variables = 2
##   # generic coeff variables = 0
## 
## Maximum likelihood estimation using Newton-Raphson iterations.
##   Number of iterations: 7
##   Number of linesearch iterations: 7
## At termination: 
##   Gradient 2-norm = 8.19688645245397e-05
##   Diff between last 2 loglik values = 4.15543581766542e-08
##   Stopping reason: Succesive loglik difference < ftol (1e-06).
## Total estimation time (sec): 0.04
## Time for Hessian calculations (sec): 0.04 using 1 processors.
## 
## Coefficients : 
##                      Estimate Std.Error t-value Pr(>|t|)    
## (Intercept):boat     8.64e-01  3.15e-01    2.74  0.00607 ** 
## (Intercept):charter  1.85e+00  3.10e-01    5.97  2.4e-09 ***
## (Intercept):pier     1.13e+00  3.05e-01    3.71  0.00021 ***
## income:boat         -1.10e-04  6.02e-05   -1.84  0.06636 .  
## income:charter      -2.78e-04  6.03e-05   -4.61  4.0e-06 ***
## income:pier         -1.28e-04  5.33e-05   -2.41  0.01605 *  
## price:beach         -3.80e-02  3.33e-03  -11.41  < 2e-16 ***
## price:boat          -2.09e-02  2.23e-03   -9.33  < 2e-16 ***
## price:charter       -1.60e-02  2.02e-03   -7.94  2.0e-15 ***
## price:pier          -3.92e-02  3.26e-03  -12.02  < 2e-16 ***
## catch:beach          4.95e+00  8.20e-01    6.04  1.5e-09 ***
## catch:boat           2.47e+00  5.19e-01    4.76  1.9e-06 ***
## catch:charter        7.61e-01  1.52e-01    4.99  6.0e-07 ***
## catch:pier           4.88e+00  8.99e-01    5.43  5.5e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1160, df = 14
## AIC:  13.8875709217033


system.time((mnlogit(fm, Fish, "alt")))


##    user  system elapsed 
##    0.11    0.00    0.11


Estimation with mlogit



I estimate the same model using mlogit.



library(mlogit)


## Loading required package: Formula Loading required package: statmod
## Loading required package: lmtest Loading required package: zoo
## 
## Attaching package: 'zoo'
## 
## The following object is masked from 'package:base':
## 
## as.Date, as.Date.numeric
## 
## Loading required package: maxLik Loading required package: miscTools
## Loading required package: MASS


data2 <- mlogit.data(Fish, choice = "alt", shape = "long", id.var = "chid", 
    alt.levels = c("beach", "boat", "charter", "pier"))
summary(mod1 <- mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))


## 
## Call:
## mlogit(formula = mode ~ 1 | income | price + catch, data = data2, 
##     reflevel = "beach", method = "nr", print.level = 0)
## 
## Frequencies of alternatives:
##   beach    boat charter    pier 
##   0.113   0.354   0.382   0.151 
## 
## nr method
## 7 iterations, 0h:0m:0s 
## g'(-H)^-1g = 8.31E-08 
## gradient close to zero 
## 
## Coefficients :
##                      Estimate Std. Error t-value Pr(>|t|)    
## boat:(intercept)     8.64e-01   3.15e-01    2.74  0.00607 ** 
## charter:(intercept)  1.85e+00   3.10e-01    5.97  2.4e-09 ***
## pier:(intercept)     1.13e+00   3.05e-01    3.71  0.00021 ***
## boat:income         -1.10e-04   6.02e-05   -1.84  0.06636 .  
## charter:income      -2.78e-04   6.03e-05   -4.61  4.0e-06 ***
## pier:income         -1.28e-04   5.33e-05   -2.41  0.01605 *  
## beach:price         -3.80e-02   3.33e-03  -11.41  < 2e-16 ***
## boat:price          -2.09e-02   2.23e-03   -9.33  < 2e-16 ***
## charter:price       -1.60e-02   2.02e-03   -7.94  2.0e-15 ***
## pier:price          -3.92e-02   3.26e-03  -12.02  < 2e-16 ***
## beach:catch          4.95e+00   8.20e-01    6.04  1.5e-09 ***
## boat:catch           2.47e+00   5.19e-01    4.76  1.9e-06 ***
## charter:catch        7.61e-01   1.52e-01    4.99  6.0e-07 ***
## pier:catch           4.88e+00   8.99e-01    5.43  5.5e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Log-Likelihood: -1160
## McFadden R^2:  0.225 
## Likelihood ratio test : chisq = 675 (p.value = <2e-16)


system.time(mlogit(mode ~ 1 | income | price + catch, data2, reflevel = "beach"))


##    user  system elapsed 
##    0.27    0.02    0.28


Speed Tests



I conducted a simple test for execution times with the command system.time. The results are reported above after each model summary.



Findings



  • I obtain identical results for the models estimated with mnlogit and mlogit.


  • Estimation speeds appear faster for mnlogit.


  • caveat: The same command (system.time) when run independently of the R Markdown environment shows mlogit to be faster than mnlogit!


  • Additional Comments: I restarted RStudio and estimated the two models again outside of R Markdown. mnlogit took  0.12 seconds versus 0.31 seconds for mlogit.


  • Verdict: mnlogit reports shorter execution times than mlogit.


  • Also, Estimation speeds may differ with large and complex data sets.


  • The model specification is simpler in mnlogit.


  • The commands for specifying the data set and the model seem easier in mnlogit.


  • mlogit syntax seems relatively complex, but offers more choices in model specification.

    Final comment



    I am delighted to see yet another package for discrete choice models. As the field evolves and more advanced problems are subjected to discrete choice models, the need for robust and computationally efficient packages will be felt more. Both mlogit and mnlogit are indeed valuable to those interested in how humans make good and bad choices!




  • To leave a comment for the author, please follow the link and comment on their blog: eKonometrics.

    R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
    Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

    Never miss an update!
    Subscribe to R-bloggers to receive
    e-mails with the latest R posts.
    (You will not see this message again.)

    Click here to close (This popup will not appear again)