Prototyping Multinomial Logit with R

August 21, 2013
By

(This article was first published on Yet Another Blog in Statistical Computing » S+/R, and kindly contributed to R-bloggers)

Recently, I am working on a new modeling proposal based on the competing risk and need to prototype multinomial logit models with R. There are R packages implementing multinomial logit models that I’ve tested, namely nnet and vgam. Model outputs with iris data are shown below.

data(iris)

### method 1: nnet package ###
library(nnet)
mdl1 <- multinom(Species ~ Sepal.Length, data = iris, model = TRUE)
summary(mdl1)

# Coefficients:
#            (Intercept) Sepal.Length
# versicolor   -26.08339     4.816072
# virginica    -38.76786     6.847957
#
# Std. Errors:
#            (Intercept) Sepal.Length
# versicolor    4.889635    0.9069211
# virginica     5.691596    1.0223867

### method 2: vgam package ### 
library(VGAM)
mdl2 <- vglm(Species ~ Sepal.Length, data = iris, multinomial(refLevel = 1)) 
summary(mdl2)

# Coefficients:
#                Estimate Std. Error z value
# (Intercept):1  -26.0819    4.88924 -5.3346
# (Intercept):2  -38.7590    5.69064 -6.8110
# Sepal.Length:1   4.8157    0.90683  5.3105
# Sepal.Length:2   6.8464    1.02222  6.6976

However, in my view, above methods are not flexible for real-world problems. For instance, there is no off-shelf solution for the variable selection for above multinomial logit models. Instead of building one multinomial logit model, we might develop two separate binomial logit models to accomplish the same task.

### method 3: two binary logit models ### 
iris$y <- ifelse(iris$Species == 'setosa', 0, 1)
mdl31 <- glm(y ~ Sepal.Length, data = iris, subset = (Species != 'virginica'), family = binomial)
summary(mdl31)

#  Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)   -27.831      5.434  -5.122 3.02e-07 ***
# Sepal.Length    5.140      1.007   5.107 3.28e-07 ***

mdl32 <- glm(y ~ Sepal.Length, data = iris, subset = (Species != 'versicolor'), family = binomial)
summary(mdl32)

# Coefficients:
#              Estimate Std. Error z value Pr(>|z|)    
# (Intercept)   -38.547      9.557  -4.033 5.50e-05 ***
# Sepal.Length    6.805      1.694   4.016 5.91e-05 ***

As shown above, we can get a set of similar estimated parameters by the third approach with much simpler models.

To leave a comment for the author, please follow the link and comment on their blog: Yet Another Blog in Statistical Computing » S+/R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Mango solutions



RStudio homepage



Zero Inflated Models and Generalized Linear Mixed Models with R

Dommino data lab

Quantide: statistical consulting and training



http://www.eoda.de







ODSC

ODSC

CRC R books series





Six Sigma Online Training





Contact us if you wish to help support R-bloggers, and place your banner here.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)