Supervised Classification, Logistic and Multinomial

Posted on March 2, 2015 by arthur charpentier in R bloggers | 0 Comments

[This article was first published on Freakonometrics » R-english, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We will start, in our Data Science course, to discuss classification techniques (in the context of supervised models). Consider the following case, with 10 points, and two classes (red and blue)

> clr1 <- c(rgb(1,0,0,1),rgb(0,0,1,1))
> clr2 <- c(rgb(1,0,0,.2),rgb(0,0,1,.2))
> x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z <- c(1,1,1,1,1,0,0,1,0,0)
> df <- data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])

To get a prediction, i.e. a partition of the space in two parts, consider some logistic regression

> reg=glm(z~x+y,data=df,family=binomial)
> summary(reg)
 
Call:
glm(formula = z ~ x + y, family = binomial, data = df)
 
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-1.6593  -0.4400   0.2564   0.5830   1.5374  
 
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.706      1.999  -0.854    0.393
x             -5.489      5.360  -1.024    0.306
y              8.568      5.515   1.554    0.120
 
(Dispersion parameter for binomial family taken to be 1)
 
    Null deviance: 13.4602  on 9  degrees of freedom
Residual deviance:  8.1445  on 7  degrees of freedom
AIC: 14.144
 
Number of Fisher Scoring iterations: 5

Given some point, the predicted class is obtained using

> pred_1 <- function(x,y){
+ predict(reg,newdata=data.frame(x=x,
+ y=y),type="response")>.5
+ }

(here, the predicted class is simply the one that is the most likely). To visualize it use

> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_1)
> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])

Since the logistic regression is a (generalized) linear model, the line that separate the two regions is a straight line.

What if we had three classes, and not two (red, yellow and blue)

> clr1=c(rgb(1,0,0,1),rgb(1,1,0,1),rgb(0,0,1,1))
> clr2=c(rgb(1,0,0,.2),rgb(1,1,0,.2),
+ rgb(0,0,1,.2))
> x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z=c(1,2,2,2,1,0,0,1,0,0)
> df=data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])

We should use here a multinomial model. Recall that for the logistic regression model, if $http://latex.codecogs.com/gif.latex?(pi,1-pi)=(pi_1,pi_2)$ , then

$http://latex.codecogs.com/gif.latex?log%20frac{pi}{1-pi}=log%20frac{pi_1}{pi_2}%20=boldsymbol{X}%27boldsymbol{beta}$

i.e.

$http://latex.codecogs.com/gif.latex?pi_1%20=%20frac{exp(boldsymbol{X}%27boldsymbol{beta})}{1+exp(boldsymbol{X}%27boldsymbol{beta})}$

while

$http://latex.codecogs.com/gif.latex?pi_2%20=%20frac{1}{1+exp(boldsymbol{X}%27boldsymbol{beta})}$

To derive a multivariate extension, write

$http://latex.codecogs.com/gif.latex?pi_1%20=%20frac{exp(boldsymbol{X}%27boldsymbol{beta}_1)}{1+exp(boldsymbol{X}%27boldsymbol{beta}_1)+exp(boldsymbol{X}%27boldsymbol{beta}_2)}$

$http://latex.codecogs.com/gif.latex?pi_2%20=%20frac{exp(boldsymbol{X}%27boldsymbol{beta}_2)}{1+exp(boldsymbol{X}%27boldsymbol{beta}_1)+exp(boldsymbol{X}%27boldsymbol{beta}_2)}$

and

$http://latex.codecogs.com/gif.latex?pi_3%20=%20frac{1}{1+exp(boldsymbol{X}%27boldsymbol{beta}_1)+exp(boldsymbol{X}%27boldsymbol{beta}_2)}$

Here, maximum likelihood techniques can be used, since

$http://latex.codecogs.com/gif.latex?mathcal{L}(boldsymbol{pi},boldsymbol{y})propto%20prod_{i=1}^n%20prod_{j=1}^3%20pi_{i,j}^{Y_{i,j}}$

where variable $http://latex.codecogs.com/gif.latex?Y_{i}$ – which take three levels – is splitted in three indicators (like any categorical explanatory variables in standard regression model). The code to estimate that model is simply

> library(nnet)
> reg=multinom(z~x+y,data=df)
# weights:  12 (6 variable)
initial  value 10.986123 
iter  10 value 0.794930
iter  20 value 0.065712
iter  30 value 0.064409
iter  40 value 0.061612
iter  50 value 0.058756
iter  60 value 0.056225
iter  70 value 0.055332
iter  80 value 0.052887
iter  90 value 0.050644
iter 100 value 0.048117
final  value 0.048117 
stopped after 100 iterations

and the output is

> summary(reg)
Call:
multinom(formula = z ~ x + y, data = df)
 
Coefficients:
  (Intercept)          x         y
1    14.44215 -126.90076  47.68488
2  -114.49676   65.23419 100.31341
 
Std. Errors:
  (Intercept)        x        y
1    23.03561 144.8639 52.99049
2    50.54085 267.1013 98.60386
 
Residual Deviance: 0.09623369 
AIC: 12.09623

In the binomial case, there was only one interesting case (once we have the probability to be equal to 1, we have the probability to be null). Here, there are two. So we have two lines, for the three parameters. The prediction function, for the class, is here

> pred_2 <- function(x,y){
+ which.max(predict(reg,
+ newdata=data.frame(x=x,y=y),type="probs"))
+ }

Now, to visualize predictions, use

> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_2)
Error in outer(x_grid, y_grid, pred_2) : 
  dims [produit 10201] do not match the length of object [1]

Unfortunately, it is not working. Neither is

> z_grid <- outer(x_grid,y_grid,function(x,y) pred_2(x,y))
Error in outer(x_grid, y_grid, function(x, y) pred_2(x, y)) : 
  dims [produit 10201] do not match the length of object [1]

If any one has a suggestion, I’d be glad to know. Just because we really want to get our graph, use

> Outer <- function(x,y,fun) {
+ mat <- matrix(NA, length(x), length(y))
+ for (i in seq_along(x)) {
+ for (j in seq_along(y)) mat[i,j]=fun(x[i],y[j])
+ }
+ return(mat)}
> z_grid <- Outer(x_grid,y_grid,pred_2)

(which is slow, but at least, it’s running). The prediction is

> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])

We now have three regions, the frontier being linear, and the intersection being the equiprobable case.

To leave a comment for the author, please follow the link and comment on their blog: Freakonometrics » R-english.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Supervised Classification, Logistic and Multinomial

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)