Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

We will start, in our Data Science course,  to discuss classification techniques (in the context of supervised models). Consider the following case, with 10 points, and two classes (red and blue)

```> clr1 <- c(rgb(1,0,0,1),rgb(0,0,1,1))
> clr2 <- c(rgb(1,0,0,.2),rgb(0,0,1,.2))
> x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z <- c(1,1,1,1,1,0,0,1,0,0)
> df <- data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])```

To get a prediction, i.e. a partition of the space in two parts, consider some logistic regression

```> reg=glm(z~x+y,data=df,family=binomial)
> summary(reg)

Call:
glm(formula = z ~ x + y, family = binomial, data = df)

Deviance Residuals:
Min       1Q   Median       3Q      Max
-1.6593  -0.4400   0.2564   0.5830   1.5374

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept)   -1.706      1.999  -0.854    0.393
x             -5.489      5.360  -1.024    0.306
y              8.568      5.515   1.554    0.120

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 13.4602  on 9  degrees of freedom
Residual deviance:  8.1445  on 7  degrees of freedom
AIC: 14.144

Number of Fisher Scoring iterations: 5```

Given some point, the predicted class is obtained using

```> pred_1 <- function(x,y){
+ predict(reg,newdata=data.frame(x=x,
+ y=y),type="response")>.5
+ }```

(here, the predicted class is simply the one that is the most likely). To visualize it use

```> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_1)
> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])
```

Since the logistic regression is a (generalized) linear model, the line that separate the two regions is a straight line.

What if we had three classes, and not two (red, yellow and blue)

```> clr1=c(rgb(1,0,0,1),rgb(1,1,0,1),rgb(0,0,1,1))
> clr2=c(rgb(1,0,0,.2),rgb(1,1,0,.2),
+ rgb(0,0,1,.2))
> x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z=c(1,2,2,2,1,0,0,1,0,0)
> df=data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])```

We should use here a multinomial model. Recall that for the logistic regression model, if $(\pi,1-\pi)=(\pi_1,\pi_2)$, then

$\log \frac{\pi}{1-\pi}=\log \frac{\pi_1}{\pi_2} =\boldsymbol{X}'\boldsymbol{\beta}$

i.e.

$\pi_1 = \frac{\exp(\boldsymbol{X}'\boldsymbol{\beta})}{1+\exp(\boldsymbol{X}'\boldsymbol{\beta})}$

while

$\pi_2 = \frac{1}{1+\exp(\boldsymbol{X}'\boldsymbol{\beta})}$

To derive a multivariate extension, write

$\pi_1 = \frac{\exp(\boldsymbol{X}'\boldsymbol{\beta}_1)}{1+\exp(\boldsymbol{X}'\boldsymbol{\beta}_1)+\exp(\boldsymbol{X}'\boldsymbol{\beta}_2)}$

$\pi_2 = \frac{\exp(\boldsymbol{X}'\boldsymbol{\beta}_2)}{1+\exp(\boldsymbol{X}'\boldsymbol{\beta}_1)+\exp(\boldsymbol{X}'\boldsymbol{\beta}_2)}$

and

$\pi_3 = \frac{1}{1+\exp(\boldsymbol{X}'\boldsymbol{\beta}_1)+\exp(\boldsymbol{X}'\boldsymbol{\beta}_2)}$

Here, maximum likelihood techniques can be used, since

$\mathcal{L}(\boldsymbol{\pi},\boldsymbol{y})\propto \prod_{i=1}^n \prod_{j=1}^3 \pi_{i,j}^{Y_{i,j}}$

where variable $Y_{i}$  – which take three levels – is splitted in three indicators (like any categorical explanatory variables in standard regression model). The code to estimate that model is simply

```> library(nnet)
> reg=multinom(z~x+y,data=df)
# weights:  12 (6 variable)
initial  value 10.986123
iter  10 value 0.794930
iter  20 value 0.065712
iter  30 value 0.064409
iter  40 value 0.061612
iter  50 value 0.058756
iter  60 value 0.056225
iter  70 value 0.055332
iter  80 value 0.052887
iter  90 value 0.050644
iter 100 value 0.048117
final  value 0.048117
stopped after 100 iterations```

and the output is

```> summary(reg)
Call:
multinom(formula = z ~ x + y, data = df)

Coefficients:
(Intercept)          x         y
1    14.44215 -126.90076  47.68488
2  -114.49676   65.23419 100.31341

Std. Errors:
(Intercept)        x        y
1    23.03561 144.8639 52.99049
2    50.54085 267.1013 98.60386

Residual Deviance: 0.09623369
AIC: 12.09623```

In the binomial case, there was only one interesting case (once we have the probability to be equal to 1, we have the probability to be null). Here, there are two. So we have two lines, for the three parameters. The prediction function, for the class, is here

```> pred_2 <- function(x,y){
+ which.max(predict(reg,
+ newdata=data.frame(x=x,y=y),type="probs"))
+ }```

Now, to visualize predictions, use

```> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_2)
Error in outer(x_grid, y_grid, pred_2) :
dims [produit 10201] do not match the length of object [1]```

Unfortunately, it is not working. Neither is

```> z_grid <- outer(x_grid,y_grid,function(x,y) pred_2(x,y))
Error in outer(x_grid, y_grid, function(x, y) pred_2(x, y)) :
dims [produit 10201] do not match the length of object [1]```

If any one has a suggestion, I’d be glad to know. Just because we really want to get our graph, use

```> Outer <- function(x,y,fun) {
+ mat <- matrix(NA, length(x), length(y))
+ for (i in seq_along(x)) {
+ for (j in seq_along(y)) mat[i,j]=fun(x[i],y[j])
+ }
+ return(mat)}
> z_grid <- Outer(x_grid,y_grid,pred_2)```

(which is slow, but at least, it’s running). The prediction is

```> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])```

We now have three regions, the frontier being linear, and the intersection being the equiprobable case.