Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Another popular technique for classification (or at least, which used to be popular) is the (linear) discriminant analysis, introduced by Ronald Fisher in 1936. Consider the same dataset as in our previous post

```> clr1 <- c(rgb(1,0,0,1),rgb(0,0,1,1))
> x <- c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y <- c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z <- c(1,1,1,1,1,0,0,1,0,0)
> df <- data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])```

The main interest of that technique is not the output, but more the fact that we can make here simple (and explicit) computations. Especially to get a better understanding of theoretical concepts on classification.

The code to run a linear discriminent analysis is

```> library(MASS)
> fit_lda <- lda (z ~ x+y, data=df)
> fit_lda
Call:
lda(z ~ x + y, data = df)

Prior probabilities of groups:
0   1
0.4 0.6

Group means:
x      y
0 0.4750000 0.3625
1 0.4583333 0.6950

Coefficients of linear discriminants:
LD1
x -2.588390
y  4.762615```

To visualize the prediction, consider

```> pred_lda <- function(x,y) as.numeric(predict(fit_lda,
+       newdata=data.frame(x=x,y=y))\$class)
> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_lda)
> clr1=c(rgb(1,0,0,1),rgb(0,0,1,1))
> clr2=c(rgb(1,0,0,.2),rgb(0,0,1,.2))
> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])
```

Note that it is very close to what we got with our logistic regression. So, what did we do, here ? The idea, here, is that

$\boldsymbol{X}\vert Y=0\sim\mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma})$

and

$\boldsymbol{X}\vert Y=1\sim\mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma})$

Using Bayes’ theorem, it is possible thoses relationship, to compute the probability that $Y=1$ (for instance), given $\boldsymbol{X}$. Here, we can easily estimate the three paramers, since means are just centroids of the groups

$\widehat{\boldsymbol{\mu}}_k=\frac{1}{\text{card}\{i,Y_i=k\}}\sum_{i,Y_i=k} \boldsymbol{X}_i$

and

$\widehat{\boldsymbol{\Sigma}}_k=\frac{1}{n}\left(\sum_{i,Y_i=0} (\boldsymbol{X}_i-\widehat{\boldsymbol{\mu}}_0)(\boldsymbol{X}_i-\widehat{\boldsymbol{\mu}}_0)^{\text{\sffamily T}}+\sum_{i,Y_i=1} (\boldsymbol{X}_i-\widehat{\boldsymbol{\mu}}_1)(\boldsymbol{X}_i-\widehat{\boldsymbol{\mu}}_1)^{\text{\sffamily T}}\right)$

```> m0=fit_lda\$means[1,]
> m1=fit_lda\$means[2,]
> df\$x0=df\$x-m0[1]
> df\$y0=df\$y-m0[2]
> df\$x0[z==1]=df\$x[z==1]-m1[1]
> df\$y0[z==1]=df\$y[z==1]-m1[2]
> S=var(df[,c("x0","y0")])
> library(mnormt)
> d0 <- function(x,y) dmnorm(cbind(x,y),mean=m0,varcov=S)
> z_0 <- outer(x_grid,y_grid,d0)
> d1 <- function(x,y) dmnorm(cbind(x,y),mean=m1,varcov=S)
> z_1 <- outer(x_grid,y_grid,d1)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])
> points(fit_lda\$means,col=clr1,pch=4,lwd=2)```

To visualize the two regions, just compare the densities. For instance, if the ratio of the two densities exceeds one, then we should be in one class,

`> contour(x_grid,y_grid,z_1/z_0,levels=1)`

But if we want to be consistent with the Bayesian interpretation, we should not compare the ratios of probabilities with one, but with the ratio of the proportions of points in each categories, since

$\mathbb{P}(Y=y\vert \boldsymbol{X}=\boldsymbol{x})\propto\varphi_{\boldsymbol{\mu}_k,\boldsymbol{\Sigma}}(\boldsymbol{x})\cdot \mathbb{P}(Y=y)$

Thus,

$\mathbb{P}(Y=1\vert \boldsymbol{X}=\boldsymbol{x})=\mathbb{P}(Y=0\vert \boldsymbol{X}=\boldsymbol{x})$

if and only if

$\frac{\varphi_{\boldsymbol{\mu}_1,\boldsymbol{\Sigma}}(\boldsymbol{x})}{\varphi_{\boldsymbol{\mu}_0,\boldsymbol{\Sigma}}(\boldsymbol{x})}=\frac{\mathbb{P}(Y=0)}{\mathbb{P}(Y=1)}$

`> contour(x_grid,y_grid,z_1/z_0,levels=.4/.6)`

(which is exacty the graph we got above). Consider now the case where we have three clases (as in the previous post)

```> clr1=c(rgb(1,0,0,1),rgb(1,1,0,1),rgb(0,0,1,1))
> clr2=c(rgb(1,0,0,.2),rgb(1,1,0,.2),rgb(0,0,1,.2))
> x=c(.4,.55,.65,.9,.1,.35,.5,.15,.2,.85)
> y=c(.85,.95,.8,.87,.5,.55,.5,.2,.1,.3)
> z=c(1,2,2,2,1,0,0,1,0,0)
> df=data.frame(x,y,z)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])```

The code to run a linear discriminent analysis is the same,

```> fit_lda <- lda (z ~ x+y, data=df)
> fit_lda
Call:
lda(z ~ x + y, data = df)

Prior probabilities of groups:
0   1   2
0.4 0.3 0.3

Group means:
x         y
0 0.4750000 0.3625000
1 0.2166667 0.5166667
2 0.7000000 0.8733333

Coefficients of linear discriminants:
LD1       LD2
x 2.014510  4.237266
y 3.438693 -3.185543```

To visualise the three classes use

```> pred_lda <- function(x,y)
+ as.numeric(predict(fit_lda,
+ newdata=data.frame(x=x,y=y))\$class)
> z_grid <- outer(x_grid,y_grid,pred_lda)
> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])```

Here again, we have the interpretation of a mixture of three Gaussian vectors (with given means, and identical variance)

```> m0=fit_lda\$means[1,]
> m1=fit_lda\$means[2,]
> m2=fit_lda\$means[3,]
> df\$x0=df\$x-m0[1]
> df\$y0=df\$y-m0[2]
> df\$x0[z==1]=df\$x[z==1]-m1[1]
> df\$y0[z==1]=df\$y[z==1]-m1[2]
> df\$x0[z==2]=df\$x[z==2]-m2[1]
> df\$y0[z==2]=df\$y[z==2]-m2[2]
> S=var(df[,c("x0","y0")])
> library(mnormt)
> d0 <- function(x,y) dmnorm(cbind(x,y),mean=m0,varcov=S)
> z_0 <- outer(x_grid,y_grid,d0)
> d1 <- function(x,y) dmnorm(cbind(x,y),mean=m1,varcov=S)
> z_1 <- outer(x_grid,y_grid,d1)
> d2 <- function(x,y) dmnorm(cbind(x,y),mean=m2,varcov=S)
> z_2 <- outer(x_grid,y_grid,d2)
> plot(x,y,pch=19,cex=2,col=clr1[z+1],)
> points(fit_lda\$means,col=clr1,pch=4,lwd=2)```

Again, to visualize lines where two densities are equal, we use

```> contour(x_grid,y_grid,z_1/z_0,levels=1)
> contour(x_grid,y_grid,z_2/z_0,levels=1)
> contour(x_grid,y_grid,z_1/z_2,levels=1)```

or, using Bayes theorem to get an interpretation in terms of conditional probabilities

```> contour(x_grid,y_grid,z_1/z_0,
> contour(x_grid,y_grid,z_2/z_0,
> contour(x_grid,y_grid,z_1/z_2,

Now, this linear discriminent analysis is very restrictive. We must have the same variance. If not, we get what is called the quadratic discriminent analysis. The code is here

`> fit_qda <- qda (z ~ x+y, data=df)`

Again, to visualize the classes, we use

```> pred_qda <- function(x,y) as.numeric(predict(fit_qda, newdata=data.frame(x=x,y=y))\$class)
> x_grid<-seq(0,1,length=101)
> y_grid<-seq(0,1,length=101)
> z_grid <- outer(x_grid,y_grid,pred_qda)
> clr1=c(rgb(1,0,0,1),rgb(0,0,1,1))
> clr2=c(rgb(1,0,0,.2),rgb(0,0,1,.2))
> image(x_grid,y_grid,z_grid,xcol=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])
> points(x,y,pch=1,cex=2)
```

Here again, the idea is that we have a mixture of two Gaussian vectors (with different variances, this time), i.e.

$\boldsymbol{X}\vert Y=0\sim\mathcal{N}(\boldsymbol{\mu}_0,\boldsymbol{\Sigma}_0)$

while

$\boldsymbol{X}\vert Y=1\sim\mathcal{N}(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_1)$

```> m0=fit_qda\$means[1,]
> m1=fit_qda\$means[2,]
> S0=var(df[z==0,c("x","y")])
> S1=var(df[z==1,c("x","y")])
> library(mnormt)
> d0 <- function(x,y) dmnorm(cbind(x,y),mean=m0,varcov=S0)
> z_0 <- outer(x_grid,y_grid,d0)
> d1 <- function(x,y) dmnorm(cbind(x,y),mean=m1,varcov=S1)
> z_1 <- outer(x_grid,y_grid,d1)
> plot(x,y,pch=19,cex=2,col=clr1[z+1],)
> points(fit_lda\$means,col=clr1,pch=4,lwd=2)```

Here they are,

To get again the separation curve, we use

`> contour(x_grid,y_grid,z_1/z_0,levels=.4/.6)`

The decision boudary, i.e.

$\{\boldsymbol{x}:\mathbb{P}(Y=0\vert\boldsymbol{X}=\boldsymbol{x})=\mathbb{P}(Y=1\vert\boldsymbol{X}=\boldsymbol{x})\}$

$\mathbb{P}(Y=y\vert\boldsymbol{X}=\boldsymbol{x})=-\frac{1}{2}\log\vert\boldsymbol{\Sigma}_y\vert \\ -\frac{1}{2}[\boldsymbol{x}-\boldsymbol{\mu}_y]^{\text{\sffamily{T}}}\boldsymbol{\Sigma}_y^{-1}[\boldsymbol{x}-\boldsymbol{\mu}_y]+\log\mathbb{P}(Y=y)$

What if we focus on the three classes case ? The classes are

```> z_grid <- outer(x_grid,y_grid,pred_qda)
> image(x_grid,y_grid,z_grid,col=clr2)
> points(x,y,pch=19,cex=2,col=clr1[z+1])
> points(x,y,pch=1,cex=2)```

The idea behind is the same as in the linear case, we simply relax the assumption of identical variance

```> m0=fit_qda\$means[1,]
> m1=fit_qda\$means[2,]
> m2=fit_qda\$means[3,]
> S0=var(df[z==0,c("x","y")])
> S1=var(df[z==1,c("x","y")])
> S2=var(df[z==2,c("x","y")])
> library(mnormt)
> d0 <- function(x,y) dmnorm(cbind(x,y),mean=m0,varcov=S0)
> z_0 <- outer(x_grid,y_grid,d0)
> d1 <- function(x,y) dmnorm(cbind(x,y),mean=m1,varcov=S1)
> z_1 <- outer(x_grid,y_grid,d1)
> d2 <- function(x,y) dmnorm(cbind(x,y),mean=m2,varcov=S2)
> z_2 <- outer(x_grid,y_grid,d2)
> plot(x,y,pch=19,cex=2,col=clr1[z+1])
> points(fit_lda\$means,col=clr1,pch=4,lwd=2)
> contour(x_grid,y_grid,z_1/z_0,levels=1)
> contour(x_grid,y_grid,z_2/z_0,levels=1)
> contour(x_grid,y_grid,z_1/z_2,levels=1)```

to compare densities, but here we do not have the conditional probability interpretation.

Note that there is also a mixture discriminant analysis

```> library(mda)
> fit_mda <- mda (z ~ x+y, data=df)
> pred_mda <- function(x,y)
+ as.numeric(predict(fit_mda,
+ newdata=data.frame(x=x,y=y),type="class"))```

for two classes, but also in the case where we had three classes,