Discriminant Analysis of Several Groups

[This article was first published on R – Aaron Schlegel, and kindly contributed to R-bloggers.]

Discriminant analysis is also applicable in the case of more than two groups. In the first post on discriminant analysis there was only one linear discriminant function, since the number of linear discriminant functions is [latex]s = min(p, k - 1)[/latex], where [latex]p[/latex] is the number of dependent variables and [latex]k[/latex] is the number of groups. With more than two groups there is more than one linear discriminant function, which allows us to examine the groups' separation in more than one dimension. Discriminant analysis of several groups also makes it possible to rank the variables by their relative importance to group separation.
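For the rootstock data analyzed below (four dependent variables, six groups), this formula gives four discriminant functions; a quick sketch:

```r
# Number of discriminant functions: s = min(p, k - 1)
p <- 4   # dependent variables in the rootstock data below
k <- 6   # rootstock groups
min(p, k - 1)
## [1] 4
```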

Discriminant Analysis of Several Groups

Similar to the two-group case, the goal is to find a vector [latex]a[/latex] that maximally separates the discriminant function means [latex]\bar{z}_1, \bar{z}_2, \cdots, \bar{z}_k[/latex]. To extend the separation criterion to the [latex]k > 2[/latex] case, we take the two-group equation:

[latex display="true"] \frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = \frac{[a'(\bar{y}_1 - \bar{y}_2)]^2}{a'S_{pl}a} [/latex]

And express it in the form:

[latex display="true"] \frac{(\bar{z}_1 - \bar{z}_2)^2}{s_z^2} = \frac{a'(\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)'a}{a'S_{pl}a} [/latex]

In the discriminant analysis of several groups setting, the [latex]H[/latex] hypothesis matrix and [latex]E[/latex] error matrix from MANOVA are utilized: [latex]H[/latex] replaces [latex](\bar{y}_1 - \bar{y}_2)(\bar{y}_1 - \bar{y}_2)'[/latex] and [latex]E[/latex] replaces [latex]S_{pl}[/latex], which gives us:

[latex display="true"] \lambda = \frac{a'Ha}{a'Ea} [/latex]

Maximizing [latex]\lambda[/latex] with respect to [latex]a[/latex] (differentiating and setting the result to zero) yields:

[latex display="true"] (H - \lambda E)a = 0 [/latex]

Which can also be written as:

[latex display=”true”] (E^{-1}H – \lambda I)a = 0 [/latex]

The solutions of which are the eigenvalues [latex]\lambda_1, \lambda_2, \cdots, \lambda_s[/latex] and their corresponding eigenvectors [latex]a_1, a_2, \cdots, a_s[/latex] of the matrix [latex]E^{-1}H[/latex]. Therefore, the first and largest eigenvalue [latex]\lambda_1[/latex] and its eigenvector [latex]a_1[/latex] maximally separate the groups.

The number of eigenvalues and associated eigenvectors of [latex]E^{-1}H[/latex], [latex]s[/latex], is also the number of discriminant functions that are obtained by discriminant analysis of several groups.
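The eigen-decomposition step can be sketched with small synthetic matrices (the names and values here are illustrative, not the rootstock [latex]H[/latex] and [latex]E[/latex]); the largest eigenvalue of [latex]E^{-1}H[/latex] equals the maximized ratio [latex]a'Ha / a'Ea[/latex]:

```r
set.seed(1)
# Illustrative positive-definite "error" and "hypothesis" matrices
X <- matrix(rnorm(20), 5, 4)
E0 <- crossprod(X) + diag(4)
Y0 <- matrix(rnorm(16), 4, 4)
H0 <- crossprod(Y0)

# Eigenvalues are returned in decreasing order of modulus
eig <- eigen(solve(E0) %*% H0)
lambda1 <- Re(eig$values[1])
a1 <- Re(eig$vectors[, 1])

# The first eigenvector maximizes the separation ratio a'Ha / a'Ea
ratio <- drop(t(a1) %*% H0 %*% a1) / drop(t(a1) %*% E0 %*% a1)
all.equal(ratio, lambda1)
## [1] TRUE
```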

Discriminant Analysis of Several Groups in R

This example will analyze the rootstock data as in the previous MANOVA post. The rootstock data were obtained from the companion FTP site of the book Methods of Multivariate Analysis by Alvin Rencher. The data contains four dependent variables as follows:

  • trunk girth at four years (mm [latex]\times[/latex] 100)
  • extension growth at four years (m)
  • trunk girth at 15 years (mm [latex]\times[/latex] 100)
  • weight of tree above ground at 15 years (lb [latex]\times[/latex] 1000)
root <- read.table('ROOT.DAT', col.names = c('Tree.Number', 'Trunk.Girth.4.Years', 'Ext.Growth.4.Years', 'Trunk.Girth.15.Years', 'Weight.Above.Ground.15.Years'))

To calculate the discriminant functions of more than two groups, the [latex]H[/latex] and [latex]E[/latex] matrices from MANOVA must be computed.

root.group <- split(root[, 2:5], root$Tree.Number)
root.means <- sapply(root.group, colMeans)

n <- dim(root)[1] / length(unique(root$Tree.Number))
total.means <- colMeans(root[,2:5])

H <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:nrow(H)) {
  for (j in 1:i) {
    H[i,j] <- n * sum((root.means[i,] - total.means[i]) * (root.means[j,] - total.means[j]))
    H[j,i] <- H[i,j]  # H is symmetric
  }
}

E <- matrix(0, nrow = 4, ncol = 4)
for (i in 1:nrow(E)) {
  for (j in 1:i) {
    b <- sapply(root.group, function(g) {
      sum((g[,i] - mean(g[,i])) * (g[,j] - mean(g[,j])))
    })
    E[i,j] <- sum(b)
    E[j,i] <- E[i,j]  # E is symmetric
  }
}
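The loops above can also be expressed as matrix algebra. A standalone sketch on synthetic balanced data (the variable names here are illustrative, and deliberately distinct from the rootstock objects) cross-checks both formulations via the identity that the total SSCP matrix decomposes as [latex]H + E[/latex]:

```r
set.seed(42)
# Synthetic balanced one-way layout: 3 groups of 10 observations on 2 variables
grp <- rep(1:3, each = 10)
dat <- as.data.frame(matrix(rnorm(60), ncol = 2))

n.g <- 10
group.means <- t(sapply(split(dat, grp), colMeans))  # k x p matrix of group means
grand.mean <- colMeans(dat)

# Between-groups SSCP: H = n * sum over groups of (ybar_g - ybar)(ybar_g - ybar)'
D <- sweep(group.means, 2, grand.mean)
H.syn <- n.g * crossprod(D)

# Within-groups SSCP: E = sum over groups of centered cross-products
E.syn <- Reduce(`+`, lapply(split(dat, grp), function(g) {
  crossprod(sweep(as.matrix(g), 2, colMeans(g)))
}))

# Total SSCP equals H + E
T.syn <- crossprod(sweep(as.matrix(dat), 2, grand.mean))
all.equal(H.syn + E.syn, T.syn, check.attributes = FALSE)
## [1] TRUE
```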

Then find the eigenvalues and eigenvectors of the matrix [latex]E^{-1}H[/latex].

eigens <- eigen(solve(E) %*% H)
eigens$values
## [1] 1.87567112 0.79069454 0.22904907 0.02595357


eigens$vectors
##             [,1]        [,2]       [,3]        [,4]
## [1,] -0.55337105  0.08340397 -0.1521720  0.97955535
## [2,]  0.30911038  0.08894955  0.2539187 -0.12869430
## [3,] -0.76855963 -0.52426558  0.4623164 -0.08413202
## [4,]  0.08687548  0.84277954 -0.8358424  0.12973396

Thus there are four discriminant functions. The largest eigenvalue, [latex]1.8757[/latex], and its associated eigenvector, [latex](-0.5534, 0.3091, -0.7686, 0.0869)[/latex], represent the discriminant function that maximally separates the groups.

We can verify that the first eigenvector maximizes the ratio [latex]\lambda = \frac{a'Ha}{a'Ea}[/latex] above, reproducing the first eigenvalue:

(crossprod(eigens$vectors[,1], H) %*% eigens$vectors[,1]) /  (crossprod(eigens$vectors[,1], E) %*% eigens$vectors[,1])
##          [,1]
## [1,] 1.875671

The above can also be done with the lda() function available in the MASS package.

library(MASS)

The lda() function takes a formula argument.

root.lda <- lda(Tree.Number ~ ., data = root)
root.lda
## Call:
## lda(Tree.Number ~ ., data = root)
## 
## Prior probabilities of groups:
##         1         2         3         4         5         6 
## 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 0.1666667 
## 
## Group means:
##   Trunk.Girth.4.Years Ext.Growth.4.Years Trunk.Girth.15.Years
## 1             1.13750           2.977125              3.73875
## 2             1.15750           3.109125              4.51500
## 3             1.10750           2.815250              4.45500
## 4             1.09750           2.879750              3.90625
## 5             1.08000           2.557250              4.31250
## 6             1.03625           2.214625              3.59625
##   Weight.Above.Ground.15.Years
## 1                     0.871125
## 2                     1.280500
## 3                     1.391375
## 4                     1.039000
## 5                     1.181000
## 6                     0.735000
## 
## Coefficients of linear discriminants:
##                                     LD1        LD2       LD3       LD4
## Trunk.Girth.4.Years           3.0479952  -1.140083 -1.002448 23.419063
## Ext.Growth.4.Years           -1.7025953  -1.215888  1.672714 -3.076804
## Trunk.Girth.15.Years          4.2332645   7.166403  3.045553 -2.011416
## Weight.Above.Ground.15.Years -0.4785144 -11.520302 -5.506192  3.101660
## 
## Proportion of trace:
##    LD1    LD2    LD3    LD4 
## 0.6421 0.2707 0.0784 0.0089

The output of the lda() function also shows four discriminant functions. The coefficients differ from those we computed; however, this is just a matter of scaling.

root.lda$scaling / eigens$vectors
##                                   LD1       LD2      LD3      LD4
## Trunk.Girth.4.Years          -5.50805 -13.66941 6.587596 23.90785
## Ext.Growth.4.Years           -5.50805 -13.66941 6.587596 23.90785
## Trunk.Girth.15.Years         -5.50805 -13.66941 6.587596 23.90785
## Weight.Above.Ground.15.Years -5.50805 -13.66941 6.587596 23.90785

Thus either set of coefficients is a solution, since a scalar multiple of an eigenvector is still an eigenvector for the same eigenvalue. The proportion of trace in the lda() output is the relative importance of each discriminant function.

Relative Importance of Discriminant Functions

The relative importance of each discriminant function is the associated eigenvalue's proportion of the total sum of the eigenvalues:

[latex display="true"] \frac{\lambda_i}{\sum^s_{j=1} \lambda_j} [/latex]

for (i in eigens$values) {
  print(round(i / sum(eigens$values), 4))
}
## [1] 0.6421
## [1] 0.2707
## [1] 0.0784
## [1] 0.0089

The first and second discriminant functions account for about 91% of the total. Therefore the group mean vectors lie primarily in one dimension and only slightly in a second.

Test of Significance of Discriminant Functions

Wilks's [latex]\Lambda[/latex], a common MANOVA test statistic, is also employed in the discriminant analysis of several groups setting. The test statistic is defined as:

[latex display="true"] \Lambda_1 = \prod^s_{i=1} \frac{1}{1 + \lambda_i} [/latex]

which is distributed as [latex]\Lambda_{p, k-1, N-k}[/latex]. An approximate F-test is used for each statistic; for [latex]\Lambda_1[/latex]:

[latex display="true"] F = \frac{1 - \Lambda_1^{(1/t)}}{\Lambda_1^{(1/t)}} \frac{df_2}{df_1} [/latex]

Where,

[latex display="true"] t = \sqrt{\frac{p^2(k-1)^2 - 4}{p^2 + (k - 1)^2 - 5}} [/latex]

[latex display="true"] w = N - 1 - \frac{1}{2}(p + k) [/latex]

[latex display="true"] df_1 = p(k-1) [/latex]

[latex display="true"] df_2 = wt - \frac{1}{2}[p(k-1) - 2] [/latex]

For the successive tests after [latex]\Lambda_1[/latex], denote by [latex]\Lambda_m[/latex], [latex]m = 2, 3, \cdots, s[/latex], the statistic formed from the remaining eigenvalues, where [latex]s[/latex] is the number of nonzero eigenvalues of [latex]E^{-1}H[/latex]:

[latex display="true"] \Lambda_m = \prod_{i=m}^s \frac{1}{1 + \lambda_i} [/latex]

The approximate F-test becomes:

[latex display="true"] F = \frac{1 - \Lambda_m^{(1/t)}}{\Lambda_m^{(1/t)}} \frac{df_2}{df_1} [/latex]

[latex]p - m + 1[/latex] replaces [latex]p[/latex] and [latex]k - m[/latex] replaces [latex]k - 1[/latex]:

[latex display="true"] t = \sqrt{\frac{(p-m+1)^2(k-m)^2 - 4}{(p-m+1)^2 + (k-m)^2 - 5}} [/latex]

[latex display="true"] w = N - 1 - \frac{1}{2}(p + k) [/latex]

[latex display="true"] df_1 = (p - m + 1)(k - m) [/latex]

[latex display="true"] df_2 = wt - \frac{1}{2}[(p - m + 1)(k - m) - 2] [/latex]

The following function implements the above tests of significance for each discriminant function. As noted previously, the first two discriminant functions account for about 91% of the total, so it is likely that at least these two will be significant.

discriminant.significance <- function(eigenvalues, p, k, N) {
  # Test Lambda_1 using the approximate F-statistic defined above
  w <- N - 1 - .5 * (p + k)
  t <- sqrt((p^2 * (k - 1)^2 - 4) / (p^2 + (k - 1)^2 - 5))
  df1 <- p * (k - 1)
  df2 <- w * t - .5 * (p * (k - 1) - 2)
  lambda1 <- prod(1 / (1 + eigenvalues))
  f1 <- (1 - lambda1^(1/t)) / (lambda1^(1/t)) * df2 / df1
  p1 <- pf(f1, df1, df2, lower.tail = FALSE)
  
  result <- NULL

  # Successive tests of Lambda_m for m = 2, ..., s
  for (m in 2:p) {
    # When m = p the general expression for t reduces to 1 (and can give 0/0)
    if (m == p) {
      t.i <- 1
    } else {
      t.i <- sqrt(((p - m + 1)^2 * (k - m)^2 - 4) / ((p - m + 1)^2 + (k - m)^2 - 5))
    }
    
    df1.i <- (p - m + 1) * (k - m)
    df2.i <- w * t.i - .5 * ((p - m + 1) * (k - m) - 2)
    lambda.i <- prod(1 / (1 + eigenvalues[m:p]))
    f.i <- (1 - lambda.i^(1/t.i)) / lambda.i^(1/t.i) * df2.i / df1.i
    p.i <- pf(f.i, df1.i, df2.i, lower.tail = FALSE)    
    result <- rbind(result, data.frame(lambda.i, f.i, p.i))
  }
  res <- rbind(c(lambda1, f1, p1), result)
  colnames(res) <- c('Lambda', 'Approximate F', 'p-value')
  return(res)
}

N <- dim(root)[1]
p <- dim(root)[2] - 1
k <- length(unique(root$Tree.Number))

discriminant.significance(eigens$values, p, k, N)
##      Lambda Approximate F      p-value
## 1 0.1540077     4.9368880 7.713766e-09
## 2 0.4428754     3.1879149 6.382962e-04
## 3 0.7930546     1.6798943 1.363020e-01
## 4 0.9747030     0.5450251 5.838726e-01
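As a sanity check, [latex]\Lambda_1 = \prod 1/(1 + \lambda_i)[/latex] agrees with the Wilks statistic reported by R's built-in summary.manova(). A sketch on synthetic data (the names here are illustrative):

```r
set.seed(7)
# Synthetic one-way layout: 3 groups, 2 response variables
grp <- factor(rep(1:3, each = 12))
Ym <- matrix(rnorm(72), ncol = 2)
Ym[grp == 2, ] <- Ym[grp == 2, ] + 0.8   # shift one group

fit <- manova(Ym ~ grp)
wilks <- summary(fit, test = "Wilks")$stats[1, "Wilks"]

# The same statistic from the eigenvalues of E^-1 H
SS <- summary(fit)$SS
lam <- Re(eigen(solve(SS$Residuals) %*% SS$grp)$values)
all.equal(wilks, prod(1 / (1 + lam)))
## [1] TRUE
```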

The first two discriminant functions are indeed significant while the remaining two can be ignored.

abs(root.lda$scaling[,1:2])
##                                    LD1       LD2
## Trunk.Girth.4.Years          3.0479952  1.140083
## Ext.Growth.4.Years           1.7025953  1.215888
## Trunk.Girth.15.Years         4.2332645  7.166403
## Weight.Above.Ground.15.Years 0.4785144 11.520302

The dependent variable trunk girth at 15 years appears to contribute most to group separation in both dimensions, while trunk girth at four years and weight above ground at 15 years are the most important variables for separating the groups in the first and second discriminant functions, respectively.

References

Rencher, A. C. (2002). Methods of Multivariate Analysis (2nd ed.). John Wiley & Sons.
