Testing for Linear Separability with Linear Programming in R

April 19, 2014

(This article was first published on joy of data » R, and kindly contributed to R-bloggers)

For the previous article I needed a quick way to figure out whether two sets of points are linearly separable. But for crying out loud I could not find a simple and efficient implementation for this task – except for the perceptron and the SVM, and both are sub-optimal when you just want to test for linear separability. The perceptron is guaranteed to finish off with a happy ending – if the sets are separable – but it can take quite a while. And SVMs are designed for soft-margin classification, which means they might settle for a non-separating plane; they also maximize the margin around the separating plane – which is wasted computational effort here.

The efficient way to get the job done is to apply linear programming (LP). That means representing the question “Is it possible to fit a hyper-plane between two sets of points?” as a number of inequalities (which make up a convex area). I’m going to give a quick walk-through of the math to make the idea plausible – but this text describes an introductory example rather than an introduction to LP itself. For solving the linear program I will use Rglpk, which provides a high-level interface to the GNU Linear Programming Kit (GLPK) – and of course has been co-crafted by the man himself – Kurt Hornik – who is also involved with kernlab and party – thank you, Prof. Hornik, and keep up the good work!

The Gory Details

Let’s say we have two sets A and B of points in \mathbb{R}^M:

A = \{a_1, ..., a_{N_1}\} \subset \mathbb{R}^M, B = \{b_1, ..., b_{N_2}\} \subset \mathbb{R}^M

If we want to know whether there is a hyper-plane in \mathbb{R}^M which separates A and B, we can formulate the condition with two symmetric inequalities:

A \beta \in \mathbb{R} and an h \in \mathbb{R}^M exist, such that for all a \in A: h^Ta > \beta (1), and for all b \in B: h^Tb < \beta (2).

This is because a hyper-plane in \mathbb{R}^M can be defined as a vector (h,\beta) \in \mathbb{R}^{M} \times \mathbb{R}, and the points on either side of it can be distinguished with the above inequalities.

N <- 100
g <- expand.grid(x=0:N/N,y=0:N/N)

# definition of the hyper plane
h1 <- 3
h2 <- -4
beta  <- -1.3

# points on either side
g$col <- ifelse(h1 * g$x + h2 * g$y > beta, 
     "cornflowerblue","darkolivegreen3")

# roughly on the hyper plane
g$col <- ifelse(abs(h1 * g$x + h2 * g$y - beta) < 2/N, "red", g$col)

plot(g$x, g$y, col=g$col, pch=16, cex=.5, xlab="x", ylab="y", 
     main="h(x,y) = 3 * x + (-4) * y + 1.3 = 0")

The constraints of a linear program are usually stated as a number of “weakly smaller than” inequalities. So let’s transform (1) and (2) accordingly:

The conditions h^Ta > \beta and h^Tb < \beta can be replaced by h^Ta \geq \beta + 1 and h^Tb \leq \beta - 1. This is because we are dealing with finite sets A and B: if a separating plane exists, then we can always find an \epsilon > 0 such that h^Ta \geq \beta + \epsilon and h^Tb \leq \beta - \epsilon. Multiplying both inequalities by 1/\epsilon just yields a different representation (h / \epsilon, \beta / \epsilon) of the same plane (h, \beta). Finally we multiply the first inequality by -1 to turn \geq into \leq – and now we have:

(3) -h^Ta \leq -\beta - 1

(4) h^Tb \leq \beta - 1
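A quick numeric sanity check of this rescaling argument – the plane and the toy points below are made up for illustration:

```r
# Toy example: the plane h^T x = beta (h = (3,-4), beta = -1.3)
# strictly separates the two point sets below.
h <- c(3, -4); beta <- -1.3
A_pts <- rbind(c(0.9, 0.1), c(0.8, 0.2))   # h^T a > beta for every row
B_pts <- rbind(c(0.1, 0.8), c(0.2, 0.9))   # h^T b < beta for every row

# any epsilon up to the smallest margin works; shrink it a bit
# to stay clear of floating-point ties
eps <- 0.9 * min(A_pts %*% h - beta, beta - B_pts %*% h)

# rescaling by 1/epsilon describes the same plane, now with margin >= 1
h2 <- h / eps; beta2 <- beta / eps
stopifnot(all(A_pts %*% h2 >= beta2 + 1),
          all(B_pts %*% h2 <= beta2 - 1))
```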

Okay great – we’re almost there – now let’s get all the variables on the left hand side:

(5) -h^Ta + \beta \leq -1

(6) h^Tb - \beta \leq -1

These hyper-plane conditions have to hold for all points: every point in A has to fulfil (5) and every point in B has to fulfil (6). This set of N_1 + N_2 inequalities describes the convex set in \mathbb{R}^{M+1} of all possible separating hyper-planes (h,\beta). Usually the description of a linear program does not stop at this point, and the set of feasible solutions is used to maximize an objective function. In our case such an objective function might be introduced to maximize the distance of the plane from the points. But this article is long enough already, and our objective is to just find a plane – not the best plane. Which is why our objective function is simply the zero vector 0 \in \mathbb{R}^{M+1}.

So now we formulate (5) and (6) in matrix notation because this is how LP solvers expect the program description to be fed to them – we get:

A\cdot\tilde{h} \leq b with N := N_1 + N_2

A = \begin{pmatrix} -a_1^T & 1 \\ ... & ... \\ -a_{N_1}^T & 1 \\ b_1^T & -1 \\ ... & ... \\ b_{N_2}^T & -1 \end{pmatrix} \in \mathbb{R}^{N\times(M+1)}, \tilde{h} = \begin{pmatrix} h_1 \\ ... \\ h_M \\ \beta \end{pmatrix} \in \mathbb{R}^{M+1}, b = \begin{pmatrix} -1 \\ ... \\ -1 \\ -1 \\ ... \\ -1 \end{pmatrix} \in \mathbb{R}^{N}
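To make the matrix notation concrete, here is a minimal sketch (the toy points and the plane are made up for illustration, scaled so the margin exceeds 1) that assembles A, \tilde{h} and b and verifies A\cdot\tilde{h} \leq b:

```r
# sets A and B as rows of matrices (M = 2, N1 = N2 = 2)
A_pts <- rbind(c(0.9, 0.1), c(0.8, 0.2))
B_pts <- rbind(c(0.1, 0.8), c(0.2, 0.9))

# rows (-a^T, 1) for the points of A, rows (b^T, -1) for the points of B
Amat <- rbind(cbind(-A_pts,  1),
              cbind( B_pts, -1))
b <- rep(-1, nrow(Amat))

# h~ = (h, beta) for the plane 3x - 4y = -1.3, scaled so the margin exceeds 1
h_tilde <- c(3, -4, -1.3) / 1.44
stopifnot(all(Amat %*% h_tilde <= b))
```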

Example with Rglpk in two dimensions


library(Rglpk)

N1 <- 3
N2 <- 3

# the points of sets A and B
P <- matrix(
  runif(2*N1 + 2*N2, 0, 1),
  ncol=2, byrow=TRUE
)

# the matrix A defining the lhs of the conditions
A <- cbind(P * c(rep(-1,N1),rep(1,N2)), c(rep(1,N1),rep(-1,N2)))

# the objective function - no optimization necessary
obj <- c(0,0,0)

# the vector b defining the rhs of the conditions
b <- rep(-1, N1+N2)

# by default GLPK assumes positive bounds for the
# variables, but we need the full set of real numbers
bounds <- list(
  lower = list(ind = c(1,2,3), val = c(-Inf,-Inf,-Inf)),
  upper = list(ind = c(1,2,3), val = c(Inf,Inf,Inf))
)

# solving the linear program
s <- Rglpk_solve_LP(obj, A, rep("<=", N1+N2), b, bounds=bounds)

plot(P,col=c(rep("red",N1),rep("blue",N2)), xlab="x", ylab="y", 
     cex=1, pch=16, xlim=c(0,1), ylim=c(0,1))

# status 0 means that a solution was found
if(s$status == 0) {
  h1 = s$solution[1]
  h2 = s$solution[2]
  beta  = s$solution[3]
  
  # drawing the separating line
  if(h2 != 0) {
    abline(beta/h2,-h1/h2)
  } else {
    abline(v=-beta/h1)
  }
} else {
  cat("Not linearly separable.")
}
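The whole procedure can be wrapped into a small helper – the function name `is_linearly_separable` is my own choosing, not part of any package – which returns TRUE exactly when a separating hyper-plane exists:

```r
library(Rglpk)

# P1, P2: matrices with one point per row, same number of columns M
is_linearly_separable <- function(P1, P2) {
  M <- ncol(P1)
  # rows (-a^T, 1) for points of P1, rows (b^T, -1) for points of P2
  A <- rbind(cbind(-P1, 1), cbind(P2, -1))
  b <- rep(-1, nrow(A))
  obj <- rep(0, M + 1)           # feasibility only, nothing to optimize
  # allow negative values for h and beta
  bounds <- list(lower = list(ind = 1:(M+1), val = rep(-Inf, M+1)),
                 upper = list(ind = 1:(M+1), val = rep( Inf, M+1)))
  s <- Rglpk_solve_LP(obj, A, rep("<=", nrow(A)), b, bounds = bounds)
  s$status == 0                  # status 0: feasible solution found
}

# two clearly separated clusters vs. the classic XOR configuration
is_linearly_separable(rbind(c(0,0), c(0.1,0.1)), rbind(c(1,1), c(0.9,0.9)))  # TRUE
is_linearly_separable(rbind(c(0,0), c(1,1)),     rbind(c(1,0), c(0,1)))      # FALSE
```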


In case you are wondering how I managed to include all those pretty pretty math formulas in this post – I am using the QuickLaTeX WordPress plug-in and I must say I really like the result. In previous posts I used a LaTeX web editor and then included the rendered formulas as an image.
