**yaRb**, and kindly contributed to R-bloggers)

Last week I talked about our **editrules** package[1,2] at the *useR!2011* conference. In the coming time I plan to write a short series of blogs about the functionality of **editrules**. Below I describe the **eliminate** and **isFeasible** functions. But first: a bit of theory.

Most people with some linear algebra background will remember the Gaussian elimination procedure for manipulating linear systems of equations of the form **Ax**=**b**, with **A** a real *m x n * matrix, **x** an unknown *n*-vector and **b** an *m*-vector. Gaussian elimination is based on manipulating the rows of augmented matrix [**A**|**b**] with three basic operations: permutation of rows, addition of rows, and multiplication of rows by a non-zero constant. The solution set {**x**} is invariant under these operations, and by repeatedly applying them, it is possible to transform the system to reduced row echelon form, for example.

It is (perhaps) a bit less known that similar manipulations are possible for systems of *in*equalities. Consider a system of inequalities, written as **Ax**<**b**. Now, addition of rows and permuting rows of [**A**|**b**] is still possible. However, multiplying a row with a negative constant gives a problem, since this implies changing the comparison operator < of that row to >. Rows with different comparison operators cannot be added. So, to manipulate a system of inequalities, we restrict our basic operations to permutation of rows, addition of rows and multiplication with a *positive* constant. These operations allow us to eliminate a variable (say *x*_{1}) from a system of inequalities in the following way

- Permute the rows of [
**A|b**] such that the top*m*_{1}rows have positive coefficients in the first column, the middle*m*_{2}rows have negative coefficients and the bottom*m*_{3}rows have 0 coefficient in the first column. - Create a new (
*m*_{1}*m*_{2}+*m*_{3}) x (*n*+1) augmented matrix in the following way.- Copy the bottom
*m*_{3}rows to the new system. - Combine each row in 1…
*m*_{1}with each row in*m*_{1}+1…*m*_{2}by addition, multiplying the row with the negative coefficient with a positive constant such that the resulting rows have the first coefficient equal to zero. Copy all these rows to the new system.

- Copy the bottom

The new system is equivalent to the original one, in the sense that it has a solution if and only if the original system has a solution (A consequence of Farkas’ lemma).

At first glance, it would seem that the second step involves a double loop over the sets of rows with positive and negative coefficients. However, using the indexing and recycling properties of **R**, explicit loops can be avoided completelely. Here is a simple piece of code demonstrating this. We will define an augmented matrix and eliminate the 4th variable.

# create an augmented matrix.

set.seed(1)

Ab <- matrix(sample(-9:10),nrow=4,dimnames=list(NULL,c("x1","x2","x3","x4","b")))

### We will eliminate x4 from this matrix.

# Normalize, so all nonzero coefficients x4 have -1 or 1 (second statement uses recyling).

Izero <- Ab[,"x4"] == 0

Ab[!Izero] <- Ab[!Izero, ]/abs(Ab[!Izero,"x4"])

Ab

x1 x2 x3 x4 b

[1,] -0.4 -0.6000000 0.900000 1 0.2

[2,] -2.0 4.0000000 -9.000000 0 7.0

[3,] 0.2 1.0000000 -1.400000 -1 1.6

[4,] 2.0 -0.3333333 -2.666667 -1 1.0

# get positive and negative rows:

Ipos <- which(Ab[,"x4"] > 0)

Ineg <- which(Ab[,"x4"] < 0)

# Here's the real trick: use smart indexing to derive new rows:

I1 <- rep(Ipos, times=length(Ineg))

I2 <- rep(Ineg, each=length(Ipos))

# avoid looping!

Ab2 <- Ab[I1,] + Ab[I2,]

Ab2 <- rbind(Ab2, Ab[Izero,])

# and here's the result

Ab2

x1 x2 x3 x4 b

[1,] -0.2 0.4000000 -0.500000 0 1.8

[2,] 1.6 -0.9333333 -1.766667 0 1.2

[3,] -2.0 4.0000000 -9.000000 0 7.0

Created by Pretty R at inside-R.org

The code above contains the basic methods applied in the **eliminate** function of the **editrules** package. The **eliminate** function has some important extras however: it can recognize numerical near zero’s as zero, and it handles mixed systems of equalities and inequalities (operators may be any of <, ≤, ==, >, ≥). For example, let’s put some requirements on simple records, consisting of variables cost, turnover and profit:

> library(editrules)

E <- editmatrix(c(

+ "cost + turnover == profit",

+ "profit < 0.6*turnover"))

E

Edit matrix:

cost profit turnover Ops CONSTANT

e1 1 -1 1.0 == 0

e2 0 1 -0.6 < 0

Edit rules:

e1 : cost + turnover == profit

e2 : profit < 0.6*turnover

eliminate(E,"profit")

Edit matrix:

cost profit turnover Ops CONSTANT

e1 1 0 0.4 < 0

Edit rules:

e1 : cost + 0.4*turnover < 0

Created by Pretty R at inside-R.org

So, by eliminating a the variable “profit”, we’ve derived an expression relating cost and turnover. (In a previous post, I already showed how the rules can be read from text.)

As you might expect, the number of new rules can grow exponentially when multiple variables are eliminated one by one. In fact, if the original system has *m* rows, the system obtained after eliminating a single variable has maximally *m*^{2}/4 rows. It can be shown however, that after eliminating *k* variables, all rows derived from more than *k+1* original rows are redundant[3]. The **eliminate** function uses this rule to eliminate most of the redundant rows. To use this facility, you have to overwrite the original editmatrix each time a variable is eliminated, to retain the derivation history. The reduction can be quite dramatic, as the following example shows:

# Create a random 10 x 11 system of linear inequalities Ax<b.

set.seed(1)

E <- as.editmatrix(A=matrix(rnorm(100),nrow=10),b=rnorm(10),ops=rep("<",10))

# variables to eliminate

elim <- c("x1","x2","x3")

# eliminate some variables, removing history

F <- E

for ( v in elim){

+ F <- eliminate(F,v)

+ # remove some internals, usually hidden for user

+ attr(F,"H") <- NULL

+ attr(F,"h") <- NULL

}

# nr of rows, NOT retaining history, after eliminating 3 variables

nrow(F)

[1] 1276

# eliminate some variables, retaining history:

G <- E

for ( v in elim ) G <- eliminate(G,v)

# nr of rows, retaining history, after eliminating 3 variables

nrow(G)

[1] 29

Created by Pretty R at inside-R.org

So we’ve managed to avoid 1247 redundant rows out of 1276 in total.

Ok, so we’ve eliminated a variable. What’s the use of all this? Well, first of all, if and only if our system happens to be infeasible, it can be shown that by FM-elimination, a contradictory row equivalent to to 0 < -1 will always pop up when eliminating variables one by one. In fact, the function **isFeasible** of the **editrules** package will do that for you:

> isFeasible(E)

[1] TRUE

Created by Pretty R at inside-R.org

So in this case there is at least one vector (cost, profit, turnover) obeying the rules in E. These functions may be of use when you work with hundreds of variables and hundreds of restrictions (perhaps growing in time) and you want to know if it all makes sense.

The elimination procedure is an important subroutine of the error localization functionality of the **editrules** package, about which I will blog another time.

[1] Edwin de Jonge and Mark van der Loo (2011). *Manipulation of linear edits and error localization with the editrules package*, Discussion paper 11XXX Statistics Netherlands, The Hague/Heerlen (*in press*). See also the package vignette

[2] The **editrules** package is designed to easily define, manipulate and check rules that your data has to obey. The current version (1.0-2) offers functionality for handling numerical equalities and inequalities. In the coming time we will work to get the prototype version for categorical publication-ready.

[3] See e.g. Williams (1986) *Fourier’s method of linear programming and it’s dual.* (pdf) The American Mathematical Monthly 93, 681

**leave a comment**for the author, please follow the link and comment on his blog:

**yaRb**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...