(This article was first published on **Mark van der Loo**, and kindly contributed to R-bloggers)

Edwin de Jonge and I have recently updated our editrules package. The most important new features include (beta) support for categorical data. In this post, however, I'm going to show some visualizations we included, made possible by Gabor Csardi's awesome igraph package.

Make sure you run

```r
install.packages(c('igraph','editrules'))
```

before trying the code below.

First, let's load editrules' built-in editset:

```r
> library(editrules)
> data(edits)
> edits
  name           edit               description
1   b1    t == ct + p             total balance
2   b2  ct == ch + cp              cost balance
3   s1     p <= 0.6*t             profit sanity
4   s2    cp <= 0.3*t     personnel cost sanity
5   s3    ch <= 0.3*t       housing cost sanity
6   p1           t >0       turnover positivity
7   p2         ch > 0   housing cost positivity
8   p3         cp > 0 personnel cost positivity
9   p4         ct > 0     total cost positivity
```

Here, **edits** is a **data.frame** with a "name" column naming the rules, an "edit" column holding a character representation of each edit rule, and a "description" column. The variables have the following meaning: *t*: turnover, *ct*: total cost, *p*: profit, *ch*: housing cost and *cp*: personnel cost. The rules demand that the balance accounts add up (*e.g.* total cost + profit equals turnover) and impose some sanity checks (*e.g.* profit cannot exceed 60% of turnover).

The sanity checks here are completely fictional. To do anything useful with these rules, turn them into an **editmatrix**.

```r
> (E <- editmatrix(edits))
Edit matrix:
   ct  p    t ch cp Ops CONSTANT
b1 -1 -1  1.0  0  0  ==        0
b2  1  0  0.0 -1 -1  ==        0
s1  0  1 -0.6  0  0  <=        0
s2  0  0 -0.3  0  1  <=        0
s3  0  0 -0.3  1  0  <=        0
p1  0  0 -1.0  0  0   <        0
p2  0  0  0.0 -1  0   <        0
p3  0  0  0.0  0 -1   <        0
p4 -1  0  0.0  0  0   <        0

Edit rules:
b1 : t == ct + p  [ total balance ]
b2 : ct == ch + cp  [ cost balance ]
s1 : p <= 0.6*t  [ profit sanity ]
s2 : cp <= 0.3*t  [ personnel cost sanity ]
s3 : ch <= 0.3*t  [ housing cost sanity ]
p1 : 0 < t  [ turnover positivity ]
p2 : 0 < ch  [ housing cost positivity ]
p3 : 0 < cp  [ personnel cost positivity ]
p4 : 0 < ct  [ total cost positivity ]
```

Although the matrix representation and the textual representation have their merits, it is hard to see which rules are (indirectly) related via shared variables. This can be visualized by plotting the rules as a graph, where each variable and each edit is a node, and a variable node is connected to an edit node if the variable occurs in that edit. Just do

```r
plot(E)
```

The round, blue nodes represent variables and the square nodes represent edit rules.
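For readers who want to see what sits behind such a graph, here is a minimal base-R sketch. It does not use editrules or igraph and is not how the package builds its plots; the coefficient matrix is simply typed in from the output above, and the names `A`, `incidence`, `shared` and `reached` are made up for this illustration. An edit is linked to a variable when the coefficient is nonzero, and two edits are related when they share at least one variable.

```r
# Coefficient matrix copied by hand from the editmatrix printed above
# (rows: edits, columns: variables). Illustration only.
A <- matrix(c(
  -1, -1,  1.0,  0,  0,
   1,  0,  0.0, -1, -1,
   0,  1, -0.6,  0,  0,
   0,  0, -0.3,  0,  1,
   0,  0, -0.3,  1,  0,
   0,  0, -1.0,  0,  0,
   0,  0,  0.0, -1,  0,
   0,  0,  0.0,  0, -1,
  -1,  0,  0.0,  0,  0),
  ncol = 5, byrow = TRUE,
  dimnames = list(c("b1", "b2", "s1", "s2", "s3", "p1", "p2", "p3", "p4"),
                  c("ct", "p", "t", "ch", "cp")))

# 1 where a variable occurs in an edit, 0 otherwise.
incidence <- (A != 0) * 1

# Number of variables shared by each pair of edits
# (the diagonal holds the number of variables per edit).
shared <- tcrossprod(incidence)

# Collect all edits reachable from b1 by repeatedly following shared variables.
reached <- "b1"
repeat {
  linked <- rownames(shared)[colSums(shared[reached, , drop = FALSE]) > 0]
  if (all(linked %in% reached)) break
  reached <- union(reached, linked)
}

shared["b1", "s1"]               # b1 and s1 share p and t
all(rownames(A) %in% reached)    # TRUE when every edit is reachable from b1
```

If the last line returns `TRUE`, all edits hang together in a single connected component.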

You can see at a glance that everything is connected, so the editmatrix cannot be split into independent blocks (submatrices). If you want to leave out the variables and just see how the edits are connected, use

```r
plot(E,nodetype='rules')
```

to get

Here, (slightly) thicker lines indicate that more variables are shared. Plotting connections between variables can be done with

```r
plot(E,nodetype='vars')
```

which you can try for yourself. We can do cooler stuff, though. For example, let's define a faulty record and detect which rules it violates.

```r
> r <- c(ct = 100, ch = 30, cp = 70, p=30,t=130 )
> (v <- violatedEdits(E,r))
      edit
record    b1    b2    s1   s2    s3    p1    p2    p3    p4
     1 FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
```
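As a rough cross-check, the same result can be reproduced in base R by evaluating each rule against the record. This is only a sketch and not what `violatedEdits` does internally: the rule strings are copied from the `edits` table, the names `rules` and `violated` are made up here, and the equalities are compared exactly, which works for these round numbers but would need a tolerance in general.

```r
# Rules copied from the `edits` data.frame shown earlier in the post.
rules <- c(
  b1 = "t == ct + p",
  b2 = "ct == ch + cp",
  s1 = "p <= 0.6*t",
  s2 = "cp <= 0.3*t",
  s3 = "ch <= 0.3*t",
  p1 = "t > 0",
  p2 = "ch > 0",
  p3 = "cp > 0",
  p4 = "ct > 0"
)

# The faulty record from the post.
r <- c(ct = 100, ch = 30, cp = 70, p = 30, t = 130)

# Evaluate every rule with the record's values;
# a rule counts as violated when it evaluates to FALSE.
violated <- vapply(
  rules,
  function(rule) !eval(parse(text = rule), as.list(r)),
  logical(1)
)

names(violated)[violated]
```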

So only edit **s2** is violated. We can visualize this as well.

```r
plot(E,violated=v)
```

gives

The complexity of error localization is visible at a glance. We can try to adapt *cp*, but this might yield a violation of **b2** or **p3**. So here's the central question in error localization: what is the least (weighted) number of variables we have to change such that all violated rules can be obeyed without causing new violations? Editrules was actually written to answer this question. There are several functions performing this task, but here we'll use the low-level **errorLocalizer** function and plot the result.

```r
adapt <- errorLocalizer(E,r)$searchBest()$adapt
plot(E, violated=v, adapt=adapt)
```

So, in order to repair the record, the turnover *t* needs to be altered and, to make sure no other rules are violated, the profit *p* has to be altered as well.
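To get a feeling for why a single change is not enough, here is a small base-R sketch (my own back-of-the-envelope check, not part of editrules; `cp_forced` and `t_forced` are names invented for the illustration). If only one variable is changed, the balance rule it occurs in pins its new value, and the violated rule **s2** still has to hold.

```r
r <- c(ct = 100, ch = 30, cp = 70, p = 30, t = 130)

# Changing only cp: b2 (ct == ch + cp) forces the new cp,
# and s2 (cp <= 0.3*t) must hold for that value.
cp_forced <- r[["ct"]] - r[["ch"]]
cp_forced <= 0.3 * r[["t"]]      # FALSE: s2 is still violated

# Changing only t: b1 (t == ct + p) forces the new t,
# and s2 must hold with the unchanged cp.
t_forced <- r[["ct"]] + r[["p"]]
r[["cp"]] <= 0.3 * t_forced      # FALSE: s2 is still violated
```

Changing *ct*, *ch* or *p* alone cannot help either, since none of them occurs in **s2**. Freeing *t* and *p* together leaves room in **b1** to raise the turnover.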

If you don't like the colors or want to play with the igraph objects yourself, see the **as.igraph** or **adjacency** functions.

Oh, and if you wonder which values are possible for *p* and *t*, just substitute all the other values in the editmatrix:

```r
> substValue(E,names(r)[!adapt],r[!adapt])
Edit matrix:
   ct  p    t ch cp Ops CONSTANT
b1  0 -1  1.0  0  0  ==      100
s1  0  1 -0.6  0  0  <=        0
s2  0  0 -0.3  0  0  <=      -70
s3  0  0 -0.3  0  0  <=      -30
p1  0  0 -1.0  0  0   <        0

Edit rules:
b1 : t == p + 100
s1 : p <= 0.6*t
s2 : 70 <= 0.3*t
s3 : 30 <= 0.3*t
p1 : 0 < t
```

The solution set to the above system of equations and inequalities is the set of possible values for *t* and *p*.
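For this small system the solution set can be worked out by hand, and the sketch below does so in base R. It is an illustration under my own naming (`t_min`, `t_max` and the helper `obeys_all` are hypothetical and not part of editrules): **s2** and **s3** bound *t* from below, and **b1** combined with **s1** bounds it from above, while *p* follows from **b1** as *t* minus 100.

```r
# Values kept fixed (the variables not selected for adaptation).
ct <- 100; ch <- 30; cp <- 70

# Bounds on t implied by the substituted rules:
#   s2: 70 <= 0.3*t and s3: 30 <= 0.3*t   ->  t >= 70/0.3
#   b1 (p == t - 100) and s1 (p <= 0.6*t) ->  t <= 100/0.4
t_min <- max(cp, ch) / 0.3
t_max <- ct / 0.4

# Hypothetical helper: TRUE when (t, p), together with the fixed
# values, obeys all nine original rules.
obeys_all <- function(t, p) {
  all(t == ct + p, ct == ch + cp,
      p <= 0.6 * t, cp <= 0.3 * t, ch <= 0.3 * t,
      t > 0, ch > 0, cp > 0, ct > 0)
}

c(t_min = t_min, t_max = t_max)
obeys_all(t = 240, p = 140)   # a candidate inside the interval
obeys_all(t = 130, p = 30)    # the original, faulty values
```

So any turnover between roughly 233.3 and 250, with the profit set to the turnover minus 100, repairs the record under these (fictional) rules.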
