# Relational operators for intervals with the intrval R package

This article was first published on **Peter Solymos - R related posts** and kindly contributed to R-bloggers.


I recently posted a piece about how to write and document special functions in R. I meant that as a prelude to the topic I am writing about in this post. Let me start at the beginning. The other day Dirk Eddelbuettel tweeted about the new release of the **data.table** package (v1.9.8).

There were new features announced for joins based on `%inrange%` and `%between%`. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as `x %[]% c(a, b)`.

## Motivation

We want to evaluate whether values of `x` satisfy the condition `x >= a & x <= b`, given that `a <= b`. Typing `x %[]% c(a, b)` instead of the previous expression is not much shorter (14 vs. 15 characters, counting spaces). But once the `a <= b` condition has to be ensured as well, it becomes a real saving (`x >= min(a, b) & x <= max(a, b)` is 31 characters long). And sorting is really important, because by flipping `a` and `b` we get quite different answers:

```
x <- 5
x >= 1 & x <= 10
# [1] TRUE
x >= 10 & x <= 1
# [1] FALSE
```

Also, `min` and `max` will not be very useful when we want to vectorize the expression, because they collapse all of their arguments into a single value. We need to use `pmin` and `pmax` to compare the endpoints element-wise:

```
x >= min(1:10, 10:1) & x <= max(10:1, 1:10)
# [1] TRUE
x >= pmin(1:10, 10:1) & x <= pmax(10:1, 1:10)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
```

If interval endpoints can also be open or closed, allowing them to flip around makes the semantics of left/right closed/open interval definitions hard to pin down. We can thus all agree that there is a need for an expression, like `x %[]% c(a, b)`, that is *compact*, *flexible*, and *invariant* to endpoint sorting. This is exactly what the **intrval** package is for!
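To make the idea of invariance to endpoint sorting concrete, here is a minimal base R sketch. The helper `in_closed()` is a hypothetical name used only for illustration; it is not part of **intrval** and not necessarily how the package implements `%[]%`.

```r
# Hypothetical helper (not from intrval): closed-interval test that
# sorts the two endpoints first, so c(a, b) and c(b, a) behave the same.
in_closed <- function(x, interval) {
  lo <- min(interval)  # smaller endpoint, whatever the input order
  hi <- max(interval)  # larger endpoint
  x >= lo & x <= hi
}

x <- 5
in_closed(x, c(1, 10))
# [1] TRUE
in_closed(x, c(10, 1))  # flipped endpoints give the same answer
# [1] TRUE
```

With the naive `x >= 10 & x <= 1` the flipped case returns `FALSE`; sorting the endpoints first removes that trap.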

## What’s in the package

Functions for evaluating if values of vectors are within different open/closed intervals (`x %[]% c(a, b)`), or if two closed intervals overlap (`c(a1, b1) %[o]% c(a2, b2)`). Operators for negation and directional relations are also implemented.

### Value-to-interval relations

Values of `x` are compared to interval endpoints `a` and `b` (`a <= b`). Endpoints can be defined as a vector with two values (`c(a, b)`): these values will be compared as a single interval with each value in `x`. If endpoints are stored in a matrix-like object or a list, comparisons are made element-wise.

```
x <- rep(4, 5)
a <- 1:5
b <- 3:7
cbind(x=x, a=a, b=b)
x %[]% cbind(a, b) # matrix
x %[]% data.frame(a=a, b=b) # data.frame
x %[]% list(a, b) # list
```
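As a rough guide to what "element-wise" means here, the same comparison can be spelled out in base R with `pmin` and `pmax`. This is only a sketch of the expected semantics for a closed interval, not the package's internal code.

```r
x <- rep(4, 5)
a <- 1:5
b <- 3:7

# Pair the i-th value of x with the i-th interval [a[i], b[i]];
# pmin/pmax sort each pair of endpoints separately.
lo <- pmin(a, b)
hi <- pmax(a, b)
res <- x >= lo & x <= hi
res
# [1] FALSE TRUE TRUE TRUE FALSE
```

The value 4 falls outside the first interval [1, 3] and the last interval [5, 7], and inside the three in between.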

If lengths do not match, shorter objects are recycled. Return values are logicals.

Note: interval endpoints are sorted internally, so ensuring the condition `a <= b` is not necessary.

These value-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).

#### Closed and open intervals

The following special operators are used to indicate closed (`[`, `]`) or open (`(`, `)`) interval endpoints:

| Operator | Expression | Condition |
|---|---|---|
| `%[]%` | `x %[]% c(a, b)` | `x >= a & x <= b` |
| `%[)%` | `x %[)% c(a, b)` | `x >= a & x < b` |
| `%(]%` | `x %(]% c(a, b)` | `x > a & x <= b` |
| `%()%` | `x %()% c(a, b)` | `x > a & x < b` |
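The four conditions in the table differ only at the endpoints. A small base R sketch, using made-up values that sit exactly on `a` and `b`, shows the difference; the variable names are illustrative only.

```r
x <- 1:3  # 1 and 3 sit exactly on the endpoints
a <- 1
b <- 3

closed      <- x >= a & x <= b  # [a, b]
right_open  <- x >= a & x <  b  # [a, b)
left_open   <- x >  a & x <= b  # (a, b]
open        <- x >  a & x <  b  # (a, b)

rbind(closed, right_open, left_open, open)
```

Only the interior value 2 satisfies all four conditions; the endpoint values are included or excluded depending on which side is closed.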

#### Negation and directional relations

| Equal | Not equal | Less than | Greater than |
|---|---|---|---|
| `%[]%` | `%)(%` | `%[<]%` | `%[>]%` |
| `%[)%` | `%)[%` | `%[<)%` | `%[>)%` |
| `%(]%` | `%](%` | `%(<]%` | `%(>]%` |
| `%()%` | `%][%` | `%(<)%` | `%(>)%` |

The helper function `intrval_types` can be used to print or plot a summary of these operators.
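One way to read the negation and directional operators, sketched in base R for the closed interval only. The assumption here (mine, not stated in the text) is that "less than" means a value lies entirely to the left of the interval and "greater than" entirely to the right, so that the three relations partition the values.

```r
x <- c(0, 1, 2, 3, 4)
a <- 1
b <- 3

inside  <- x >= a & x <= b  # what x %[]% c(a, b) is described to test
outside <- !inside          # assumed meaning of the negation, %)(%
below   <- x < a            # assumed meaning of %[<]%
above   <- x > b            # assumed meaning of %[>]%

cbind(x, below, inside, above)
```

Under this reading every value is in exactly one of `below`, `inside`, and `above`, and `outside` is the union of `below` and `above`.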

### Interval-to-interval relations

The overlap of two closed intervals, [`a1`, `b1`] and [`a2`, `b2`], is evaluated by the `%[o]%` operator (`a1 <= b1`, `a2 <= b2`). Endpoints can be defined as a vector with two values (`c(a1, b1)`) or can be stored in matrix-like objects or lists, in which case comparisons are made element-wise.

Note: interval endpoints are sorted internally, so ensuring the conditions `a1 <= b1` and `a2 <= b2` is not necessary.

```
c(2:3) %[o]% c(0:1)
list(0:4, 1:5) %[o]% c(2:3)
cbind(0:4, 1:5) %[o]% c(2:3)
data.frame(a=0:4, b=1:5) %[o]% c(2:3)
```
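For intuition, the overlap of two closed intervals can be written in base R as "each interval starts no later than the other one ends". The function `overlaps_closed()` below is a hypothetical helper for illustration, not the package's implementation.

```r
# Hypothetical helper (not from intrval): do [a1, b1] and [a2, b2] overlap?
overlaps_closed <- function(a1, b1, a2, b2) {
  # Sort each pair of endpoints so the argument order does not matter
  lo1 <- pmin(a1, b1); hi1 <- pmax(a1, b1)
  lo2 <- pmin(a2, b2); hi2 <- pmax(a2, b2)
  lo1 <= hi2 & lo2 <= hi1
}

overlaps_closed(2, 3, 0, 1)
# [1] FALSE
overlaps_closed(0:4, 1:5, 2, 3)  # intervals [0,1], [1,2], ..., [4,5] vs [2,3]
# [1] FALSE TRUE TRUE TRUE FALSE
```

Because both intervals are closed, [1, 2] and [2, 3] count as overlapping even though they share only the single point 2.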

If lengths do not match, shorter objects are recycled.

These interval-to-interval operators work for numeric (integer, real) and ordered vectors, and object types which are measured at least on ordinal scale (e.g. dates).

`%)o(%` is used for the negation; directional evaluation is done via the operators `%[<o]%` and `%[o>]%`.

| Equal | Not equal | Less than | Greater than |
|---|---|---|---|
| `%[o]%` | `%)o(%` | `%[<o]%` | `%[o>]%` |

### Operators for discrete variables

The previous operators will return `NA` for unordered factors.

Set overlap can be evaluated by the base `%in%` operator and its negation `%nin%`. (This feature is really redundant, I know, but decided to include regardless…)
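For unordered values, set membership is the natural counterpart of an interval test. A minimal base R sketch follows; `not_in()` is a hypothetical stand-in that shows what a negated `%in%` amounts to, not the package's `%nin%` itself.

```r
# Hypothetical helper: negated set membership, written with base R only
not_in <- function(x, table) !(x %in% table)

x <- c("a", "b", "z")
x %in% letters[1:3]
# [1] TRUE TRUE FALSE
not_in(x, letters[1:3])
# [1] FALSE FALSE TRUE
```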

## Install

Install development version from GitHub (not yet on CRAN):

```
library(devtools)
install_github("psolymos/intrval")
```

The package is licensed under GPL-2.

## Examples

```
library(intrval)
## bounding box
set.seed(1)
n <- 10^4
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
d <- sqrt(x^2 + y^2)
iv1 <- x %[]% c(-0.25, 0.25) & y %[]% c(-1.5, 1.5)
iv2 <- x %[]% c(-1.5, 1.5) & y %[]% c(-0.25, 0.25)
iv3 <- d %()% c(1, 1.5)
plot(x, y, pch = 19, cex = 0.25, col = iv1 + iv2 + 1,
main = "Intersecting bounding boxes")
plot(x, y, pch = 19, cex = 0.25, col = iv3 + 1,
main = "Deck the halls:\ndistance range from center")
## time series filtering
x <- seq(0, 4*24*60*60, 60*60)
dt <- as.POSIXct(x, origin="2000-01-01 00:00:00")
f <- as.POSIXlt(dt)$hour %[]% c(0, 11)
plot(sin(x) ~ dt, type="l", col="grey",
main = "Filtering date/time objects")
points(sin(x) ~ dt, pch = 19, col = f + 1)
## QCC
library(qcc)
data(pistonrings)
mu <- mean(pistonrings$diameter[pistonrings$trial])
SD <- sd(pistonrings$diameter[pistonrings$trial])
x <- pistonrings$diameter[!pistonrings$trial]
iv <- mu + 3 * c(-SD, SD)
plot(x, pch = 19, col = x %)(% iv +1, type = "b", ylim = mu + 5 * c(-SD, SD),
main = "Shewhart quality control chart\ndiameter of piston rings")
abline(h = mu)
abline(h = iv, lty = 2)
## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)
lm.D9 <- lm(weight ~ group)
## compare 95% confidence intervals with 0
(CI.D9 <- confint(lm.D9))
# 2.5 % 97.5 %
# (Intercept) 4.56934 5.4946602
# groupTrt -1.02530 0.2833003
0 %[]% CI.D9
# (Intercept) groupTrt
# FALSE TRUE
lm.D90 <- lm(weight ~ group - 1) # omitting intercept
## compare 95% confidence of the 2 groups to each other
(CI.D90 <- confint(lm.D90))
# 2.5 % 97.5 %
# groupCtl 4.56934 5.49466
# groupTrt 4.19834 5.12366
CI.D90[1,] %[o]% CI.D90[2,]
# 2.5 %
# TRUE
DATE <- as.Date(c("2000-01-01","2000-02-01", "2000-03-31"))
DATE %[<]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] TRUE FALSE FALSE
DATE %[]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE TRUE FALSE
DATE %[>]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE FALSE TRUE
```

For more examples, see the unit-testing script.

## Feedback

Please check out the package and use the issue tracker to suggest a new feature or report a problem.
