Relational operators for intervals with the intrval R package

December 2, 2016

[This article was first published on Peter Solymos - R related posts, and kindly contributed to R-bloggers.]

I recently posted a piece about how to write and document special functions in R. I meant that as a prelude to the topic of this post. Let me start at the beginning. The other day, Dirk Eddelbuettel tweeted about the new release of the data.table package (v1.9.8).
There were new features announced for joins based on `%inrange%` and `%between%`. That got me thinking: it would be really cool to generalize this idea for different intervals, for example as `x %[]% c(a, b)`.

Motivation

We want to evaluate whether values of `x` satisfy the condition `x >= a & x <= b`, given that `a <= b`. Typing `x %[]% c(a, b)` instead of the previous expression is not much shorter (14 vs. 15 characters, counting spaces). But once we also account for the `a <= b` condition, it becomes a real saving (`x >= min(a, b) & x <= max(a, b)` is 31 characters long). And sorting is really important, because by flipping `a` and `b`, we get quite different answers:

``````
x <- 5
x >= 1 & x <= 10
# [1] TRUE
x >= 10 & x <= 1
# [1] FALSE
``````

Also, `min` and `max` will not be very useful when we want to vectorize the expression, because each collapses all of its arguments into a single value. We need to use `pmin` and `pmax`, which work element-wise:

``````
x >= min(1:10, 10:1) & x <= max(10:1, 1:10)
# [1] TRUE
x >= pmin(1:10, 10:1) & x <= pmax(10:1, 1:10)
# [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
``````

Interval endpoints can also be open or closed, and allowing the endpoints to flip around makes the semantics of left/right closed/open interval definitions hard to keep straight. We can thus all agree that there is a need for an expression, like `x %[]% c(a, b)`, that is compact, flexible, and invariant to endpoint sorting. This is exactly what the intrval package is for!
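To make the idea concrete, here is a minimal base-R sketch of an order-invariant closed-interval test. The helper name `in_closed` is hypothetical and this is not the intrval implementation; it only illustrates sorting the two endpoints before comparing.

```r
# Hypothetical helper (not part of intrval): closed-interval test that
# orders the two endpoints first, so c(a, b) and c(b, a) behave the same.
in_closed <- function(x, interval) {
  stopifnot(length(interval) == 2L)
  lo <- min(interval)  # smaller endpoint, whichever position it was in
  hi <- max(interval)  # larger endpoint
  x >= lo & x <= hi
}

x <- 5
in_closed(x, c(1, 10))
# [1] TRUE
in_closed(x, c(10, 1))   # flipped endpoints give the same answer
# [1] TRUE
in_closed(c(0, 1, 5, 10, 11), c(10, 1))
# [1] FALSE  TRUE  TRUE  TRUE FALSE
```

The comparison itself is already vectorized over `x`; only the endpoints need the extra care.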

What’s in the package

Functions for evaluating if values of vectors are within
different open/closed intervals
(`x %[]% c(a, b)`), or if two closed
intervals overlap (`c(a1, b1) %[o]% c(a2, b2)`).
Operators for negation and directional relations are also implemented.

Value-to-interval relations

Values of `x` are compared to interval endpoints `a` and `b` (`a <= b`).
Endpoints can be defined as a vector with two values (`c(a, b)`): these values will be compared as a single interval with each value in `x`.
If endpoints are stored in a matrix-like object or a list,
comparisons are made element-wise.

``````
x <- rep(4, 5)
a <- 1:5
b <- 3:7
cbind(x=x, a=a, b=b)
x %[]% cbind(a, b) # matrix
x %[]% data.frame(a=a, b=b) # data.frame
x %[]% list(a, b) # list
``````

If lengths do not match, shorter objects are recycled. Return values are logicals.
Note: interval endpoints are sorted internally, so ensuring the condition
`a <= b` is not necessary.
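The element-wise behaviour can be approximated in base R with `pmin` and `pmax`. The sketch below is an illustration only: `in_closed_rows` is a hypothetical helper, not a function from intrval, and it assumes the endpoints arrive as a two-column matrix-like object.

```r
# Hypothetical helper (not part of intrval): element-wise closed-interval
# test where row i of `endpoints` holds the two endpoints for x[i].
in_closed_rows <- function(x, endpoints) {
  endpoints <- as.matrix(endpoints)
  lo <- pmin(endpoints[, 1], endpoints[, 2])  # sort each pair of endpoints
  hi <- pmax(endpoints[, 1], endpoints[, 2])
  x >= lo & x <= hi                           # base R recycles shorter vectors
}

x <- rep(4, 5)
a <- 1:5
b <- 3:7
in_closed_rows(x, cbind(a, b))
# [1] FALSE  TRUE  TRUE  TRUE FALSE
in_closed_rows(x, cbind(b, a))   # swapped columns, same result
in_closed_rows(4, cbind(a, b))   # a single value is recycled over the rows
```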

These value-to-interval operators work for numeric (integer, real) and ordered vectors, and for object types that are measured on at least an ordinal scale (e.g. dates).

Closed and open intervals

The following special operators are used to indicate closed (`[`, `]`) or open (`(`, `)`) interval endpoints:

| Operator | Expression | Condition |
|----------|------------|-----------|
| `%[]%` | `x %[]% c(a, b)` | `x >= a & x <= b` |
| `%[)%` | `x %[)% c(a, b)` | `x >= a & x < b` |
| `%(]%` | `x %(]% c(a, b)` | `x > a & x <= b` |
| `%()%` | `x %()% c(a, b)` | `x > a & x < b` |
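The four conditions differ only at the endpoints, which a small base-R sketch can show. The helper `in_interval` and its `type` argument are hypothetical names used for illustration; they are not part of intrval.

```r
# Hypothetical helper (not part of intrval): `type` names which endpoints
# are closed, mirroring the four conditions in the table above.
in_interval <- function(x, interval, type = c("[]", "[)", "(]", "()")) {
  type <- match.arg(type)
  a <- min(interval)
  b <- max(interval)
  left  <- if (substr(type, 1, 1) == "[") x >= a else x > a
  right <- if (substr(type, 2, 2) == "]") x <= b else x < b
  left & right
}

x <- c(1, 2, 3)  # left endpoint, interior point, right endpoint
types <- c("[]", "[)", "(]", "()")
res <- sapply(types, function(tp) in_interval(x, c(1, 3), tp))
rownames(res) <- paste0("x=", x)
res  # one column per interval type; only the endpoint rows differ
```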

Negation and directional relations

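| Equal | Not equal | Less than | Greater than |
|-------|-----------|-----------|--------------|
| `%[]%` | `%)(%` | `%[<]%` | `%[>]%` |
| `%[)%` | `%)[%` | `%[<)%` | `%[>)%` |
| `%(]%` | `%](%` | `%(<]%` | `%(>]%` |
| `%()%` | `%][%` | `%(<)%` | `%(>)%` |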
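For the closed-interval row, the relations can be spelled out in base R. The mapping in the comments is an assumption about how the column headers translate into conditions, not something taken from the package source.

```r
# Base-R conditions assumed to correspond to the closed-interval row:
#   %[]%  : x >= a & x <= b      %)(%  : x < a | x > b
#   %[<]% : x < a                %[>]% : x > b
a <- 2
b <- 4
x <- 0:6

within  <- x >= a & x <= b
outside <- x < a | x > b
less    <- x < a
greater <- x > b

# Negation is the complement of the interval test
all(outside == !within)
# [1] TRUE
# Each value falls in exactly one of: less, within, greater
all(less + within + greater == 1)
# [1] TRUE
data.frame(x, less, within, greater, outside)
```

Under this reading, the "less than", "equal", and "greater than" relations partition the values of `x`, and "not equal" is simply the union of the two directional relations.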

The helper function `intrval_types` can be used to
print or plot a summary of these operators.

Interval-to-interval relations

The overlap of two closed intervals, [`a1`, `b1`] and [`a2`, `b2`],
is evaluated by the `%[o]%` operator (`a1 <= b1`, `a2 <= b2`).
Endpoints can be defined as a vector with two values
(`c(a1, b1)`) or can be stored in matrix-like objects or lists,
in which case comparisons are made element-wise.
Note: interval endpoints
are sorted internally, so ensuring the conditions
`a1 <= b1` and `a2 <= b2` is not necessary.

``````
c(2:3) %[o]% c(0:1)
list(0:4, 1:5) %[o]% c(2:3)
cbind(0:4, 1:5) %[o]% c(2:3)
data.frame(a=0:4, b=1:5) %[o]% c(2:3)
``````

If lengths do not match, shorter objects are recycled.
These interval-to-interval operators work for numeric (integer, real)
and ordered vectors, and for object types that are measured on at
least an ordinal scale (e.g. dates).
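The overlap test for two closed intervals reduces to one comparison once the endpoints are sorted. The following base-R sketch uses a hypothetical helper, `overlaps_closed`, and is not the package's implementation; it handles a single pair of intervals only.

```r
# Hypothetical helper (not part of intrval): two closed intervals overlap
# when the larger of the two lower ends does not exceed the smaller of the
# two upper ends. Endpoints are sorted first.
overlaps_closed <- function(i1, i2) {
  i1 <- sort(i1)
  i2 <- sort(i2)
  max(i1[1], i2[1]) <= min(i1[2], i2[2])
}

overlaps_closed(c(2, 3), c(0, 1))   # disjoint
# [1] FALSE
overlaps_closed(c(2, 3), c(3, 5))   # a shared endpoint counts in this sketch
# [1] TRUE
overlaps_closed(c(3, 2), c(5, 3))   # endpoint order does not matter
# [1] TRUE
```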

`%)o(%` is used for the negation;
directional evaluation is done via the operators `%[<o]%` and `%[o>]%`.

| Equal | Not equal | Less than | Greater than |
|-------|-----------|-----------|--------------|
| `%[o]%` | `%)o(%` | `%[<o]%` | `%[o>]%` |

Operators for discrete variables

The previous operators will return `NA` for unordered factors. Set overlap can be evaluated by the base `%in%` operator and its negation `%nin%`. (This feature is really redundant, I know, but decided to include regardless…)

Install

Install development version from GitHub (not yet on CRAN):

``````
library(devtools)
install_github("psolymos/intrval")
``````

The package is licensed under GPL-2.

Examples

``````
library(intrval)

## bounding box
set.seed(1)
n <- 10^4
x <- runif(n, -2, 2)
y <- runif(n, -2, 2)
d <- sqrt(x^2 + y^2)
iv1 <- x %[]% c(-0.25, 0.25) & y %[]% c(-1.5, 1.5)
iv2 <- x %[]% c(-1.5, 1.5) & y %[]% c(-0.25, 0.25)
iv3 <- d %()% c(1, 1.5)
plot(x, y, pch = 19, cex = 0.25, col = iv1 + iv2 + 1,
    main = "Intersecting bounding boxes")
plot(x, y, pch = 19, cex = 0.25, col = iv3 + 1,
    main = "Deck the halls:\ndistance range from center")

## time series filtering
x <- seq(0, 4*24*60*60, 60*60)
dt <- as.POSIXct(x, origin="2000-01-01 00:00:00")
f <- as.POSIXlt(dt)$hour %[]% c(0, 11)
plot(sin(x) ~ dt, type="l", col="grey",
    main = "Filtering date/time objects")
points(sin(x) ~ dt, pch = 19, col = f + 1)

## QCC
library(qcc)
data(pistonrings)
mu <- mean(pistonrings$diameter[pistonrings$trial])
SD <- sd(pistonrings$diameter[pistonrings$trial])
x <- pistonrings$diameter[!pistonrings$trial]
iv <- mu + 3 * c(-SD, SD)
plot(x, pch = 19, col = x %)(% iv + 1, type = "b",
    ylim = mu + 5 * c(-SD, SD),
    main = "Shewhart quality control chart\ndiameter of piston rings")
abline(h = mu)
abline(h = iv, lty = 2)

## Annette Dobson (1990) "An Introduction to Generalized Linear Models".
## Page 9: Plant Weight Data.
ctl <- c(4.17,5.58,5.18,6.11,4.50,4.61,5.17,4.53,5.33,5.14)
trt <- c(4.81,4.17,4.41,3.59,5.87,3.83,6.03,4.89,4.32,4.69)
group <- gl(2, 10, 20, labels = c("Ctl","Trt"))
weight <- c(ctl, trt)

lm.D9 <- lm(weight ~ group)
## compare 95% confidence intervals with 0
(CI.D9 <- confint(lm.D9))
#                2.5 %    97.5 %
# (Intercept)  4.56934 5.4946602
# groupTrt    -1.02530 0.2833003
0 %[]% CI.D9
# (Intercept)    groupTrt
#       FALSE        TRUE

lm.D90 <- lm(weight ~ group - 1) # omitting intercept
## compare 95% confidence of the 2 groups to each other
(CI.D90 <- confint(lm.D90))
#            2.5 %  97.5 %
# groupCtl 4.56934 5.49466
# groupTrt 4.19834 5.12366
CI.D90[1,] %[o]% CI.D90[2,]
# 2.5 %
#  TRUE

DATE <- as.Date(c("2000-01-01","2000-02-01", "2000-03-31"))
DATE %[<]% as.Date(c("2000-01-15", "2000-03-15"))
# [1]  TRUE FALSE FALSE
DATE %[]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE  TRUE FALSE
DATE %[>]% as.Date(c("2000-01-15", "2000-03-15"))
# [1] FALSE FALSE  TRUE
``````

For more examples, see the unit-testing script.

Feedback

Please check out the package and use the issue tracker to suggest a new feature or report a problem.