Announcing rquery

December 28, 2017

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

We are excited to announce the rquery R package.

rquery is Win-Vector LLC's big data query tool for R, currently in development.

rquery supplies a set of operators inspired by Edgar F. Codd's relational algebra (updated to reflect lessons learned from working with R, SQL, and dplyr at big data scale in production).

As an example: rquery operators allow us to write our earlier “treatment and control” example as follows.

dQ <- d %.>%
  extend_se(.,
            if_else_block(
              testexpr =
                "rand()>=0.5",
              thenexprs = qae(
                a_1 := 'treatment',
                a_2 := 'control'),
              elseexprs = qae(
                a_1 := 'control',
                a_2 := 'treatment'))) %.>%
  select_columns(., c("rowNum", "a_1", "a_2"))
 

rquery pipelines are first-class objects, so we can extend them, save them, and even print them.

cat(format(dQ))

table('d') %.>%
 extend(.,
  ifebtest_1 := rand() >= 0.5) %.>%
 extend(.,
  a_1 := ifelse(ifebtest_1,"treatment",a_1),
  a_2 := ifelse(ifebtest_1,"control",a_2)) %.>%
 extend(.,
  a_1 := ifelse(!( ifebtest_1 ),"control",a_1),
  a_2 := ifelse(!( ifebtest_1 ),"treatment",a_2)) %.>%
 select_columns(., rowNum, a_1, a_2)
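The printed pipeline shows the if/else block expanded into a stored test column followed by guarded assignments. As a rough illustration of that expansion only, here is a base-R emulation on an in-memory data.frame. It does not use rquery; the sample data, the seed, the use of runif() in place of SQL rand(), and the assumption that a_1 and a_2 start out as NA are all hypothetical.

```r
# Base-R emulation of the expanded if/else block (illustration only, not rquery).
set.seed(2017)  # hypothetical seed, only so the sketch is reproducible

# Hypothetical stand-in for the table 'd'; a_1 and a_2 are assumed to start as NA.
d <- data.frame(rowNum = 1:5,
                a_1 = NA_character_,
                a_2 = NA_character_,
                stringsAsFactors = FALSE)

# Step 1: evaluate the test once and store it (runif() stands in for SQL rand()).
d$ifebtest_1 <- runif(nrow(d)) >= 0.5

# Step 2: "then" assignments, applied only where the stored test is TRUE.
d$a_1 <- ifelse(d$ifebtest_1, "treatment", d$a_1)
d$a_2 <- ifelse(d$ifebtest_1, "control", d$a_2)

# Step 3: "else" assignments, applied only where the stored test is FALSE.
d$a_1 <- ifelse(!d$ifebtest_1, "control", d$a_1)
d$a_2 <- ifelse(!d$ifebtest_1, "treatment", d$a_2)

# Step 4: keep only the requested columns.
result <- d[, c("rowNum", "a_1", "a_2")]
print(result)
```

Storing the test in its own column matters because the test is random: re-evaluating it for each assignment could give a row inconsistent treatment and control labels.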

rquery targets only databases, for now mainly SparkSQL and PostgreSQL. Because rquery is primarily a SQL generator, it avoids some of the trade-offs required to directly support in-memory data.frames. We demonstrate converting the above rquery pipeline into SQL and executing it here.
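As a minimal sketch of what "convert to SQL, then execute" could look like, the helper below renders a pipeline with rquery's to_sql() and runs the result through DBI. The helper name run_pipeline is hypothetical, and the argument order passed to to_sql() (pipeline, then connection) is an assumption about a package that is still changing, so check the current rquery documentation before relying on it.

```r
# Sketch: render an rquery pipeline to SQL and run it over a DBI connection.
# Assumes the rquery and DBI packages are installed; nothing touches a
# database until the function is actually called.
run_pipeline <- function(db, pipeline) {
  # Assumption: to_sql() takes the pipeline and the connection, so the
  # generated SQL can follow the target database's dialect.
  sql <- rquery::to_sql(pipeline, db)
  cat(sql, "\n")             # show the generated query for inspection
  DBI::dbGetQuery(db, sql)   # execute the query and fetch the result
}

# Hypothetical usage, with `db` an open connection to PostgreSQL or SparkSQL
# that already holds the table 'd':
# res <- run_pipeline(db, dQ)
```

Because the pipeline is only a description of the query, the same dQ object could in principle be rendered for different database back ends without being rebuilt.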

rquery itself is still in early development (and not yet ready for extensive use in production), but it is maturing fast, and we expect more rquery announcements going forward. Our current intent is to bring in sponsors, partners, and R community voices to help develop and steer rquery.
