We are excited to announce the `rquery` `R` package.

`rquery` is Win-Vector LLC's big data query tool for `R`, currently in development.

`rquery` supplies a set of operators inspired by Edgar F. Codd's relational algebra (updated to reflect lessons learned from working with `R`, `SQL`, and `dplyr` at big data scale in production).

As an example, `rquery` operators allow us to write our earlier "treatment and control" example as follows.

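```r
library("wrapr")   # supplies the %.>% "dot arrow" pipe
library("rquery")

# d: handle to the database table 'd' from the earlier example
dQ <- d %.>%
  extend_se(.,
            if_else_block(
              testexpr =
                "rand()>=0.5",
              thenexprs = qae(
                a_1 := 'treatment',
                a_2 := 'control'),
              elseexprs = qae(
                a_1 := 'control',
                a_2 := 'treatment'))) %.>%
  select_columns(., c("rowNum", "a_1", "a_2"))
```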

`rquery` pipelines are first-class objects, so we can extend them, save them, and even print them.

```r
cat(format(dQ))
```

```
table('d') %.>%
 extend(.,
  ifebtest_1 := rand() >= 0.5) %.>%
 extend(.,
  a_1 := ifelse(ifebtest_1,"treatment",a_1),
  a_2 := ifelse(ifebtest_1,"control",a_2)) %.>%
 extend(.,
  a_1 := ifelse(!( ifebtest_1 ),"control",a_1),
  a_2 := ifelse(!( ifebtest_1 ),"treatment",a_2)) %.>%
 select_columns(., rowNum, a_1, a_2)
```
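The printed form shows how `if_else_block` expands: the random test is stored once in a column (`ifebtest_1`), and each branch is a guarded `extend` that only overwrites rows where its condition holds. Because both assignments in a branch read the same stored draw, `a_1` and `a_2` always receive opposite labels. The sketch below replays those steps in base `R` on a small in-memory `data.frame` purely to illustrate the semantics; `rquery` itself would translate the pipeline to `SQL` and run it in the database. The contents of `d` here (and initializing `a_1`/`a_2` to `NA`) are assumptions, since the post does not show the table.

```r
# Hypothetical in-memory stand-in for table 'd' (contents are an assumption).
set.seed(2017)
d <- data.frame(rowNum = 1:6,
                a_1 = NA_character_,
                a_2 = NA_character_,
                stringsAsFactors = FALSE)

# Step 1: evaluate the random test once per row and keep it as a column.
d$ifebtest_1 <- runif(nrow(d)) >= 0.5

# Step 2: "then" branch; only rows where the test is TRUE are overwritten.
d$a_1 <- ifelse(d$ifebtest_1, "treatment", d$a_1)
d$a_2 <- ifelse(d$ifebtest_1, "control", d$a_2)

# Step 3: "else" branch; only rows where the test is FALSE are overwritten.
d$a_1 <- ifelse(!d$ifebtest_1, "control", d$a_1)
d$a_2 <- ifelse(!d$ifebtest_1, "treatment", d$a_2)

# Step 4: keep only the requested columns.
dR <- d[, c("rowNum", "a_1", "a_2")]
print(dR)
```

Had each assignment drawn its own random number instead of reading the stored test column, a row could end up with `a_1` and `a_2` both set to `"treatment"`.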

`rquery` targets only databases, and right now mainly `SparkSQL` and `PostgreSQL`. `rquery` is primarily a `SQL` generator, which lets it avoid some of the trade-offs required to directly support in-memory `data.frame`s. We demonstrate converting the above `rquery` pipeline into `SQL` and executing it here.
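To make "`SQL` generator" concrete, here is a toy sketch of how a chain of relational operators can be rendered as nested `SELECT` statements. The helpers `toy_table`, `toy_extend`, and `toy_select_columns` are hypothetical and written only for illustration; they are not `rquery`'s API or implementation, and the `CASE WHEN` translation of `ifelse` is likewise an assumption about how such a step might be expressed in `SQL`.

```r
# Hypothetical toy SQL generator; NOT part of rquery.
# Each node records its output columns and the SQL that produces them.
toy_table <- function(name, columns) {
  list(columns = columns,
       sql = paste0("SELECT ", paste(columns, collapse = ", "), " FROM ", name))
}

# Add (or redefine) one column by wrapping the incoming query in a new SELECT.
toy_extend <- function(node, col, expr) {
  kept <- setdiff(node$columns, col)  # drop the old definition, if any
  select_list <- c(kept, paste0(expr, " AS ", col))
  list(columns = c(kept, col),
       sql = paste0("SELECT ", paste(select_list, collapse = ", "),
                    " FROM (", node$sql, ") t"))
}

# Project down to the requested columns.
toy_select_columns <- function(node, columns) {
  stopifnot(all(columns %in% node$columns))
  list(columns = columns,
       sql = paste0("SELECT ", paste(columns, collapse = ", "),
                    " FROM (", node$sql, ") t"))
}

# Usage: mirror the first steps of the printed pipeline.
q <- toy_table("d", c("rowNum", "a_1", "a_2"))
q <- toy_extend(q, "ifebtest_1", "rand() >= 0.5")
q <- toy_extend(q, "a_1", "CASE WHEN ifebtest_1 THEN 'treatment' ELSE a_1 END")
q <- toy_select_columns(q, c("rowNum", "a_1"))
cat(q$sql, "\n")
```

Wrapping each step in its own subquery is one simple way to respect the `SQL` rule that a column defined in a `SELECT` list cannot be referenced elsewhere in that same list; this is also why the printed pipeline computes `ifebtest_1` in a separate `extend` before using it.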

`rquery` itself is still in early development (and not yet ready for extensive use in production), but it is maturing fast, and we expect more `rquery` announcements going forward. Our current intent is to bring in sponsors, partners, and `R` community voices to help develop and steer `rquery`.


To **leave a comment** for the author, please follow the link and comment on their blog: **R – Win-Vector Blog**.
