## Queries

In database theory, a query is a request for data or information from a database table or combination of tables.

Since `dplyr`, we have something in `R` that conceptually resembles a query quite closely:

```r
library(dplyr)
library(pryr)

# as_tibble() is the current name of the older tbl_df()
mtcars %>% as_tibble() %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
```

```
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048
```

What I particularly appreciate about `dplyr` is the possibility of building my *query* as a step-by-step sequence of `R` statements that I can test progressively, one step at a time.
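For instance, the query above can be assembled one statement at a time, checking each intermediate result before adding the next verb. A minimal sketch (the intermediate names `by_cyl` and `query` are only illustrative):

```r
library(dplyr)

# Step 1: group the data and check that the grouping looks right
by_cyl <- mtcars %>% group_by(cyl)
n_groups(by_cyl)   # 3 groups: cyl takes the values 4, 6 and 8

# Step 2: add the summary only once step 1 behaves as expected
query <- by_cyl %>% summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
query
```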

## Views

Again in database theory, a *view* is the result set of a stored query on the data, which database users can query just as they would a table.
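An ordinary `R` assignment is not a view in this sense: `<-` evaluates the query once and stores a snapshot, which no longer follows the data. A small sketch, working on a copy called `my_cars` (a name introduced only for this example) so that `mtcars` itself is left untouched:

```r
library(dplyr)

# Work on a copy so the original mtcars is not modified
my_cars <- mtcars

# An ordinary assignment evaluates the query once and stores the result
snapshot <- my_cars %>% group_by(cyl) %>% summarise(mean_mpg = mean(mpg))

# Changing the underlying data afterwards...
my_cars$mpg[c(1, 3, 5)] <- 0

# ...leaves the stored result unchanged: snapshot still holds the old means
snapshot
```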

I would like to have something similar to a view in `R`.

As far as I know, I can achieve this goal in three ways:

- Function `makeActiveBinding()`
- Operator `%<a-%` from package `pryr`
- My proposed `%>>%` operator

## Function `makeActiveBinding()`

Function `makeActiveBinding(sym, fun, env)` installs a function in an environment `env` so that getting the value of `sym` calls `fun` with no arguments.

As a basic example, I can actively bind a function that simulates rolling a die to an object named `dice`:

```r
makeActiveBinding("dice", function() sample(1:6, 1), env = globalenv())
```

so that:

```r
replicate(5, dice)
```

```
## [1] 5 1 6 2 3
```
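The dice example relies on `fun` running on every read of the symbol. A minimal sketch that makes this visible with a counter, this time installing the binding in a fresh environment rather than in the global one (the names `e`, `calls` and `tick` are illustrative):

```r
# A private environment, so the example does not touch the global workspace
e <- new.env()
calls <- 0

makeActiveBinding("tick", function() {
  calls <<- calls + 1   # record every time the binding function runs
  calls
}, env = e)

first  <- get("tick", envir = e)   # reading the symbol runs the function...
second <- get("tick", envir = e)   # ...and reading it again runs it again
c(first, second)

bindingIsActive("tick", e)   # TRUE: "tick" is an active binding in e
```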

Similarly, I can wrap a `dplyr` expression into a function:

```r
f <- function() {
  mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
}
```

and then actively bind it to a symbol:

```r
makeActiveBinding("view", f, env = globalenv())
```

so that every time `view` is accessed, `f()` is evaluated again:

```r
view
```

```
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048
```

As a result, if I change any value of `mpg` within `mtcars`, `view` is automatically updated:

```r
mtcars$mpg[c(1,3,5)] <- 0
view
```

```
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##
## 1     4 24.59091 9.231192
## 2     6 16.74286 7.504189
## 3     8 13.76429 4.601606
```
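A side effect worth knowing: when a value is *assigned* to an active binding, `R` calls `fun` with that value as its single argument. Since `f()` takes no arguments, an accidental `view <- ...` raises an error instead of silently overwriting the view, so the binding behaves like a read-only view. A small sketch in a private environment (`e` and `answer` are illustrative names):

```r
e <- new.env()

# The binding function takes no arguments, just like f() above
makeActiveBinding("answer", function() 42, env = e)

# Reading works as usual
get("answer", envir = e)

# Assigning calls the function with the new value as its only argument,
# which fails here because the function accepts no arguments
outcome <- tryCatch({
  assign("answer", 1, envir = e)
  "assigned"
}, error = function(err) "read-only")
outcome
```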

I have to admit, though, that all of this looks quite unfriendly, at least to me.

## Operator `%<a-%`

A valid alternative, which wraps away the complexity of `makeActiveBinding()`, is provided by the operator `%<a-%` from package `pryr`:

```r
view %<a-% {mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))}
```

Again, if I change any value of `mpg` within `mtcars`, the value of `view` is automatically updated:

```r
mtcars$mpg[c(1,3,5)] <- 50
view
```

```
## # A tibble: 3 × 3
##     cyl mean_mpg    sd_mpg
##
## 1     4 29.13636  8.159568
## 2     6 23.88571 11.593451
## 3     8 17.33571  9.688503
```

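Note that in this case I have to enclose the whole expression within curly brackets: without them, `%<a-%` would capture only `mtcars`, because all `%...%` operators share the same precedence and are applied from left to right.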

Moreover, the assignment operator `%<a-%` goes on the left-hand side of my chain of `dplyr` statements.
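To check why the curly brackets are needed, the expression can be parsed with `quote()`, which evaluates nothing (so neither `dplyr` nor `pryr` has to be loaded). The outermost call shows which operator ends up receiving the chain:

```r
# Parse only, no evaluation
unbraced <- quote(view %<a-% mtcars %>% group_by(cyl))
braced   <- quote(view %<a-% {mtcars %>% group_by(cyl)})

unbraced[[1]]        # `%>%` is outermost...
unbraced[[2]][[3]]   # ...so %<a-% only received `mtcars`
braced[[1]]          # `%<a-%` is outermost and receives the whole braced chain
```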

## Operator `%>>%`

Finally, I would like to propose a third alternative, still based on `makeActiveBinding()`, which I named `%>>%`:

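```r
`%>>%` <- function(expr, x) {
  # Right-hand side: the unevaluated symbol that will hold the view
  x <- substitute(x)
  # Left-hand side: the unevaluated chain of dplyr statements
  call <- match.call()[-1]
  # Build a no-argument function whose body is that chain
  fun <- function() {NULL}
  body(fun) <- call$expr
  # Evaluate the chain where the view is defined, not inside this operator
  environment(fun) <- parent.frame()
  # Bind the function to the symbol in the caller's environment
  makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
  invisible(NULL)
}
```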

This operator can then be used as:

```r
mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg)) %>>%
  view
```

And again, if I change the values of `mpg`:

```r
mtcars$mpg[c(1,3,5)] <- 100
```

the content of `view` changes accordingly:

```r
view
```

```
## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##
## 1     4 33.68182 22.41624
## 2     6 31.02857 30.44321
## 3     8 20.90714 22.88454
```
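Nothing in `%>>%` is specific to `dplyr`: any expression can become the body of the view. A minimal base-R sketch follows; the operator is restated (with `fun` evaluated in the caller's environment) so that the block runs on its own, and `prices` and `total` are illustrative names:

```r
# Restatement of the %>>% operator, so this block is self-contained
`%>>%` <- function(expr, x) {
  x <- substitute(x)                  # name of the view
  call <- match.call()[-1]
  fun <- function() {NULL}
  body(fun) <- call$expr              # the unevaluated left-hand side
  environment(fun) <- parent.frame()  # look up variables where the view lives
  makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
  invisible(NULL)
}

prices <- c(10, 20, 30)

# A "view" on prices written in plain base R
sum(prices) %>>% total
total          # 60

prices[1] <- 100
total          # 150: recomputed from the updated prices
```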

I believe this operator offers two advantages:

- It avoids the need for curly brackets around my `dplyr` expression.
- It lets me actively assign the result of my chain of `dplyr` statements in a more *natural way*, at the end of the chain.
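The first advantage follows from how `R` parses the chain: all `%...%` operators share the same precedence and group from left to right, so `%>>%`, being the last one, receives the entire chain as its `expr` argument. Parsing a short chain with `quote()` (nothing is evaluated, so `dplyr` need not even be loaded) shows what the operator receives:

```r
# Parse a chain ending in %>>% without evaluating it
chain_call <- quote(mtcars %>% group_by(cyl) %>>% view)

chain_call[[1]]            # `%>>%` is the outermost call
deparse(chain_call[[2]])   # expr: the whole chain, which becomes the body of fun
chain_call[[3]]            # x: the symbol that receives the active binding
```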

