Building views with R

March 7, 2017

(This article was first published on R blog | Quantide - R training & consulting, and kindly contributed to R-bloggers)


[Here you can see the Building views with R cheat sheet at full resolution]

Queries

In database theory, a query is a request for data or information from a database table or combination of tables.

Since the arrival of dplyr, R has had something that conceptually resembles a query quite closely:

require(dplyr)

## Warning: package 'dplyr' was built under R version 3.2.5

require(pryr)

mtcars %>% 
  tbl_df() %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048

What I particularly appreciate about dplyr is the possibility of building my query as a step-by-step sequence of R statements that I can test progressively, one step at a time.

 

Views

Again in database theory, a view is the result set of a stored query on the data, which database users can query just as they would a table.

I would like to have something similar to a view in R: an object whose value is recomputed from the current data every time I use it.
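To see why an ordinary assignment is not enough, here is a minimal sketch in base R. The names cars, snapshot and refreshed are hypothetical, and aggregate() stands in for the dplyr query only to keep the example free of package dependencies.

```r
# `cars` is a hypothetical working copy, so that mtcars itself stays untouched
cars <- mtcars

# An ordinary assignment stores the result as it is right now
snapshot <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
means_before <- snapshot$mpg

# Change the underlying data
cars$mpg[c(1, 3, 5)] <- 0

# The stored result does not follow the data ...
snapshot_unchanged <- identical(snapshot$mpg, means_before)

# ... only re-running the query picks up the new values
refreshed <- aggregate(mpg ~ cyl, data = cars, FUN = mean)
results_differ <- !isTRUE(all.equal(snapshot$mpg, refreshed$mpg))
```

A view should remove the need for that manual re-run.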

As far as I know, I can achieve this goal in three ways:

  • Function makeActiveBinding()
  • Operator %<a-% from package pryr
  • My proposed %>>% operator

 

Function makeActiveBinding()

Function makeActiveBinding(sym, fun, env) installs a function in an environment env so that getting the value of sym calls fun with no arguments.

As a basic example, I can actively bind a function that simulates a die to an object named dice:

makeActiveBinding("dice", function() sample(1:6, 1), env = globalenv())

so that:

replicate(5, dice)

## [1] 5 1 6 2 3
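The random draws hide the fact that the function really runs on every access. A small sketch with a counter makes this visible; env, n_reads and counter are hypothetical names chosen for this illustration, and the binding lives in its own environment rather than in globalenv().

```r
# `env`, `n_reads` and `counter` are hypothetical names for this sketch
env <- new.env()
n_reads <- 0

makeActiveBinding(
  "counter",
  function() {
    n_reads <<- n_reads + 1   # record that the binding was read
    n_reads
  },
  env
)

first  <- env$counter   # the first read runs the function
second <- env$counter   # the second read runs it again

# bindingIsActive() reports whether a symbol is an active binding
is_active <- bindingIsActive("counter", env)
```

Here first is 1 and second is 2, because each read of counter triggered a fresh call.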

Similarly, I can wrap a dplyr expression into a function:

f <- function() {
  mtcars %>% 
    group_by(cyl) %>% 
    summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))
}

and then actively bind it to a symbol:

makeActiveBinding("view", f, env = globalenv())

so that any time I evaluate view, the result of function f() is computed again:

view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 26.66364 4.509828
## 2     6 19.74286 1.453567
## 3     8 15.10000 2.560048

As a result, if I change any value of mpg within mtcars, view is automatically updated:

mtcars$mpg[c(1,3,5)] <- 0
view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 24.59091 9.231192
## 2     6 16.74286 7.504189
## 3     8 13.76429 4.601606

I have to admit that all of this looks quite unfriendly, at least to me.

 

Operator %<a-%

A valid alternative, which hides the complexity of function makeActiveBinding(), is provided by operator %<a-% from package pryr:

view %<a-% {mtcars %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg))}

Again, if I change any value of mpg within mtcars, the value of view gets automatically updated:

mtcars$mpg[c(1,3,5)] <- 50
view

## # A tibble: 3 × 3
##     cyl mean_mpg    sd_mpg
##   <dbl>    <dbl>     <dbl>
## 1     4 29.13636  8.159568
## 2     6 23.88571 11.593451
## 3     8 17.33571  9.688503

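Note that in this case I have to enclose the whole expression within curly brackets: all %...% operators share the same precedence and associate from left to right, so without the brackets %<a-% would capture only mtcars.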

Moreover, the assignment operator %<a-% goes on the left-hand side of my chain of dplyr statements.
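The need for curly brackets can be checked without running anything, by asking the parser how it groups the two forms. This is a small sketch; quote() only parses the expressions, so neither pryr nor dplyr has to be loaded, and with_braces and without_braces are hypothetical names.

```r
# quote() returns the parsed call without evaluating it
with_braces    <- quote(view %<a-% { mtcars %>% head })
without_braces <- quote(view %<a-% mtcars %>% head)

# The first element of a call is the function applied last (the outermost one)
outer_with    <- as.character(with_braces[[1]])     # "%<a-%"
outer_without <- as.character(without_braces[[1]])  # "%>%"
```

Without the brackets the outermost call is %>%, meaning view %<a-% mtcars is evaluated first and its result is then piped onwards, which is not what a view definition is meant to do.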

 

Operator %>>%

Finally, I would like to propose a third alternative, still based on makeActiveBinding(), that I named %>>%:

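`%>>%` <- function(expr, x) {
  # Capture the target name and the unevaluated chain of statements
  x <- substitute(x)
  call <- match.call()[-1]
  # Build a zero-argument function whose body is the captured chain
  fun <- function() {NULL}
  body(fun) <- call$expr
  # Evaluate the chain in the caller's environment, so that names such as
  # x or call used inside the chain are not masked by the locals above
  environment(fun) <- parent.frame()
  makeActiveBinding(sym = deparse(x), fun = fun, env = parent.frame())
  invisible(NULL)
}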

that can be used as:

mtcars %>% 
  group_by(cyl) %>% 
  summarise(mean_mpg = mean(mpg), sd_mpg = sd(mpg)) %>>% 
  view

And again, if I change the values of mpg:

mtcars$mpg[c(1,3,5)] <- 100

the content of view changes accordingly:

view

## # A tibble: 3 × 3
##     cyl mean_mpg   sd_mpg
##   <dbl>    <dbl>    <dbl>
## 1     4 33.68182 22.41624
## 2     6 31.02857 30.44321
## 3     8 20.90714 22.88454
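The mechanism behind such an operator can be looked at in isolation: capture an expression without evaluating it, and make it the body of a function that is run later. The following is a minimal sketch in base R, not the operator itself; make_thunk, v and total are hypothetical names, and setting the function's environment to the caller's frame is a choice made for this sketch.

```r
# make_thunk() is a hypothetical helper for illustration only
make_thunk <- function(expr) {
  fun <- function() NULL
  body(fun) <- substitute(expr)        # the unevaluated expression becomes the body
  environment(fun) <- parent.frame()   # names are looked up where the caller lives
  fun
}

v <- c(1, 2, 3)
total <- make_thunk(sum(v))   # sum(v) is stored, not computed

total_before <- total()       # computed now, from the current v
v <- c(10, 20)
total_after <- total()        # computed again, from the new v
```

Here total_before is 6 and total_after is 30. Passing such a function to makeActiveBinding() is what turns the explicit call total() into a plain symbol lookup.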

I believe this operator offers two advantages:

  • It avoids the need for curly brackets around my dplyr expression
  • It lets me make the active assignment at the end of my chain of dplyr statements, which reads more naturally
