# Function Objects and Pipelines in R

**R – Win-Vector Blog**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Composing functions and sequencing operations are core programming concepts.

Some notable realizations of sequencing or pipelining operations include:

- Unix’s
`|`

-pipe - CMS Pipelines.
`F#`

‘s forward pipe operator`|>`

.- Haskel’s Data.Function
`&`

operator. - The
`R`

`magrittr`

forward pipe. - Scikit-learn‘s
`sklearn.pipeline.Pipeline`

.

The idea is: many important calculations can be considered as a sequence of transforms applied to a data set. Each step may be a function taking many arguments. It is often the case that only one of each function’s arguments is primary, and the rest are parameters. For data science applications this is particularly common, so having convenient pipeline notation can be a plus. An example of a non-trivial data processing pipeline can be found here.

In this note we will discuss the advanced `R`

pipeline operator “dot arrow pipe” and an `S4`

class (`wrapr::UnaryFn`

) that makes working with pipeline notation much more powerful and much easier.

The ideas are:

- The
`wrapr`

dot arrow pipe includes a detailed`S3`

/`S4`

configurable interface (detailed in the RJournal here). These interfaces are able to treat objects as functions: i.e. they can pipe data into objects. - The
`wrapr::UnaryFn`

class supplies a convenient tool for the partial function application needed to work with pipelines.

Or: pipe notation assumes a world data transforms are single argument functions (with other parameters already bound in), and the `UnaryFn`

derived classes we discuss here help realize such a world.

This can be made clearer with examples.

Suppose we build a linear model of `log(y)`

as follows.

d <- data.frame( x = c(1, 2, 3, 4, 5), y = c(3, 5, 20, 50, 150)) model <- lm(log(y) ~ x, data = d)

We can see our predictions in original `y`

-units by making the prediction and then exponentiation:

<span class="kw">exp</span>(<span class="kw">predict</span>(model, <span class="dt">newdata =</span> d))

## 1 2 3 4 5 ## 2.459509 6.770839 18.639596 51.313366 141.261725

It is natural to want to apply a model later to new data. This can be done as follows.

d2 <- data.frame(x = 3:7) exp(predict(model, newdata = d2))

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

The `wrapr`

package allows us to use a "piping into a function" notation as follows.

library("wrapr") model_f <- function(df) { exp(predict(model, newdata = df)) } d2 %.>% model_f

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

In the above example the `model`

contents are captured in the function closure. However, it is better practice to explicitly store data in objects.

`wrapr`

supplies a method to do this, which we will now demonstrate.

model_o <- fnlist( pkgfn( "stats::predict.lm", arg_name = "newdata", args = list(object = model)), pkgfn( "exp", arg_name = "x")) cat(format(model_o))

## UnaryFnList( ## stats::predict.lm(newdata=., object), ## base::exp(x=., ))

Notice `model_o`

is an object (not a function). However we can pipe into `model_o`

as if it were a function.

d2 %.>%<span class="st"> </span>model_o

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

This works because `model_o`

is derived from the `S4`

class `UnaryFn`

and `wrapr`

has definitions for `apply_right.UnaryFn`

and `apply_left.UnaryFn`

, which integrate this class into the `wrapr`

dot-arrow pipe. The family of `UnaryFn`

classes single argument functions. This system happens to be implemented by `wrapr`

, but `wrapr`

dot arrow extension mechanisms also allow users to build their own pipe-compatible systems. (`S3`

/`S4`

extension details can be found in the RJournal here.

The pipe notation is not strictly required as the apply is done through the `S4`

method `wrapr::ApplyTo()`

.

<span class="kw">ApplyTo</span>(model_o, d2)

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

The above methods can be used to wrap substantial functions such as `vtreat::prepare()`

to create very powerful data processing pipelines. A more involved example of trying this technique can be found here.

Note: the `wrapr`

right-dispatch we are using is only triggered when the right-hand side of a pipeline is a symbol or name. This is consistent with pipelines such as "`5 %.>% sin`

" where we are not so much piping into the sin-function, but into a name that refers to the sin-function. However, piping into names covers most practical cases.

We can apply processing pipelines piece by piece.

pred_step <- pkgfn( "stats::predict.lm", arg_name = "newdata", args = list(object = model)) exp_step <- pkgfn( "base::exp", arg_name = "x") d2 %.>% pred_step %.>% exp_step

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

We can also build such a pipeline by piping pieces into each other.

model_p <- pred_step %.>% exp_step cat(format(model_p))

## UnaryFnList( ## stats::predict.lm(newdata=., object), ## base::exp(x=., ))

d2 %.>%<span class="st"> </span>model_p

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

The pipe notation is not required (but is a nice notation). The apply a list of function objects effect can be achieved directly with `wrapr::ApplyTo()`

.

<span class="kw">ApplyTo</span>(model_p, d2)

## 1 2 3 4 5 ## 18.63960 51.31337 141.26173 388.88260 1070.56368

The idea is: the processing pipelines store an arbitrary number of function objects as a simple list. The list declares function-like behavior to both `ApplyTo`

and the `wrapr`

dot-arrow pipe through `R`

`S3`

/`S4`

class declarations. Function objects do not capture environments as function closures do (though obviously any function in them does have its own closure). List of function objects can be easier to work with, store, and share than function closures or other pipeline structures.

We can look at the contents of a pipeline as follows.

## [[1]] ## [1] "stats::predict.lm(newdata=., object)" ## ## [[2]] ## [1] "base::exp(x=., )"

In addition to the `PartialNamedFn`

class we suggest looking at the following additional adapters:

`srcfn()`

which accepts the source code for an arbitrary expression (quoted either with quote-marks or with`wrapr::qe()`

).`wrapfn()`

class which directly accepts a function (including the closure).

Examples include:

s4 <- srcfn( qe(. + y), arg_name = ".", args= list(y=13)) print(s4)

## [1] "SrcFunction{ . + y }(.=., y)"

<span class="dv">22</span> %.>%<span class="st"> </span>s4

## [1] 35

s5 <- wrapfn( tan, arg_name = "x") print(s5)

## [1] "PartialFunction{tan}(x=., )"

<span class="dv">1</span>:<span class="dv">4</span> %.>%<span class="st"> </span>s5

## [1] 1.5574077 -2.1850399 -0.1425465 1.1578213

For convenience `wrapr`

dot-pipe pipeable object can be converted into a single-argument function of "dot" with the `as_fn()`

method:

f5 <- as_fn(s5) f5(1:5)

## [1] 1.5574077 -2.1850399 -0.1425465 1.1578213 -3.3805150

<span class="dv">1</span>:<span class="dv">5</span> %.>%<span class="st"> </span>f5

## [1] 1.5574077 -2.1850399 -0.1425465 1.1578213 -3.3805150

The idea is `wrapr`

supplies many possible variations of notations: functions sequences as lists, function composition by pipe, function composition by call, function application by pipe, and function application by call. Then the user can pick what notation they prefer. `rquery`

pipelines are very restricted (they date `data.frame`

s to `data.frame`

s and pre-check a number of invariants). `UnaryFn`

pipelines are more free-form, they check very little before application.

The demonstrated design and functionality is inspired by partially applied functions, but a bit more circumspect in what is carried around. In Lisp "code is data", in `R`

it is a bit more complicated- so a pure-data solution has some merits.

And these are the basics of `wrapr`

function objects. For a more substantial data processing example please see here.

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Win-Vector Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.