
Piping is Method Chaining

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers.]

What R users now call piping, popularized by Stefan Milton Bache and Hadley Wickham, is inline function application (this is notationally similar to, but distinct from, the powerful interprocess communication and concurrency tool introduced to Unix by Douglas McIlroy in 1973). In object-oriented languages this sort of notation for function application has been called “method chaining” since the days of Smalltalk (~1972). Let’s take a look at method chaining in Python, in terms of pipe notation.

Let’s work an example using Python’s Pandas package (and classes).

import pandas as pd
data = [['alpha', 'a', 1, 0], 
        ['beta', 'b', 2, 10], 
        ['gamma', 'b', 3, 10]]
df = pd.DataFrame(data, 
                  columns=['name', 'group', 'value', 'cost'])
print(df)
    name group  value  cost
0  alpha     a      1     0
1   beta     b      2    10
2  gamma     b      3    10

Method chaining is when methods return a reference to their host-object (or reference to a replacement for their host-object). This lets us call a sequence of methods one after the other as we show below.
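To see the mechanism without Pandas, here is a minimal sketch of a chain-enabled class. The class Tally and its methods are hypothetical names invented for illustration; the only point is that each method returns its host object, so calls can be strung together with “.”.

```python
class Tally:
    """Hypothetical example class; each method returns self to allow chaining."""

    def __init__(self, total=0):
        self.total = total

    def add(self, x):
        self.total += x
        return self  # returning the host object is what enables chaining

    def scale(self, k):
        self.total *= k
        return self


# Each call hands back the same Tally, so the next method can be applied to it.
result = Tally().add(2).add(3).scale(10)
print(result.total)
```

Pandas methods such as groupby() and agg() follow the second form of the definition: they return a new object rather than the original one, as in the example that follows.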

print(df.groupby("group").agg({"value":["max", "min"], "cost":["mean"]}))
      value     cost
        max min mean
group               
a         1   1    0
b         3   2   10

This may not be considered legible (especially as it was combined with print() function notation), so we use a common notation convention and insert a line-break before each method dispatch “.”. The parentheses surrounding the whole expression are a common Python convention to facilitate multi-line expressions.

( 
df 
   .groupby("group")
   .agg({"value":["max", "min"], "cost":["mean"]}) 
   .pipe(print)
)
      value     cost
        max min mean
group               
a         1   1    0
b         3   2   10

Or, to emphasize the similarity to pipes, we can use another convention (one that contravenes the PEP 8 style guide): end each line with .\ which is the method dispatch “.” symbol plus a line continuation mark.

df .\
    groupby("group") .\
    agg({"value":["max", "min"], "cost":["mean"]}) .\
    pipe(print)
      value     cost
        max min mean
group               
a         1   1    0
b         3   2   10

The above is just as with the Bizarro Pipe in R: the pipe is available as a convention over the existing language syntax. In Python (for method chaining enabled classes and methods) the glyph “.” is in fact already a method application operator or pipe (as is the glyph “.\EOL”, where EOL denotes the line-break or end of line). With method-chaining conventions the “.” already is “a pipe”, organizing method application from left to right without the need for illegible nesting. In R the glyph “->.;” is a function application operator or pipe (which we called the Bizarro Pipe; the Bizarro Pipe is a first-rate pipe, faster than other pipes, and interferes less with debugging than other pipes).

Both languages have had this application capability for a very long time. We are using Pandas and pipe() as our example, but any package whose methods return a reference to the object being worked on (or a reference to a replacement object) can be treated as a pipe-able object. If the class further implements one function re-director method (such as pipe()) then a lot more becomes practical. Here is another example showing how additional named and unnamed arguments can be handled.

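def add_delta_to_column(df, colname, delta):
    # Work on a copy so the caller's data frame is not modified in place.
    df = df.copy()
    df[colname] = df[colname] + delta
    return df

# Extra positional and keyword arguments to pipe() are forwarded to the function.
df .\
    pipe(add_delta_to_column, "cost", 5) .\
    groupby("group") .\
    agg({"value":["max", "min"], "cost":["mean"]}) .\
    pipe(print, "DEBUG1", sep = " | ")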
      value     cost
        max min mean
group               
a         1   1    5
b         3   2   15 | DEBUG1
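A function re-director of this kind is short to write. The following is a minimal sketch, not Pandas’s actual implementation: the class Box and the helper functions are hypothetical, and this pipe() simply calls the supplied function with the object as the first argument, forwarding any extra positional and keyword arguments.

```python
class Box:
    """Hypothetical container used only to illustrate a pipe() method."""

    def __init__(self, value):
        self.value = value

    def pipe(self, func, *args, **kwargs):
        # Apply func to this object, forwarding any extra arguments.
        return func(self, *args, **kwargs)


def add_delta(box, delta):
    # An ordinary function, not a method of Box; returns a replacement object.
    return Box(box.value + delta)


def scale(box, factor=1):
    return Box(box.value * factor)


# Unnamed (4) and named (factor=3) arguments are both passed through pipe().
out = Box(1).pipe(add_delta, 4).pipe(scale, factor=3)
print(out.value)
```

With one such method in place, any function that takes the object as its first argument can join a chain without being added to the class.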

So depending on your point of view: “piping is a poor person’s method chaining” or “method chaining is a poor person’s piping” (taken from the usual quote comparing objects and closures).

If one wants to go further, there are a number of Python packages that add significant piping capabilities (through notation, operator overloading, or other means).
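As an illustration of the operator overloading route, here is a minimal sketch. It is not the API of any particular package: the class name Pipeable and the choice of the “|” operator are assumptions made for this example.

```python
class Pipeable:
    """Hypothetical wrapper: lets `value | Pipeable(f)` mean `f(value)`."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, left):
        # Python calls this for `left | self` when the left operand's own
        # `|` does not accept a Pipeable on the right.
        return self.func(left, *self.args, **self.kwargs)


# Reads left to right: sort the list in descending order, then sum it.
descending = [3, 1, 2] | Pipeable(sorted, reverse=True)
total = descending | Pipeable(sum)
print(descending, total)
```

This relies on the left-hand object not handling “|” itself; types that define their own element-wise “|” may intercept the operator first, which is one reason such packages differ in the notation they choose.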

And that is piping versus method chaining.
