
Pipes have been a fundamental part of computer programming for many decades. In short, a pipe takes the output of its left-hand side and passes it as input to its right-hand side. For example, in a Linux shell you might run cat example.txt | sort | uniq to take the contents of a text file, then sort its rows, then keep one copy of each row. The rows are sorted first because uniq only collapses adjacent duplicate lines. The vertical bar | is a common, but not universal, pipe operator; on U.S. QWERTY keyboards it shares a key with the backslash (\) and is typed with Shift.

Languages that don’t support pipes from the start often end up implementing some version of them. In R, the magrittr package introduced the %>% infix operator as a pipe operator, which is most often pronounced “then”. For example, “take the mtcars data.frame, THEN take the head of it, THEN…” and so on.
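As a quick illustration, here is a small sketch of what %>% does. It first shows that x %>% f(y) is evaluated as f(x, y), then rewrites the shell pipeline from above in R. The temporary file and its contents are made up purely for illustration.

```r
library(magrittr)

# `x %>% f(y)` is evaluated as `f(x, y)`: the left-hand side becomes
# the first argument of the call on the right-hand side.
piped  <- mtcars %>% head(3)
nested <- head(mtcars, 3)
identical(piped, nested)  # TRUE

# The shell pipeline `cat example.txt | sort | uniq`, written with %>%.
# The file name and contents are hypothetical example data.
path <- tempfile(fileext = ".txt")
writeLines(c("pear", "apple", "pear", "banana", "apple"), path)

unique_lines <- path %>%
  readLines() %>%  # take the contents of the file, THEN
  sort() %>%       # sort the rows, THEN
  unique()         # keep one copy of each row

print(unique_lines)  # the three values "apple", "banana", "pear"
```

Unlike the shell's uniq, R's unique() also drops non-adjacent duplicates, so in R the sort() step only determines the order of the result.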

For a function to be pipe-friendly, it should at least take a data object (often named .data) as its first argument and return an object of the same type, possibly even the same, unaltered object. This contract lets your pipe-friendly function sit in the middle of a piped workflow, accepting input from its left-hand side and passing output along to its right-hand side.

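library(magrittr)

# Report the structure of the incoming data, then hand the data back unchanged
custom_function <-
  function(.data) {
    # str() prints directly and returns NULL, so capture its output
    # and send it through message() instead
    message(paste(capture.output(str(.data)), collapse = "\n"))

    .data
  }

mtcars %>%
  custom_function() %>%
  head(10) %>%
  custom_function()

This will first display the structure of the 32 by 11 mtcars data.frame, then take the head(10) of mtcars and display the structure of that 10 by 11 reduced version, ultimately returning the reduced version, which R prints to the console by default.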
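To make the contract concrete, the sketch below contrasts two hypothetical helpers (count_rows_bad() and add_power_to_weight() are illustrative names, not from any package). The first breaks the chain because its last expression is message(), which returns NULL. The second returns an altered data.frame of the same type, so further steps can keep piping.

```r
library(magrittr)

# Hypothetical helper that breaks the contract: its last expression is
# message(), so the function returns NULL instead of the data.
count_rows_bad <- function(.data) {
  message(nrow(.data), " rows")
}

# Hypothetical helper that honors the contract: data.frame in,
# data.frame out, here with one extra column (hp per 1000 lbs of weight).
add_power_to_weight <- function(.data) {
  .data$hp_per_wt <- .data$hp / .data$wt
  .data
}

bad  <- mtcars %>% count_rows_bad()
good <- mtcars %>% add_power_to_weight() %>% head(10)

is.null(bad)  # TRUE: nothing is left for a next pipe step
dim(good)     # 10 rows, 12 columns
```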

The dplyr package in R introduces the notion of a grouped data.frame. For example, the mtcars data has a cyl column that classifies each car as a 4-, 6-, or 8-cylinder vehicle. You might want to process each of these groups of rows separately, i.e., process all the 4-cylinder vehicles together, then all the 6-cylinder ones, then all the 8-cylinder ones:

library(dplyr)

mtcars %>%
  group_by(cyl) %>%
  tally()
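A short sketch of what group_by() actually changes: the rows stay the same, but the object is marked as grouped, which is exactly what a group-aware function can test for. The functions used here (is_grouped_df(), group_vars(), n_groups(), tally()) are all exported by dplyr.

```r
library(dplyr)

by_cyl <- mtcars %>% group_by(cyl)

is_grouped_df(mtcars)  # FALSE: a plain data.frame
is_grouped_df(by_cyl)  # TRUE
group_vars(by_cyl)     # "cyl"
n_groups(by_cyl)       # 3

# One row per group, with the row count in column n
counts <- by_cyl %>% tally()
counts
```

In mtcars the counts come out as 11 four-cylinder, 7 six-cylinder, and 14 eight-cylinder cars.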

Note that dplyr re-exports the magrittr pipe operator, so it’s not necessary to attach both dplyr and magrittr explicitly; attaching dplyr will usually suffice.

To make my custom function group-aware, I need to check whether the incoming .data object is a grouped data.frame. If it is, I can use dplyr’s do() function to call my custom function on each group’s subset of the data. Inside do(), the . placeholder denotes the subset of .data being handed to custom_function at each invocation.

library(dplyr)

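custom_function <-
  function(.data) {
    # For grouped data, run custom_function() once per group via do();
    # inside do(), the dot is the current group's subset of rows
    if (dplyr::is_grouped_df(.data)) {
      return(dplyr::do(.data, custom_function(.)))
    }

    message(paste(capture.output(str(.data)), collapse = "\n"))

    .data
  }

mtcars %>%
  custom_function()

mtcars %>%
  group_by(cyl) %>%
  custom_function()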
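Current dplyr documentation marks do() as superseded. A possible alternative is group_modify(), sketched below under the assumption that dropping the grouping columns from each subset is acceptable (group_modify() re-attaches them to the result). The name custom_function_gm() is hypothetical.

```r
library(dplyr)

# Hypothetical variant of custom_function() built on group_modify()
# instead of do().
custom_function_gm <- function(.data, ...) {
  if (dplyr::is_grouped_df(.data)) {
    # .f receives .x (one group's rows, grouping columns dropped by default)
    # and .y (a one-row tibble holding that group's key). It must return a
    # data frame; group_modify() then re-attaches the grouping columns.
    return(dplyr::group_modify(
      .data,
      function(.x, .y, ...) custom_function_gm(.x, ...),
      ...
    ))
  }

  message(paste(utils::capture.output(str(.data)), collapse = "\n"))

  .data
}

result <- mtcars %>%
  group_by(cyl) %>%
  custom_function_gm()
```

Because each .x is an ungrouped subset, the recursive call takes the non-grouped branch, just as it does with do().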

In these examples, I’ve messaged some metadata to the console, but your custom functions can do any work they like: create, plot, and save ggplots; compute statistics; generate log files; and so on.

I usually also include R’s three-dots parameter, ..., so that additional arguments can be passed into the function and along to each per-group call.

custom_function <-
  function(.data, ...) {
    # Forward any extra arguments to each per-group call
    if (dplyr::is_grouped_df(.data)) {
      return(dplyr::do(.data, custom_function(., ...)))
    }

    message(paste(capture.output(str(.data)), collapse = "\n"))

    .data
  }
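As a usage sketch, here is a variant that forwards the dots to str(). Where the extra arguments should end up is an assumption for illustration; vec.len and give.attr are real arguments of utils::str(). The function still returns its input, so it stays pipe-friendly.

```r
library(dplyr)

# Same shape as custom_function(), but the dots are forwarded to str()
# (an illustrative choice of where the extra arguments go).
custom_function <- function(.data, ...) {
  if (dplyr::is_grouped_df(.data)) {
    return(dplyr::do(.data, custom_function(., ...)))
  }

  message(paste(utils::capture.output(str(.data, ...)), collapse = "\n"))

  .data
}

# vec.len and give.attr reach str() through `...`
ungrouped <- mtcars %>% custom_function(vec.len = 1)
grouped   <- mtcars %>% group_by(cyl) %>% custom_function(give.attr = FALSE)
```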