magrittr: Simplifying R code with pipes

July 23, 2014

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

R is a functional language, which means that your code often contains a lot of ( parentheses ). Complex code often means nesting those parentheses together, which makes code hard to read and understand. But there's a very handy R package, magrittr by Stefan Milton Bache, which lets you transform nested function calls into a simple pipeline of operations that's easier to write and understand.

Hadley Wickham's dplyr package benefits from the %>% pipeline operator provided by magrittr. At useR! 2014, Hadley showed an example of a data transformation written with traditional nested R function calls:

hourly_delay <- filter( 
  summarise(
    group_by( 
      filter(
        flights, 
        !is.na(dep_delay)
      ), 
      date, hour
    ), 
    delay = mean(dep_delay), 
    n = n()
  ), 
  n > 10 
) 

Here's the same code, but rather than nesting one function call inside the next, data is passed from one function to the next using the %>% operator:

hourly_delay <- flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(
    delay = mean(dep_delay),
    n = n()
  ) %>%
  filter(n > 10)

You can read this version aloud to easily get a sense of what it does: the flights data frame is filtered (to remove missing values of the dep_delay variable) and grouped by hours within days, the mean delay and the number of flights are calculated within groups, and only those hours with more than 10 flights are kept.
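Since the flights data frame itself isn't included here, the following is a minimal sketch on made-up data that checks the nested and piped forms give the same result. The toy_flights data frame and its values are hypothetical, and the threshold is lowered from 10 to 1 only so that the tiny example returns something.

```r
library(dplyr)  # attaching dplyr also makes %>% available

# Hypothetical toy data, standing in for the flights data frame.
toy_flights <- data.frame(
  date      = c("2014-01-01", "2014-01-01", "2014-01-01", "2014-01-02"),
  hour      = c(9, 9, 10, 9),
  dep_delay = c(10, 20, NA, 5)
)

# Nested form: read from the inside out.
nested <- filter(
  summarise(
    group_by(filter(toy_flights, !is.na(dep_delay)), date, hour),
    delay = mean(dep_delay),
    n = n()
  ),
  n > 1  # threshold lowered from 10 to suit the toy data
)

# Piped form: read from top to bottom.
piped <- toy_flights %>%
  filter(!is.na(dep_delay)) %>%
  group_by(date, hour) %>%
  summarise(delay = mean(dep_delay), n = n()) %>%
  filter(n > 1)
```

Both versions should leave a single row: the 9 o'clock hour on 2014-01-01, the only group with more than one non-missing delay.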

You can use the %>% operator with standard R functions, and even your own functions, too. The rules are simple: the object on the left-hand side is passed as the first argument to the function on the right-hand side. So:

  • my.data %>% my.function is the same as my.function(my.data)
  • my.data %>% my.function(arg=value) is the same as my.function(my.data, arg=value)
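The two rules above can be checked directly with a short sketch. The scale_by function is a hypothetical helper defined only for this illustration; sqrt and sum are standard base R functions.

```r
library(magrittr)

# Hypothetical helper, defined here only for illustration.
scale_by <- function(x, factor = 1) x * factor

values <- c(1, 4, 9)

# x %>% f is the same as f(x)
root_piped  <- values %>% sqrt
root_direct <- sqrt(values)

# x %>% f(arg = value) is the same as f(x, arg = value)
scaled_piped  <- values %>% scale_by(factor = 10)
scaled_direct <- scale_by(values, factor = 10)

# Steps chain left to right: square root, then scale, then sum.
total <- values %>% sqrt %>% scale_by(factor = 10) %>% sum
```

Here the last line is equivalent to sum(scale_by(sqrt(values), factor = 10)), but it reads in the order the operations happen.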

It's even possible to pass the data to something other than the first argument of the function, using a . (dot) placeholder to mark the place where the object goes. See the magrittr vignette for details.
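As a brief sketch of the dot placeholder, the examples below use only base R functions and the built-in mtcars data set; the specific strings and model formula are chosen just for illustration.

```r
library(magrittr)

# The dot marks where the left-hand side goes; here it is the second argument,
# so this is the same as paste("hello", "world").
greeting <- "world" %>% paste("hello", .)

# lm() takes the formula first and the data second, so the data frame
# is routed to the `data` argument with the dot.
fit <- mtcars %>% lm(mpg ~ wt, data = .)
```

When the dot appears as an argument in the call, magrittr does not also insert the left-hand side as the first argument.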

This new "pipelining" operation is a really useful addition to the R language, and R developers are starting to use it to make their code simpler to write and maintain. Hadley Wickham's newest R package, tidyr, makes it easy to clean up data sets for analysis by stringing together operations like "gather" and "spread" using the %>% operator.

And speaking of pipelining, you may have been wondering where the name "magrittr" comes from. Here's the answer:

[Image: MagrittePipe]

The only other question is: will Stefan be making this coffee mug available?

magrittr vignette: Ceci n'est pas un pipe 
