Comparing pipes: Base-R |> vs {magrittr} %>%

[This article was first published on Albert Rapp, and kindly contributed to R-bloggers].

    Beginners are sometimes confused by the fact that

    • some R users use the native Base R pipe |> and
    • others use the {magrittr} pipe %>%.

    So in today’s video, I want to compare the two and show you the strengths and weaknesses of each one. Let’s dive in.

    Keyboard shortcut

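    Whichever pipe you use, you should definitely use the RStudio shortcut Ctrl + Shift + M (Cmd + Shift + M on macOS). This is much quicker than typing the pipe out. By default, the shortcut inserts the {magrittr} pipe, but you can switch it to the native pipe in the settings (Tools > Global Options > Code > "Use native pipe operator, |>").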

    Simple function chaining

    The big advantage of the base-R pipe is that it is part of the language itself (since R 4.1.0), so it can chain together a couple of functions whether any packages are loaded or not.

    runif(100) |> round() |> mean()
    ## [1] 0.48
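
    One way to see why no package is needed: as far as I understand it, the base-R pipe is pure syntax that the parser rewrites into an ordinary nested call. A minimal sketch, assuming R 4.1.0 or newer (the variable name x and the seed are only for illustration):

    ```r
    # quote() captures the expression after parsing, without evaluating it
    piped <- quote(x |> round() |> mean())
    print(piped)
    ## mean(round(x))

    # Piped and nested versions give the same value for the same input
    set.seed(1)  # arbitrary seed, only for reproducibility
    x <- runif(100)
    same_value <- identical(x |> round() |> mean(), mean(round(x)))
    print(same_value)
    ## [1] TRUE
    ```

    So the pipe here is only a different way of writing mean(round(x)), which is why it works in a completely fresh session.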

    The same doesn’t work with the {magrittr} pipe because I have to load the package first.

    runif(100) %>% round() %>%  mean()
    ## Error in runif(100) %>% round() %>% mean(): could not find function "%>%"

    But if I do load {magrittr} itself, or something like the tidyverse whose core packages re-export %>%, it works fine.

    library(tidyverse) 
    ## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
    ## ✔ dplyr     1.1.4     ✔ readr     2.1.5
    ## ✔ forcats   1.0.0     ✔ stringr   1.5.1
    ## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
    ## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
    ## ✔ purrr     1.0.2     
    ## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
    ## ✖ dplyr::filter() masks stats::filter()
    ## ✖ dplyr::lag()    masks stats::lag()
    ## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
    runif(100) %>% round() %>% mean()
    ## [1] 0.53

    Form strictness

    The nice thing about the {magrittr} pipe is that it isn’t as strict as the base-R pipe. For example, {magrittr} lets you drop the parentheses of a function call and just use the function name.

    runif(100) %>% round # works
    runif(100) |> round  # Error: a function call with () is enforced
    ## Error: The pipe operator requires a function call as RHS (<text>:2:15)
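
    The base-R error above is raised while the code is parsed, before anything runs, so not even the first step of the chain gets evaluated. A small sketch that captures this with parse() and tryCatch(), assuming R 4.1.0 or newer (the helper name parse_message is made up for illustration):

    ```r
    # Hypothetical helper: returns the parser's error message, or "parsed fine"
    parse_message <- function(code) {
      tryCatch(
        {
          parse(text = code)  # only parses the text, never evaluates it
          "parsed fine"
        },
        error = function(e) conditionMessage(e)
      )
    }

    without_parens <- parse_message("runif(100) |> round")
    with_parens <- parse_message("runif(100) |> round()")

    print(without_parens)  # the parser's complaint about the missing call
    print(with_parens)
    ```

    Since parse() never evaluates anything, the failure of the first string shows that the strictness is a syntax rule rather than a runtime check.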

    Standard scenario

    I don’t think the strictness is much of a disadvantage, though. In most cases (roughly 90% of my own pipe use cases), you’ll use the pipe with something like mutate(), where you specify additional arguments anyway. In that scenario, both pipes work pretty much the same.

    dat_with_super_long_name <- tibble(x = 1:3, y = 10:12)
    dat_with_super_long_name |> 
      mutate(z = x + y)
    ## # A tibble: 3 × 3
    ##       x     y     z
    ##   <int> <int> <int>
    ## 1     1    10    11
    ## 2     2    11    13
    ## 3     3    12    15
    dat_with_super_long_name %>%
      mutate(z = x + y)
    ## # A tibble: 3 × 3
    ##       x     y     z
    ##   <int> <int> <int>
    ## 1     1    10    11
    ## 2     2    11    13
    ## 3     3    12    15

    Using a placeholder

    Fans of the original {magrittr} pipe will tell you that it’s really cool to use the dot (.) as a placeholder. And rightfully so: this is a neat feature.

    dat_with_super_long_name %>% lm(y ~ x, data = .)
    ## 
    ## Call:
    ## lm(formula = y ~ x, data = .)
    ## 
    ## Coefficients:
    ## (Intercept)            x  
    ##           9            1

    Initially, the base-R pipe could not pull off such a stunt. However, since R 4.2.0 it has a placeholder too: the underscore (_), which has to be passed to a named argument.

    dat_with_super_long_name |> lm(y ~ x, data = _)
    ## 
    ## Call:
    ## lm(formula = y ~ x, data = dat_with_super_long_name)
    ## 
    ## Coefficients:
    ## (Intercept)            x  
    ##           9            1
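
    A short sketch of the base-R placeholder rules as I understand them from the R NEWS files: _ has to be passed to a named argument, and from R 4.3.0 on it may also start an extraction chain such as _$coefficients. The example assumes R 4.3.0 or newer and uses the built-in mtcars data rather than the tibble from above.

    ```r
    # Assumes R >= 4.3.0 (needed for the `_$...` extraction in the last step)
    fit <- mtcars |> lm(mpg ~ wt, data = _)

    # The pipe is rewritten at parse time, so the stored call names the data
    print(fit$call)
    ## lm(formula = mpg ~ wt, data = mtcars)

    # The placeholder as the head of an extraction chain
    coefs <- mtcars |> lm(mpg ~ wt, data = _) |> _$coefficients
    print(names(coefs))
    ## [1] "(Intercept)" "wt"
    ```

    This also explains the small difference in the printed calls above: {magrittr} records data = ., while the base pipe records the actual name of the data.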

    Using multiple placeholders

    At this point, fans of the . operator will shout “The dot operator is even cooler. It can be used multiple times!” And they are absolutely right about that. That’s pretty dope.

    And for the unenlightened: By wrapping the next function call in {}, you can use the . placeholder as many times as you’d like inside the braces. In each instance, . represents the data that was piped into {}.

    dat_with_super_long_name %>% {plot(.$x, .$y, cex = 3, lwd = 5)}

    Sadly, the base pipe cannot do such a thing. Its strictness forbids {}, and the _ placeholder may appear only once per call anyway.

    ## Error: { not allowed
    dat_with_super_long_name |> {plot(_$x, _$y, cex = 3, lwd = 5)} 
    ## Error: function '{' not supported in RHS call of a pipe (<text>:2:29)

    A workaround for that would be to

    • define an anonymous function with \(.),
    • wrap that into parentheses, and then
    • call that function.

    dat_with_super_long_name |> 
      (\(.) plot(.$x, .$y, cex = 3, lwd = 5))()

    Shoutout to Isabella Velásquez’s blog post that taught me about this little trick.
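
    If the anonymous-function syntax feels noisy, two other base-R options should work as well: with(), which evaluates an expression inside the data, and a small named helper. A sketch that uses cor() instead of plot() so there is a value to compare (the helper name cor_xy and the data frame dat are hypothetical):

    ```r
    dat <- data.frame(x = 1:3, y = 10:12)

    # Option 1: the anonymous-function trick, here returning a value
    r_lambda <- dat |> (\(d) cor(d$x, d$y))()

    # Option 2: with() looks up x and y inside the piped data frame
    r_with <- dat |> with(cor(x, y))

    # Option 3: a named helper (hypothetical name) keeps the chain readable
    cor_xy <- function(d) cor(d$x, d$y)
    r_helper <- dat |> cor_xy()

    # All three are the same computation written in different ways
    print(c(r_lambda, r_with, r_helper))
    ```

    None of the three needs a placeholder at all, because the piped data always arrives as the first argument.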

    Conditional flows

    Now, sometimes people like to use if-statements in their pipe chains. Combining the {magrittr} pipe with curly braces and the . placeholder, that could look like this.

    duplicate_flag <- TRUE
    duplicates <- tibble(x = 1:3, z = 21:23)
    dat_with_super_long_name %>%
      {
        if (duplicate_flag) {
          . |> left_join(duplicates, by = 'x')
        } else {
          .
        }
      } %>%
      summarize(across(everything(), mean))
    ## # A tibble: 1 × 3
    ##       x     y     z
    ##   <dbl> <dbl> <dbl>
    ## 1     2    11    22
    duplicate_flag <- FALSE
    duplicates <- tibble(x = 1:3, z = 21:23)
    dat_with_super_long_name %>%
      {
        if (duplicate_flag) {
          . |> left_join(duplicates, by = 'x')
        } else {
          .
        }
      } %>%
      summarize(across(everything(), mean))
    ## # A tibble: 1 × 2
    ##       x     y
    ##   <dbl> <dbl>
    ## 1     2    11

    In the past, I have written code like this too. Nowadays, though, I try to break such things out into their own functions, preferably ones with descriptive names.

    That way,

    • the base-R pipe can handle this much better,
    • my original chain hopefully stays short, and
    • when I outsource the helper functions to a separate script, the function name hopefully still tells me what it does.

    left_join_if_duplicate <- function(dat, duplicate_flag) {
      if (duplicate_flag) {
        dat |> left_join(duplicates, by = 'x') 
      } else {
        dat
      }
    }
    duplicate_flag <- TRUE
    dat_with_super_long_name |> 
      left_join_if_duplicate(duplicate_flag) |> 
      summarize(across(everything(), mean))
    ## # A tibble: 1 × 3
    ##       x     y     z
    ##   <dbl> <dbl> <dbl>
    ## 1     2    11    22
    duplicate_flag <- FALSE
    dat_with_super_long_name |> 
      left_join_if_duplicate(duplicate_flag) |> 
      summarize(across(everything(), mean))
    ## # A tibble: 1 × 2
    ##       x     y
    ##   <dbl> <dbl>
    ## 1     2    11
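
    One thing to keep in mind: left_join_if_duplicate() reads duplicates from the global environment. A variation, sketched here under my own assumptions, passes the lookup table in as an argument. To keep the sketch free of package dependencies, it swaps left_join() for base R's merge(all.x = TRUE) and summarize() for colMeans(); the names left_join_if, lookup and flag are made up.

    ```r
    # Hypothetical variant: the lookup table is an explicit argument
    left_join_if <- function(dat, lookup, flag, by = "x") {
      if (!isTRUE(flag)) {
        return(dat)  # nothing to join, pass the data through unchanged
      }
      merge(dat, lookup, by = by, all.x = TRUE)  # base-R left join
    }

    dat <- data.frame(x = 1:3, y = 10:12)
    lookup <- data.frame(x = 1:3, z = 21:23)

    joined <- dat |> left_join_if(lookup, flag = TRUE) |> colMeans()
    unjoined <- dat |> left_join_if(lookup, flag = FALSE) |> colMeans()

    print(joined)    # column means of x, y and z
    print(unjoined)  # column means of x and y only
    ```

    With the table passed in explicitly, the helper can move to a separate script without silently depending on an object that happens to exist in the calling session.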