Beginners are sometimes confused by the fact that
- some R users use the native base-R pipe |> and
- others use the {magrittr} pipe %>%.
So in today’s video, I want to compare the two and show you the strengths and weaknesses of each one. Let’s dive in.
Keyboard shortcut
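Whatever pipe you use, you should definitely use the RStudio shortcut Ctrl + Shift + M (Cmd + Shift + M on macOS). This is much quicker than typing the pipe out. By default, the shortcut inserts the {magrittr} pipe, but you can switch it to the native pipe in the settings (Tools > Global Options > Code > "Use native pipe operator, |>").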
Simple function chaining
The big advantage of the base-R pipe is that it can easily chain together a couple of functions whether any packages are loaded or not.
runif(100) |> round() |> mean()
## [1] 0.48
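To see why no package is needed, it can help to look at what R's parser does with the native pipe. The following is a small sketch, assuming R 4.1.0 or later (the version that introduced |>): quoting a piped expression shows that it has already been rewritten into ordinary nested calls. The variable names `piped` and `nested` are only for illustration.

```r
# Capture the expression without evaluating it.
piped <- quote(runif(100) |> round() |> mean())

# The parser has already turned the pipe into nested function calls.
print(piped)
## mean(round(runif(100)))

# The captured expression is identical to the hand-nested version.
nested <- quote(mean(round(runif(100))))
identical(piped, nested)
## [1] TRUE
```

In other words, the native pipe is pure syntax, whereas %>% is a regular function that has to come from a package.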
The same doesn’t work with the {magrittr} pipe because I have to load the package first.
runif(100) %>% round() %>% mean()
## Error in runif(100) %>% round() %>% mean(): could not find function "%>%"
But if I do load something like the tidyverse, which makes the {magrittr} pipe available, it works fine.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
runif(100) %>% round() %>% mean()
## [1] 0.53
Form strictness
The nice thing about the {magrittr} pipe is that it isn’t as strict as the base-R pipe. For example, {magrittr} lets you drop the parentheses and use just the function name.
runif(100) %>% round # works
runif(100) |> round # Error: a function call with () is enforced
## Error: The pipe operator requires a function call as RHS (<text>:2:15)
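Because the native pipe is handled by the parser, this strictness shows up before any code runs. Here is a minimal sketch of that, assuming R 4.1.0 or later; `parses_ok()` is a hypothetical helper written just for this illustration, not part of any package.

```r
# Hypothetical helper: try to parse a string of R code and
# report whether it is syntactically valid.
parses_ok <- function(code) {
  tryCatch(
    {
      parse(text = code)
      TRUE
    },
    error = function(e) FALSE
  )
}

parses_ok("runif(100) |> round()") # function call with parentheses
## [1] TRUE
parses_ok("runif(100) |> round")   # bare function name
## [1] FALSE
```

Nothing is evaluated here. The second line is rejected as invalid syntax, which is why the error above points to a position in the source text rather than to a function.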
Standard scenario
I don’t think the strictness is much of a disadvantage, though. In most cases (at least in about 90% of my pipe use cases), you’ll likely use the pipe with something like mutate() where you specify additional arguments anyway. In that scenario, both pipes work pretty much the same.
dat_with_super_long_name <- tibble(x = 1:3, y = 10:12)
dat_with_super_long_name |> mutate(z = x + y)
## # A tibble: 3 × 3
##       x     y     z
##   <int> <int> <int>
## 1     1    10    11
## 2     2    11    13
## 3     3    12    15
dat_with_super_long_name %>% mutate(z = x + y)
## # A tibble: 3 × 3
##       x     y     z
##   <int> <int> <int>
## 1     1    10    11
## 2     2    11    13
## 3     3    12    15
Using a placeholder
Fans of the original {magrittr} pipe will tell you that it’s really cool to use the . operator as a placeholder. Rightfully so, this is a neat feature.
dat_with_super_long_name %>% lm(y ~ x, data = .)
##
## Call:
## lm(formula = y ~ x, data = .)
##
## Coefficients:
## (Intercept)            x
##           9            1
Initially, the base-R pipe could not pull off such a stunt. However, since R 4.2.0 it has a placeholder too: the underscore _, which has to be passed to a named argument.
dat_with_super_long_name |> lm(y ~ x, data = _)
##
## Call:
## lm(formula = y ~ x, data = dat_with_super_long_name)
##
## Coefficients:
## (Intercept)            x
##           9            1
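One detail worth knowing is that the base-R placeholder only works with a named argument. The sketch below assumes R 4.2.0 or later and uses a plain data.frame called `dat` (a stand-in for the tibble above, so that no packages are needed). The positional variant is only parsed from a string, because it would otherwise stop the whole script from parsing.

```r
# Stand-in data with the same values as the tibble in the post.
dat <- data.frame(x = 1:3, y = 10:12)

# Works: the placeholder is passed to the named argument `data`.
fit <- dat |> lm(y ~ x, data = _)

# Does not parse: the placeholder is used positionally, without a name.
positional_ok <- tryCatch(
  {
    parse(text = "dat |> lm(y ~ x, _)")
    TRUE
  },
  error = function(e) FALSE
)
positional_ok
## [1] FALSE
```

The {magrittr} dot has no such restriction, so `lm(y ~ x, .)` would be accepted there.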
Using multiple placeholders
At this point, fans of the . operator will shout “The dot operator is even cooler. It can be used multiple times!” And they are absolutely right about that. That’s pretty dope.
And for the unenlightened: By wrapping a subsequent function call into {}, you can use the . operator as many times as you’d like over there. In each instance, . will then represent the data that went into {}.
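As a rough illustration of the {} trick, here is a small sketch that uses the dot twice inside curly brackets. It assumes the {magrittr} package is installed and swaps the tibble for a plain data.frame called `dat` with the same values; the computation itself (a sum of products) is just a stand-in for whatever you would do with both columns.

```r
library(magrittr) # provides %>%

# Stand-in data with the same values as the tibble in the post.
dat <- data.frame(x = 1:3, y = 10:12)

# Inside {}, every `.` refers to the data frame on the left-hand side.
dat %>% {
  sum(.$x * .$y)
}
## [1] 68
```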
Sadly, the base pipe cannot do such a thing. Its strictness forbids {}.
## Error: { not allowed
dat_with_super_long_name |> {plot(_$x, _$y, cex = 3, lwd = 5)}
## Error: function '{' not supported in RHS call of a pipe (<text>:2:29)
A workaround for that would be to
- define an anonymous function with \(.),
- wrap that into parentheses, and then
- call that function.
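The three steps above can be sketched as follows, assuming R 4.1.0 or later (for both |> and the \( ) shorthand). The data frame `dat` is a plain stand-in for the tibble from before, and the sum of products is only a placeholder for a call that needs the data more than once.

```r
# Stand-in data with the same values as the tibble in the post.
dat <- data.frame(x = 1:3, y = 10:12)

# 1. \(.) defines an anonymous function whose argument is named `.`
# 2. the surrounding parentheses wrap that function
# 3. the trailing () calls it, with the piped data as its argument
dat |> (\(.) sum(.$x * .$y))()
## [1] 68
```

Naming the argument `.` is purely a convention that mimics the {magrittr} dot. Any other argument name, such as `d`, works the same way.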
Shoutout to Isabella Velásquez’s blog post that taught me about this little trick.
Conditional flows
Now, sometimes people like to use if-statements in their pipe chains. By combining the {magrittr} pipe with curly brackets and the . operator, it could look like this.
duplicate_flag <- TRUE
duplicates <- tibble(x = 1:3, z = 21:23)
dat_with_super_long_name %>%
  {
    if (duplicate_flag) {
      . |> left_join(duplicates, by = 'x')
    } else {
      .
    }
  } %>%
  summarize(across(everything(), mean))
## # A tibble: 1 × 3
##       x     y     z
##   <dbl> <dbl> <dbl>
## 1     2    11    22
duplicate_flag <- FALSE
duplicates <- tibble(x = 1:3, z = 21:23)
dat_with_super_long_name %>%
  {
    if (duplicate_flag) {
      . |> left_join(duplicates, by = 'x')
    } else {
      .
    }
  } %>%
  summarize(across(everything(), mean))
## # A tibble: 1 × 2
##       x     y
##   <dbl> <dbl>
## 1     2    11
In the past, I have written code like this too. Nowadays, though, I try to break such things out into their own functions, preferably ones with descriptive names.
That way,
- the base-R pipe can handle this much better,
- my original chain hopefully stays short, and
- when I outsource the helper functions to a separate script, the function name hopefully still tells me what it does.
left_join_if_duplicate <- function(dat, duplicate_flag) {
  if (duplicate_flag) {
    dat |> left_join(duplicates, by = 'x')
  } else {
    dat
  }
}
duplicate_flag <- TRUE
dat_with_super_long_name |>
  left_join_if_duplicate(duplicate_flag) |>
  summarize(across(everything(), mean))
## # A tibble: 1 × 3
##       x     y     z
##   <dbl> <dbl> <dbl>
## 1     2    11    22
left_join_if_duplicate <- function(dat, duplicate_flag) {
  if (duplicate_flag) {
    dat |> left_join(duplicates, by = 'x')
  } else {
    dat
  }
}
duplicate_flag <- FALSE
dat_with_super_long_name |>
  left_join_if_duplicate(duplicate_flag) |>
  summarize(across(everything(), mean))
## # A tibble: 1 × 2
##       x     y
##   <dbl> <dbl>
## 1     2    11
