You can read the original post in its original format on Rtask website by ThinkR here: Row-wise operations with the {tidyverse}
We are often asked how to perform row-wise operations in a data.frame (or a tibble). The answer is, as usual, “it depends”.
Let’s look at some cases that should fit your needs.
library(tidyverse)
Let’s make an example dataset:
base <- tibble::tibble(
  a = 1:10,
  b = 1:10,
  c = 21:30
) %>%
  head()
base
## # A tibble: 6 × 3
## a b c
## <int> <int> <int>
## 1 1 1 21
## 2 2 2 22
## 3 3 3 23
## 4 4 4 24
## 5 5 5 25
## 6 6 6 26
Let’s say we want to add a new column whose value depends, row by row, on the content of columns a, b and c of our base example.
Like this:
# A tibble: 6 x 4
a b c new
<int> <int> <int> <chr>
1 1 1 21 a equals 1
2 2 2 22 other case
3 3 3 23 other case
4 4 4 24 other case
5 5 5 25 c equals 25
6 6 6 26 other case
With case_when()
base %>%
mutate(
new = case_when(
a == 1 ~ "a equals 1",
c == 25 ~ "c equals 25",
TRUE ~ "other case"
)
)
## # A tibble: 6 × 4
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 other case
## 3 3 3 23 other case
## 4 4 4 24 other case
## 5 5 5 25 c equals 25
## 6 6 6 26 other case
case_when() is nice: it is much more readable than nested ifelse() calls, but it can quickly become complex as the number of conditions grows.
So let’s create a function which, depending on the values of a, b, c, returns the expected value.
Depending on the case (and your skills) you will sometimes have a vectorized function and sometimes a non-vectorized function. It is always better to create a vectorized function, but it is not always possible.
A vectorized function is a function that can be directly applied to a set of vectors and that returns a response vector.
An example of a vectorized function that repeats the operations of the previous case_when():
vectorised_function <- function(a, b, c, ...){
ifelse(a == 1 , "a equals 1",
ifelse(c == 25 , "c equals 25",
"other case"
))
}
vectorised_function(a = 1, c = 25, b = "R")
## [1] "a equals 1"
vectorised_function(a = c(1, 1, 3), c = 27:25, b = "R")
## [1] "a equals 1" "a equals 1" "c equals 25"
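As a variation, the same vectorized logic can be written with case_when() inside the function body instead of nested ifelse(). This is only a sketch: the name vectorised_case_when is hypothetical and does not appear in the original post.

```r
library(dplyr)

# Hypothetical helper (not from the original post): same rules as
# vectorised_function(), expressed with case_when() instead of nested ifelse().
vectorised_case_when <- function(a, b, c, ...) {
  case_when(
    a == 1  ~ "a equals 1",
    c == 25 ~ "c equals 25",
    TRUE    ~ "other case"
  )
}

# Works on whole vectors at once, one result per element
res_case_when <- vectorised_case_when(a = c(1, 1, 3), b = "R", c = 27:25)
res_case_when
## [1] "a equals 1"  "a equals 1"  "c equals 25"
```

Because case_when() evaluates its conditions on whole vectors, this version can be used directly inside mutate(), just like vectorised_function().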
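Here is the “same” function, but not vectorized (its fallback value is "autre", French for "other", which makes its output easy to tell apart from the vectorized version):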
non_vectorised_function <- function(a, b, c, ...){
if ( a == 1 ) { return("a equals 1") }
if ( c == 25 ) { return("c equals 25") }
return("autre")
}
non_vectorised_function(a = 1, c = 25, b = "R")
## [1] "a equals 1"
non_vectorised_function(a = c(1, 1, 3), c = 27:25, b = "R") # does not work
## Warning in if (a == 1) {: the condition has length > 1 and only the
## first element will be used
## [1] "a equals 1"
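To see why the non-vectorized version breaks, it helps to look at what if() actually receives. The sketch below uses only base R; the variable names cond and outcome are illustrative. It captures whatever condition R signals, since older R versions emit a warning here while R 4.2.0 and later raise an error.

```r
# The comparison itself is vectorized: one TRUE/FALSE per element
cond <- c(1, 1, 3) == 1
cond
length(cond)

# if() expects a single TRUE/FALSE, so a length-3 condition is a problem.
# tryCatch() records which kind of condition this R version signals.
outcome <- tryCatch(
  {
    if (cond) "ran"
    "no condition signalled"
  },
  warning = function(w) "warning",
  error   = function(e) "error"
)
outcome
```

Either way, if() cannot make one decision per row, which is why the function needs to be called once per row by something else.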
With a vectorized function
This is the simplest case, and the fastest too.
You can use it as is in a mutate():
base %>%
mutate(
new = vectorised_function(a = a, b = b, c = c)
)
## # A tibble: 6 × 4
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 other case
## 3 3 3 23 other case
## 4 4 4 24 other case
## 5 5 5 25 c equals 25
## 6 6 6 26 other case
With a NON vectorized function
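The result returned by mutate() is not correct: the function only looks at the first row, and that single value is repeated for every row. Note that since R 4.2.0, a condition of length greater than one in if() is an error rather than a warning, so on recent versions this call fails outright.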
base %>%
mutate(
new = non_vectorised_function(a = a, b = b, c = c)
)
## Warning in if (a == 1) {: the condition has length > 1 and only the
## first element will be used
## # A tibble: 6 × 4
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 a equals 1
## 3 3 3 23 a equals 1
## 4 4 4 24 a equals 1
## 5 5 5 25 a equals 1
## 6 6 6 26 a equals 1
So let’s change our strategy.
With rowwise()
rowwise() is back in the {dplyr} world and is specifically designed for this case:
base %>%
rowwise() %>%
mutate(
new = non_vectorised_function(a = a, b = b, c = c)
)
## # A tibble: 6 × 4
## # Rowwise:
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 autre
## 3 3 3 23 autre
## 4 4 4 24 autre
## 5 5 5 25 c equals 25
## 6 6 6 26 autre
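Note the "# Rowwise:" line in the output above: the result is still a row-wise tibble, so later mutate() or summarise() calls would also run row by row. A minimal sketch of dropping that state with ungroup() once the row-wise step is done (the object name res_rowwise is illustrative):

```r
library(dplyr)

base <- tibble(a = 1:6, b = 1:6, c = 21:26)

non_vectorised_function <- function(a, b, c, ...) {
  if (a == 1) return("a equals 1")
  if (c == 25) return("c equals 25")
  "autre"
}

res_rowwise <- base %>%
  rowwise() %>%
  mutate(new = non_vectorised_function(a = a, b = b, c = c)) %>%
  ungroup()  # back to a regular tibble for the following steps

inherits(res_rowwise, "rowwise_df")
## [1] FALSE
```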
With pmap()
base %>%
mutate(
new = pmap_chr(list(a = a, b = b, c = c), non_vectorised_function)
)
## # A tibble: 6 × 4
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 autre
## 3 3 3 23 autre
## 4 4 4 24 autre
## 5 5 5 25 c equals 25
## 6 6 6 26 autre
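Since a tibble is a list of columns, it can also be passed to pmap_chr() directly: each row is then sent to the function as named arguments. This is where the ... in the function signature pays off, because it absorbs any column the function does not use. A sketch, assuming an extra column d added only for illustration:

```r
library(dplyr)
library(purrr)

non_vectorised_function <- function(a, b, c, ...) {
  if (a == 1) return("a equals 1")
  if (c == 25) return("c equals 25")
  "autre"
}

# d is a hypothetical extra column; it is swallowed by `...`
base_extra <- tibble(a = 1:6, b = 1:6, c = 21:26, d = letters[1:6])

# Each row becomes one call: non_vectorised_function(a = , b = , c = , d = )
res_pmap <- pmap_chr(base_extra, non_vectorised_function)
res_pmap
```

Without ... in the signature, the same call would fail with an "unused argument" error for d.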
Bonus with Vectorize()
The Vectorize() function lets you turn a non-vectorized function into a vectorized one.
It’s a bit of a cheat, but it can help.
base %>%
mutate(
new = Vectorize(non_vectorised_function)(a = a, b = b, c = c)
)
## # A tibble: 6 × 4
## a b c new
## <int> <int> <int> <chr>
## 1 1 1 21 a equals 1
## 2 2 2 22 autre
## 3 3 3 23 autre
## 4 4 4 24 autre
## 5 5 5 25 c equals 25
## 6 6 6 26 autre
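Vectorize() is a wrapper around base R's mapply(), so the function is still called once per row behind the scenes, and no speed-up should be expected compared with rowwise() or pmap(). A base R sketch comparing the two (the vector names a_vals, b_vals and c_vals are illustrative):

```r
non_vectorised_function <- function(a, b, c, ...) {
  if (a == 1) return("a equals 1")
  if (c == 25) return("c equals 25")
  "autre"
}

a_vals <- 1:6
b_vals <- 1:6
c_vals <- 21:26

# Explicit element-by-element calls with mapply()
res_mapply <- mapply(non_vectorised_function, a = a_vals, b = b_vals, c = c_vals)

# The same thing through the Vectorize() wrapper
res_vectorize <- Vectorize(non_vectorised_function)(a = a_vals, b = b_vals, c = c_vals)

all(res_mapply == res_vectorize)
## [1] TRUE
```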
Row-wise operations are yours!
Experiment and tell us what your practices are!
To go further: https://dplyr.tidyverse.org/articles/rowwise.html
