Filtering with string statements in dplyr

[This article was first published on R on Alan Yeung, and kindly contributed to R-bloggers.]

A question came up recently at work about how to use a filter condition stored as a single string inside dplyr’s filter() function, in the spirit of dplyr::filter(my_data, "var1 == 'a'"). There does not seem to be much written about this and I was not sure how to do it either, but luckily jakeybob had a neat solution that seems to work well.

some_data %>% 
    filter(eval(rlang::parse_expr(selection_statement)))
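To see what the two pieces of that snippet do on their own, here is a minimal sketch in base R plus rlang. The data frame my_data and its columns are made up for illustration and are not from the original question.

```r
# Toy data, assumed purely for illustration.
my_data <- data.frame(var1 = c("a", "b", "a"), var2 = 1:3)

selection_statement <- "var1 == 'a'"

# parse_expr() turns the string into an unevaluated R expression (a call).
parsed <- rlang::parse_expr(selection_statement)
class(parsed)
# [1] "call"

# eval() then evaluates that call, here using the data frame's columns,
# which gives the logical vector that a filter needs.
keep <- eval(parsed, my_data)
keep
# [1]  TRUE FALSE  TRUE

my_data[keep, ]
```

Inside filter() the same evaluation happens against the columns of the data being filtered, which is why the string ends up behaving like a condition typed in directly.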

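Let’s see it in action using the iris flowers dataset. (The examples below use rlang::eval_tidy() rather than base eval(); either should work here, since both evaluate the parsed expression inside filter().) First note how many records there are for each species (n = 50 each) so that we can check later that the filtering has worked.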

library(tidyverse)

iris2 <- as_tibble(iris)
count(iris2, Species)
# # A tibble: 3 x 2
#   Species        n
#   <fct>      <int>
# 1 setosa        50
# 2 versicolor    50
# 3 virginica     50

Now filter to keep only the setosa records. The result has 50 rows, matching the count above, so that’s worked.

selection_statement <- "Species == 'setosa'"

iris2 %>% 
    filter(rlang::eval_tidy(rlang::parse_expr(selection_statement)))
# # A tibble: 50 x 5
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#  1          5.1         3.5          1.4         0.2 setosa 
#  2          4.9         3            1.4         0.2 setosa 
#  3          4.7         3.2          1.3         0.2 setosa 
#  4          4.6         3.1          1.5         0.2 setosa 
#  5          5           3.6          1.4         0.2 setosa
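If the string filter is needed in more than one place, it could be wrapped in a small function. The sketch below is one possible way to do that, not something from jakeybob's solution: filter_by_strings() is a hypothetical helper name, and it uses rlang's `!!!` operator to splice the parsed expressions into filter() instead of calling eval_tidy(). It loads dplyr only, as the rest of the tidyverse is not needed here.

```r
library(dplyr)

# Hypothetical helper (not part of the original post): filter `data` by one
# or more conditions, each supplied as a string.
filter_by_strings <- function(data, statements) {
    # Parse each string into an unevaluated expression.
    conditions <- lapply(statements, rlang::parse_expr)
    # `!!!` splices the expressions into filter() as separate arguments,
    # and filter() keeps only rows where all of them are TRUE.
    filter(data, !!!conditions)
}

iris2 <- as_tibble(iris)

# One condition, same result as the example above (50 rows).
setosa <- filter_by_strings(iris2, "Species == 'setosa'")
nrow(setosa)

# Two conditions held as separate strings.
long_setosa <- filter_by_strings(
    iris2,
    c("Species == 'setosa'", "Sepal.Length > 5")
)
```

Keeping each condition as its own string means they can be built up or dropped independently before filtering.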

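I thought this method might fail if we create a variable called Species in the global environment, but it still works completely fine, which is great! This is presumably because filter() looks for names among the columns of the data before it looks in the surrounding environment.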

Species <- "abc"

iris2 %>% 
    filter(rlang::eval_tidy(rlang::parse_expr(selection_statement)))
# # A tibble: 50 x 5
#    Sepal.Length Sepal.Width Petal.Length Petal.Width Species
#           <dbl>       <dbl>        <dbl>       <dbl> <fct>  
#  1          5.1         3.5          1.4         0.2 setosa 
#  2          4.9         3            1.4         0.2 setosa 
#  3          4.7         3.2          1.3         0.2 setosa 
#  4          4.6         3.1          1.5         0.2 setosa 
#  5          5           3.6          1.4         0.2 setosa
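The lookup order can be checked from both directions. In the sketch below, cutoff is a made-up variable that is deliberately not a column of iris, and the last example uses the .data pronoun to state explicitly that a name should come from the data. This is my reading of how data masking applies here, so treat it as an illustration, not a guarantee.

```r
library(dplyr)

iris2 <- as_tibble(iris)

Species <- "abc"   # same name as a column, as in the example above
cutoff <- 5        # hypothetical variable; iris has no column called `cutoff`

# `Species` is a column, so the column is used and the global is ignored.
by_column <- iris2 %>%
    filter(rlang::eval_tidy(rlang::parse_expr("Species == 'setosa'")))

# `cutoff` is not a column, so it is found in the calling environment instead.
by_cutoff <- iris2 %>%
    filter(rlang::eval_tidy(rlang::parse_expr("Sepal.Length > cutoff")))

# The .data pronoun makes the column lookup explicit inside the string.
explicit <- iris2 %>%
    filter(rlang::eval_tidy(rlang::parse_expr(".data$Species == 'setosa'")))

nrow(by_column)  # 50, the global `Species` did not interfere
```

The second case is worth keeping in mind: a name in the string that is not a column, whether on purpose or through a typo, can silently pick up a variable of the same name from the environment.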

So it makes me wonder: why is there not much out there on this? My feeling is that something will make this method fail, but what is it, and where does it fail? Let me know in the comments if you know, thanks!
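One candidate, offered as a sketch and not a full answer: the string has to parse as exactly one R expression, so anything else is rejected by rlang::parse_expr() before filter() ever runs. The strings below are made-up examples. Separately, because the string is evaluated as R code, this approach is probably best kept away from strings supplied by untrusted users.

```r
# A string holding two statements is not a single expression.
two_statements <- "Species == 'setosa'; Sepal.Length > 5"

parsed <- tryCatch(
    rlang::parse_expr(two_statements),
    error = function(e) NULL   # parse_expr() signals an error for this input
)
is.null(parsed)
# [1] TRUE

# Incomplete code is rejected at the parsing step too.
incomplete <- tryCatch(
    rlang::parse_expr("Species == "),
    error = function(e) NULL
)
is.null(incomplete)
# [1] TRUE
```

Wrapping the parsing step in tryCatch() like this is one way to turn a malformed string into a handled condition instead of an abrupt stop in the middle of a pipeline.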

To leave a comment for the author, please follow the link and comment on their blog: R on Alan Yeung.
