Vectorized Block ifelse in R

[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Win-Vector LLC has been working on porting some significant large scale production systems from SAS to R.

From this experience we want to share how to simulate, in R with Apache Spark (via Sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure.

When porting code from one language to another you hope the expressive power and style of the languages are similar.

  • If the source language is too weak then the original code will be very long (and essentially over specified), meaning a direct transliteration will be unlikely to be efficient, as you are not using the higher order operators of the target language.
  • If the source language is too strong you will have operators that don’t have direct analogues in the target language.

SAS has some strong and powerful operators. One such is what I am calling “the vectorized block if(){}else{}“. From SAS documentation:

The subsetting IF statement causes the DATA step to continue processing only those raw data records or those observations from a SAS data set that meet the condition of the expression that is specified in the IF statement.

That is a really wonderful operator!

R has some available related operators: base::ifelse(), dplyr::if_else(), and dplyr::mutate_if(). However, none of these has the full expressive power of the SAS operator, which can per data row:

  • Conditionally choose where different assignments are made to (not just choose conditionally which values are taken).
  • Conditionally specify blocks of assignments that happen together.
  • Be efficiently nested and chained with other IF statements.

To help achieve such expressive power in R Win-Vector is introducing seplyr::if_else_device(). When combined with seplyr::partition_mutate_se() you get a good high performance simulation of the SAS power in R. These are now available in the open source R package seplyr.

For more information please reach out to us here at Win-Vector or try help(if_else_device).

Also, we will publicize more documentation and examples shortly (especially showing big data scale use with Apache Spark via Sparklyr).

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)