Vectorized Block ifelse in R

November 27, 2017
By

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Win-Vector LLC has been working on porting some significant large scale production systems from SAS to R.

From this experience we want to share how to simulate, in R with Apache Spark (via Sparklyr), a nifty SAS feature: the vectorized “block if(){}else{}” structure.

When porting code from one language to another you hope the expressive power and style of the languages are similar.

  • If the source language is too weak then the original code will be very long (and essentially over specified), meaning a direct transliteration will be unlikely to be efficient, as you are not using the higher order operators of the target language.
  • If the source language is too strong you will have operators that don’t have direct analogues in the target language.

SAS has some strong and powerful operators. One such is what I am calling “the vectorized block if(){}else{}“. From SAS documentation:

The subsetting IF statement causes the DATA step to continue processing only those raw data records or those observations from a SAS data set that meet the condition of the expression that is specified in the IF statement.

That is a really wonderful operator!

R has some available related operators: base::ifelse(), dplyr::if_else(), and dplyr::mutate_if(). However, none of these has the full expressive power of the SAS operator, which can per data row:

  • Conditionally choose where different assignments are made to (not just choose conditionally which values are taken).
  • Conditionally specify blocks of assignments that happen together.
  • Be efficiently nested and chained with other IF statements.

To help achieve such expressive power in R Win-Vector is introducing seplyr::if_else_device(). When combined with seplyr::partition_mutate_se() you get a good high performance simulation of the SAS power in R. These are now available in the open source R package seplyr.

For more information please reach out to us here at Win-Vector or try help(if_else_device).

Also, we will publicize more documentation and examples shortly (especially showing big data scale use with Apache Spark via Sparklyr).

To leave a comment for the author, please follow the link and comment on their blog: R – Win-Vector Blog.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)