Win-Vector LLC announces new “big data in R” tools

November 29, 2017

(This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers)

Win-Vector LLC is proud to introduce two important new tool families (with documentation) in version 0.5.0 of seplyr (now also available on CRAN):

  • partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems (such as PostgreSQL or Apache Spark) through R, these planners can make your code faster and can sequence steps to avoid critical issues: the complementary problems of overly long in-mutate dependence chains and of too many mutate steps, plus incidental bugs (all explained in the linked tutorials).
  • if_else_device(): provides a dplyr::mutate()-based simulation of per-row conditional blocks (including conditional assignment). This allows powerful imperative code (such as is often seen when porting from SAS) to be directly and legibly translated into performant dplyr::mutate() data-flow code that works on Spark (via sparklyr) and on databases.
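
To make the planning idea in the first bullet concrete, here is a minimal base-R sketch. It is not seplyr's implementation or API: partition_assignments() and apply_plan() are hypothetical helpers written only for this illustration, and the greedy strategy is an assumption that is simpler than a real planner needs to be. The sketch splits a sequence of named assignments into blocks so that no expression reads or re-assigns a name created earlier in the same block, which is the property that lets each block run as a single mutate-style step.

```r
# Hypothetical helper (not part of seplyr): greedily split named assignments
# into blocks so that no expression in a block depends on, or overwrites,
# a name assigned earlier in the same block.
partition_assignments <- function(exprs) {
  blocks <- list()
  current <- list()
  for (i in seq_along(exprs)) {
    nm <- names(exprs)[[i]]
    e <- exprs[[i]]
    assigned <- names(current)
    # Conflict: the expression uses, or re-assigns, a name from this block.
    conflict <- (nm %in% assigned) || any(all.vars(e) %in% assigned)
    if (conflict) {
      blocks[[length(blocks) + 1]] <- current
      current <- list()
    }
    current[[nm]] <- e
  }
  if (length(current) > 0) {
    blocks[[length(blocks) + 1]] <- current
  }
  blocks
}

# Hypothetical helper: apply each block in order. Within a block every
# expression is evaluated against the data as it stood before the block.
apply_plan <- function(d, plan) {
  for (block in plan) {
    vals <- lapply(block, function(e) eval(e, d))
    d[names(vals)] <- vals
  }
  d
}

exprs <- list(
  a1 = quote(x + 1),
  a2 = quote(x * 2),
  b1 = quote(a1 + a2),
  c1 = quote(b1 / 2)
)

plan <- partition_assignments(exprs)
# Three blocks: {a1, a2}, then {b1}, then {c1}.
print(lapply(plan, names))

d <- data.frame(x = c(1, 2, 3))
res <- apply_plan(d, plan)
print(res)
```

Here a1 and a2 share a block because neither depends on the other, while b1 and c1 each need a fresh block. The greedy pass preserves the original order of assignments; it makes no attempt to reorder independent steps to reduce the number of blocks, which a production planner may well do.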

[Image: blacksmith working. Credit: Jeff Kubina, Columbia, Maryland, CC BY-SA 2.0.]

For “big data in R” users these two function families (plus the included support functions and examples) are simple yet game-changing. These tools were developed by Win-Vector LLC to fill gaps identified by Win-Vector and our partners when standing up production-scale R plus Apache Spark projects.

We are happy to share these tools as open source, and we are very interested in consulting with your teams on developing R/Spark solutions (including porting existing SAS code). For more information please reach out to Win-Vector.

To help teams get started we are supplying the following initial documentation, discussion, and examples:
