Win-Vector LLC announces new “big data in R” tools
Win-Vector LLC is proud to introduce two important new tool families (with documentation) in the 0.5.0 version of seplyr (also now available on CRAN):
- partition_mutate_se() / partition_mutate_qt(): these are query planners/optimizers that work over dplyr::mutate() assignments. When using big-data systems through R (such as PostgreSQL or Apache Spark) these planners can make your code faster and sequence steps to avoid critical issues (the complementary problems of too long in-mutate dependence chains, of too many mutate steps, and incidental bugs; all explained in the linked tutorials).
- if_else_device(): provides a dplyr::mutate() based simulation of per-row conditional blocks (including conditional assignment). This allows powerful imperative code (such as often seen in porting from SAS) to be directly and legibly translated into performant dplyr::mutate() data flow code that works on Spark (via Sparklyr) and databases.
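To make the idea concrete, here is a minimal sketch in plain dplyr (it does not call seplyr, and it is not necessarily the code seplyr generates). It hand-translates a per-row conditional block into mutate() steps, staged so that no expression reads a value created in the same mutate() call. The data frame and the column names id, x, cond, a, and b are hypothetical, chosen only for illustration.

```r
library(dplyr)

# Hypothetical example data.
d <- data.frame(id = 1:4, x = c(0.2, 0.7, 0.5, 0.9))

# Imperative original (per row):
#   if (x >= 0.5) { a <- "treatment"; b <- "control" }
#   else          { a <- "control";   b <- "treatment" }

res <- d %>%
  # Step 1: materialize the condition as its own column.
  mutate(cond = x >= 0.5) %>%
  # Step 2: each assignment reads only columns created in an earlier step.
  mutate(
    a = if_else(cond, "treatment", "control"),
    b = if_else(cond, "control", "treatment")
  ) %>%
  # Drop the temporary condition column.
  select(-cond)

print(res)
```

Writing this staging by hand becomes tedious and error-prone as the number of assignments and conditions grows, which is the kind of bookkeeping the announced planners are described as taking over.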
Image by Jeff Kubina from Columbia, Maryland, CC BY-SA 2.0.
For “big data in R” users these two function families (plus the included support functions and examples) are simple, yet game-changing. These tools were developed by Win-Vector LLC to fill gaps identified by Win-Vector and our partners when standing up production-scale R plus Apache Spark projects.
We are happy to share these tools as open source, and very interested in consulting with your teams on developing R/Spark solutions (including porting existing SAS code). For more information please reach out to Win-Vector.
To help teams get started we are supplying the following initial documentation, discussion, and examples:
- Mutate Partitioner package vignette
- if_else_device reference
- “Partition Mutate” article
- “Partitioning Mutate, Example 2” (includes if_else_device) article.