We're proud to announce the latest update to the enhanced, commercial-grade distribution of R, Revolution R Enterprise 5.0. With each new release, Revolution R Enterprise adds more capabilities to open-source R, to make R users more productive, to improve performance of R programs, to support Big Data analytics, and to provide servers and APIs for enterprise deployment.
- Distributed/Parallel Computing: Automatically distribute statistical analyses from a desktop across nodes of a cluster through Windows HPC Server, and distribute R function calls across nodes.
- Scalable Data Management: Increase flexibility in data analysis with new data import and cleaning/manipulation tools.
- Integration with Hadoop: Support MapReduce programming in R and integration with HDFS and HBase with Cloudera Certified Technology.
- Expanded Scalable Analytics Functionality: Apply new big data statistics algorithms including principal components analysis, factor analysis, contingency table analysis and more.
- Enhanced R Productivity Environment: Create and build R packages with expanded support features.
- Enhanced RevoDeployR server: Add multiple compute nodes to support more users, batch execution of large analysis jobs, and LDAP enterprise security support.
- Upgraded Open Source R: Revolution R 5.0 includes the fully-patched R 2.13.2, which features a new byte-compiler to improve performance of user-written functions and packages.
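As a small illustration of the byte-compiler mentioned in the last item, the sketch below compiles a user-written function with `cmpfun` from the `compiler` package that ships with R 2.13 and later. The function `sum_squares` and its input are made up for this example. Actual speed-ups depend on the code, and recent versions of R compile functions just-in-time by default, so the difference may be small there.

```r
library(compiler)  # part of the standard R distribution since R 2.13.0

# Hypothetical user-written function with an explicit loop.
sum_squares <- function(n) {
  total <- 0
  for (i in seq_len(n)) {
    total <- total + i^2
  }
  total
}

# Byte-compile the function; the compiled version returns the same values.
sum_squares_compiled <- cmpfun(sum_squares)

result_plain    <- sum_squares(1000)
result_compiled <- sum_squares_compiled(1000)

# Compare timings on your own machine; no particular speed-up is guaranteed.
print(system.time(for (k in 1:200) sum_squares(1000)))
print(system.time(for (k in 1:200) sum_squares_compiled(1000)))
```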
We're particularly excited about the new capabilities for parallel programming and statistical analysis on an HPC Server cluster. Here's a quick overview and an example of using a 5-node cluster to do a billion-row regression in less than a minute:
The detailed list of new features is below (after the jump), and you can find more about Revolution R Enterprise 5.0 at the link below. Existing subscribers will be notified with download instructions for the update in the next couple of days, and Revolution R Enterprise 5.0 is (as always) available free of charge to academic users.
Revolution Analytics: Revolution R Enterprise 5.0 overview
What’s New in Revolution R Enterprise 5.0
Distributed and parallel computing
- Automatically distribute statistical analyses from your desktop across nodes of a cluster [Currently supported for Windows HPC Server]. Analyses include summary statistics, crosstabs, linear regression, logistic regression, covariance matrix computations for factor analysis and principal components, and k-means clustering. Binning computations for histograms are also distributed.
- Distribute R function calls, including data manipulation functions, across nodes. Easily distribute “embarrassingly parallel” computations across nodes or cores of a [Microsoft HPC] cluster, or the cores of your desktop or laptop, using the new rxExec function.
- Compute in parallel with foreach on top of RevoScaleR via the new doRSR backend.
- New RevoScaleR Distributed Computing Guide (choose Help/R Manuals (PDF) in the RPE).
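To give a feel for the kind of "embarrassingly parallel" work described above, here is a minimal sketch of repeating an independent simulation several times. The `simulate_mean` function and the number of runs are hypothetical. The `rxExec` call assumes that RevoScaleR is installed and that `timesToRun` behaves as described in the RevoScaleR documentation; without RevoScaleR, the sketch falls back to a sequential `lapply` so it still runs in open-source R.

```r
# Hypothetical task: estimate the mean of 10,000 random normal draws.
simulate_mean <- function() {
  mean(rnorm(10000))
}

n_runs <- 4

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  # Assumes Revolution R Enterprise is installed; rxExec is expected to run
  # the function n_runs times in the current compute context and return a list.
  results <- RevoScaleR::rxExec(simulate_mean, timesToRun = n_runs)
} else {
  # Sequential stand-in for plain open-source R, so the sketch still runs.
  results <- lapply(seq_len(n_runs), function(i) simulate_mean())
}

estimates <- unlist(results)
print(estimates)
```

Because each run is independent of the others, the same function can be handed to a cluster or to local cores without changing its body.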
Scalable data management: Data Import
- New versatile rxImport function for using external data with R (delimited and fixed-format text, SAS, SPSS, or ODBC). Bring smaller data sets directly into an R data frame; store larger data sets in the native .xdf file format, which is very efficient for storing and accessing large data sets. The rxImport function returns a data frame or an RxXdfData object representing the created .xdf file. Either can be used in subsequent data analysis functions.
- Two alternative modes of delimited text import, and two alternative modes of ODBC import, one of which is supported on Linux.
- Ability to keep or drop variables on import
- Ability to specify start row and number of rows of data to import
Scalable data management: Data Cleaning and Manipulation
- New versatile rxDataStep function allows you to perform data transformations on big data using the power and flexibility of the R language. Experiment with a small data frame, then apply the same code to a huge data set.
- Returns data frame or RxXdfData object representing an .xdf file that can be used in subsequent scalable analyses.
- Works with data frames or .xdf files (as input data or output), making it easy to convert from one type to another.
- Ability to “re-block” .xdf files with a user-specified number of rows.
- Improved evaluation environments for user-defined transforms and transform functions, and a new internal variable, .rxNumRows (containing the number of rows in the current block), for use within transformations.
- Big data merge with the new rxMerge function. Merge two large data files, or merge a smaller in-memory data set into a large data file.
- Improved performance for big data sort. New general rxSort function to work on data frames or .xdf files.
- Ability to create and recode factors in .xdf files and data frames using new rxFactors function
- Split an .xdf file into multiple files by number of rows, blocks, or levels of a factor variable using new rxSplitXdf function.
- Support for additional data types in .xdf files: ordered factors and POSIXct, and improved support for Date data type.
- New functions rxGetVarInfo, rxGetInfo, and rxSetVarInfo work for both data frames and .xdf files.
- New examples in the RevoScaleR User’s Guide for big data import and data step operations.
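The "experiment with a small data frame first" workflow for the data step could look roughly like the sketch below. It uses the built-in `cars` data set; the new variable name `logDist` is made up for illustration. The `rxDataStep` call assumes RevoScaleR is installed and that the `inData` and `transforms` arguments work as described in its documentation; otherwise the sketch falls back to base R's `transform` so it still runs.

```r
# Small in-memory data set for experimenting (built-in 'cars' data).
small <- head(cars, 10)

if (requireNamespace("RevoScaleR", quietly = TRUE)) {
  # Assumes RevoScaleR is available. With no outFile, rxDataStep is expected
  # to return a data frame; the same transforms could target an .xdf file.
  result <- RevoScaleR::rxDataStep(
    inData = small,
    transforms = list(logDist = log(dist))
  )
} else {
  # Base R stand-in so the sketch runs without RevoScaleR.
  result <- transform(small, logDist = log(dist))
}

print(head(result, 3))
```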
Expanded scalable statistical functionality
- New functions utilizing output from rxCrossTabs objects:
  - rxChiSquaredTest: Chi-squared Test
  - rxFisherTest: Fisher's Exact Test
  - rxKendallCor: Kendall's Tau Rank Correlation Coefficient
  - rxPairwiseCrossTab: Apply a function 'FUN' to all pairwise combinations of the rows and columns of an xtabs object, stratifying by higher dimensions
  - rxRiskRatio: Calculate the relative risk (risk ratio) on a two-by-two table
  - rxOddsRatio: Calculate the odds ratio on a two-by-two table
  - rxMultiTest: Collects a list of tests for variable independence into a table.
- Also a new rxResultsDF method for rxCrossTabs, rxSummary, and rxLinMod for extracting a data frame from results objects
- Improved performance for scalable analysis functions operating on data frames.
- Option in rxPredict and rxKmeans to write out model variables in addition to predictions/cluster number.
- Option in rxSummary to remove missing values by term.
- Option in rxLinMod and rxLogit to drop first or last factor levels, and ability to set starting parameter values in rxLogit.
- rxHistogram now supports logical data and frequency weights with continuous data, and has transforms and related arguments.
- New examples in the RevoScaleR User’s Guide for factor analysis and principal components analysis.
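For readers unfamiliar with the two-by-two table statistics listed above, the following base R sketch works through the relative risk, the odds ratio, and a chi-squared test by hand. It does not use the RevoScaleR functions themselves and makes no claim about how they are implemented; the counts and group labels are invented for illustration.

```r
# Made-up counts: rows are exposure groups, columns are outcomes.
tab <- matrix(c(20, 80,
                10, 90),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("exposed", "unexposed"),
                              outcome = c("event", "no_event")))

# Risk of the event in each group.
risk_exposed   <- tab["exposed", "event"] / sum(tab["exposed", ])
risk_unexposed <- tab["unexposed", "event"] / sum(tab["unexposed", ])

# Relative risk: ratio of the two risks.
risk_ratio <- risk_exposed / risk_unexposed

# Odds ratio: cross-product ratio of the table cells.
odds_ratio <- (tab["exposed", "event"] * tab["unexposed", "no_event"]) /
  (tab["exposed", "no_event"] * tab["unexposed", "event"])

# Chi-squared test of independence from base R's stats package.
chisq <- chisq.test(tab, correct = FALSE)

print(c(risk_ratio = risk_ratio, odds_ratio = odds_ratio))
print(chisq$p.value)
```

With these counts the risks are 0.2 and 0.1, so the relative risk is 2, while the odds ratio is (20 × 90) / (80 × 10) = 2.25.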
Enhanced R Productivity Environment
- Support for creating and building R packages:
  - R Package Project type in RPE Solution Explorer to create the directory structure for a new R package.
  - Create an Rd Help file template for a user-created function from the Solution Explorer by adding a new item and specifying the function name.
  - Build an R package from the Solution Explorer.
- Support for Windows HPC Server:
  - Access the HPC job scheduler directly from the Windows R Productivity Environment (RPE).
  - View the status of pending jobs in the RPE Object Browser.
  - Code snippets for distributed computing with HPC Server
- Option to load last-loaded solution on startup
- New projects now start by default in release mode instead of debug mode