PivotalR Improves the Scalability and Performance of In-Database Analytics

June 18, 2013

(This article was first published on Pivotal P.O.V. » R, and kindly contributed to R-bloggers)

image by George Montana Harkin, Noun Project

One of the greatest challenges in working with big datasets is the need to move information out of storage for analysis. Moving the data increases the chance of error and often forces practitioners to work with partial or incomplete samples. A key feature of the Pivotal HD platform and HAWQ is the ability to work directly with data inside a Hadoop cluster, with no movement necessary. To this end, the recently announced PivotalR 0.1 extends the platform’s capabilities, allowing users of the statistical programming language R to perform in-database analytics without leaving the command line.

PivotalR improves the scalability and performance of in-database analytics by letting users explore and manipulate information in the database using the R interface. PivotalR handles the necessary SQL translation, and computation is done within the database. The result is faster queries and modeling, without requiring the user to move data or work with only a portion of all the available information.
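The idea of translating a client-side request into SQL and letting the database do the computation can be illustrated with a small sketch. This is not PivotalR code and does not use its API: it is a generic Python example that assumes an in-memory SQLite database, and the table name `measurements` and the helper functions are hypothetical names chosen for illustration.

```python
import sqlite3


def make_demo_connection():
    """Create an in-memory SQLite database with a small demo table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE measurements (id INTEGER PRIMARY KEY, value REAL)")
    conn.executemany(
        "INSERT INTO measurements (value) VALUES (?)",
        [(float(v),) for v in range(1, 101)],
    )
    return conn


def column_mean_in_db(conn, table, column):
    """Translate 'mean of a column' into SQL and let the database compute it.

    Only a single number crosses the connection, not the rows themselves.
    """
    # Identifiers cannot be bound as query parameters, so check them
    # against the schema before building the SQL string.
    tables = {
        row[0]
        for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    }
    if table not in tables:
        raise ValueError(f"unknown table: {table}")
    columns = {row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')}
    if column not in columns:
        raise ValueError(f"unknown column: {column}")

    sql = f'SELECT AVG("{column}") FROM "{table}"'
    (mean,) = conn.execute(sql).fetchone()
    return sql, mean


def column_mean_client_side(conn, table, column):
    """Contrast case: pull every row to the client, then compute locally."""
    rows = conn.execute(f'SELECT "{column}" FROM "{table}"').fetchall()
    values = [row[0] for row in rows]
    return sum(values) / len(values)


if __name__ == "__main__":
    conn = make_demo_connection()
    sql, mean = column_mean_in_db(conn, "measurements", "value")
    print(sql)
    print(mean)
    print(column_mean_client_side(conn, "measurements", "value"))
```

Both functions return the same value, but the first sends one aggregate query and receives one row, while the second transfers the whole column to the client. On a table too large to move, only the first approach remains practical, which is the pattern the paragraph above describes.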

Practitioners familiar with R syntax will be able to perform predictive analytics and call MADlib analytics functions in the language they already know. The roadmap for PivotalR includes support for R visualizations using intelligent sampling, Chorus integration, and support for all existing MADlib algorithms.

The PivotalR 0.1 package is available for download on GitHub, along with documentation, example code, and a quick start guide. You can also learn more about PivotalR from this video walkthrough by Hai Qian of Pivotal’s Predictive Analytics Team and Woo Jae Jung of the Data Science Team.
