328 search results for "hadoop"

Resampling data in Hadoop with RHadoop

February 27, 2013

On Revolution Analytics partner Cloudera's blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He provides the complete map-reduce code in the R...

Read more »
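One common way to express bootstrap resampling as a map-reduce job is the Poisson approximation. For each record and each replicate, the mapper independently draws how many times that record appears from a Poisson(1) distribution, so no node needs to know the total row count. The excerpt does not say whether Uri's guide uses this exact scheme. The sketch below is a plain-Python simulation of the map, shuffle and reduce phases; the function names and the choice of the mean as the statistic are hypothetical.

```python
import math
import random
from collections import defaultdict


def draw_poisson(rng, lam=1.0):
    """Draw a Poisson(lam) variate with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def bootstrap_mapper(record, n_replicates, rng):
    """Map step: emit (replicate_id, record) once per Poisson(1) draw.

    Drawing each record's multiplicity independently approximates sampling
    n records with replacement, so mappers need no global row count.
    """
    for rep in range(n_replicates):
        for _ in range(draw_poisson(rng)):
            yield rep, record


def mean_reducer(rep, values):
    """Reduce step: compute the statistic of interest (here, the mean)."""
    return rep, sum(values) / len(values)


def run_bootstrap(data, n_replicates=20, seed=0):
    """Local stand-in for a map-reduce job: map, shuffle by key, reduce."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for record in data:  # each record is handled independently
        for key, value in bootstrap_mapper(record, n_replicates, rng):
            groups[key].append(value)  # shuffle: collect values per key
    return dict(mean_reducer(k, v) for k, v in sorted(groups.items()))
```

For example, `run_bootstrap([float(x) for x in range(1000)], n_replicates=20, seed=42)` returns 20 replicate means scattered around 499.5. For bagging, each replicate's records would instead go to a reducer that fits one ensemble member.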

New ways to Hadoop with R

February 26, 2013

Today, there are two main ways to use Hadoop with R and big data:
1. Use the open-source rmr package to write map-reduce tasks in R (running within the Hadoop cluster: great for data distillation!)
2. Import data from Hadoop to a server running Revolution R Enterprise, via HBase, ODBC (for high-performance Hadoop/SQL interfaces), or streaming data direct...

Read more »

Slides and replay of my “Using R with Hadoop” webinar now available #rstats #hadoop

January 25, 2013

I owe a big “thank you” to all of you who attended my webinar yesterday “Using R with Hadoop”. Revolution Analytics partnered with us at Think Big Analytics to produce the webinar, and I owe them thanks as well. For those of you who missed it, the slides and replay are now available from Revolution

Read more »

Video: Using R with Hadoop

January 25, 2013

If you weren't one of the almost 2000 people who signed up for yesterday's webinar "Using R with Hadoop", the replay and slides are now available. During the webinar, Jeffrey Breen (Principal at Think Big Academy) talked about extracting analytics from data in Hadoop and covered: how to use R and Hadoop; Hadoop streaming; various R packages and RHadoop...

Read more »
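Hadoop streaming, one of the topics listed, lets any executable act as mapper or reducer. The mapper reads raw lines on standard input and writes tab-separated key/value lines, Hadoop sorts them by key, and the reducer reads the sorted lines so that equal keys arrive together. In an R workflow those roles would be played by R scripts. The sketch below is a generic word count in Python, and `simulate_job` is a hypothetical local stand-in for Hadoop's sort-and-shuffle so it can run without a cluster.

```python
from itertools import groupby


def mapper(lines):
    """Streaming mapper: emit one 'word<TAB>1' line per word of input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines):
    """Streaming reducer: input is sorted by key, so equal keys are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def simulate_job(lines):
    """Local stand-in for a streaming job: map, sort (the shuffle), reduce."""
    return list(reducer(sorted(mapper(lines))))
```

In a real job, each function would be wrapped in a small script that reads `sys.stdin` and prints its output, and the two scripts would be passed to the streaming jar through its `-mapper` and `-reducer` options.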

Webinar Jan 24: Using R with Hadoop

January 10, 2013

In two weeks (on January 24), Think Big Analytics' Jeffrey Breen will present a new webinar on using R with Hadoop. Here's the webinar description: R and Hadoop are changing the way organizations manage and utilize big data. Think Big Analytics and Revolution Analytics are helping clients plan, build, test and implement innovative solutions based on the two technologies...

Read more »

Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X

December 29, 2012

Motivation: I was inspired by Revolution's blog and a step-by-step tutorial from Jeffrey Breen on setting up a local virtual instance of Hadoop with R. However, that tutorial uses VMware's application, and one downside to VMware is that it's not free. I know most people, including me, like to hear the words open-source and free,...

Read more »

Big Data Trees with Hadoop HDFS

December 4, 2012

Last month's release of Revolution R Enterprise 6.1 added the capability to fit decision and regression trees on large data sets (using a new parallel external memory algorithm included in the RevoScaleR package). It also introduced the ability to apply this and the other big-data statistical methods of RevoScaleR to data files distributed in Hadoop's HDFS file system*,...

Read more »
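The excerpt does not say how the parallel external memory tree algorithm works. A common approach for trees on data too large for memory is to bin each predictor, make a pass over the data that accumulates per-bin summary statistics, and then score candidate splits from those summaries alone. Because the summaries are plain sums, chunks or nodes can compute them separately and add them together. The sketch below applies that binning idea to one numeric predictor of a regression tree and scores splits by total squared error. It illustrates the general technique, not RevoScaleR's implementation, and the bin edges and function names are hypothetical.

```python
import bisect


def accumulate_bins(chunks, edges):
    """One pass over (x, y) chunks, keeping per-bin count, sum(y), sum(y^2).

    Bin i holds edges[i-1] <= x < edges[i]; bin 0 holds x < edges[0].
    Per-chunk summaries could equally be computed on separate nodes and
    added together, since the statistics are sums.
    """
    n_bins = len(edges) + 1
    count, total, total_sq = [0] * n_bins, [0.0] * n_bins, [0.0] * n_bins
    for chunk in chunks:
        for x, y in chunk:
            b = bisect.bisect_right(edges, x)
            count[b] += 1
            total[b] += y
            total_sq[b] += y * y
    return count, total, total_sq


def sse(n, s, ss):
    """Sum of squared errors around the mean, from n, sum(y) and sum(y^2)."""
    return ss - s * s / n if n else 0.0


def best_split(count, total, total_sq, edges):
    """Score every 'x < edge goes left' split using only the bin summaries."""
    all_n, all_s, all_ss = sum(count), sum(total), sum(total_sq)
    left_n, left_s, left_ss = 0, 0.0, 0.0
    best = None
    for i, edge in enumerate(edges):
        left_n += count[i]
        left_s += total[i]
        left_ss += total_sq[i]
        right_n = all_n - left_n
        if left_n == 0 or right_n == 0:
            continue
        cost = sse(left_n, left_s, left_ss) + sse(
            right_n, all_s - left_s, all_ss - left_ss)
        if best is None or cost < best[1]:
            best = (edge, cost)
    return best  # (split edge, total squared error), or None if no valid split
```

The trade-off is resolution for scalability: splits can only fall on bin edges, but memory use depends on the number of bins rather than the number of rows.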

Webinar Tomorrow: Big Data Trees and Hadoop Connection in Revolution R Enterprise 6.1

November 14, 2012

Tomorrow at 9AM Pacific, Revolution Analytics VP of Product Development Sue Ranney will introduce two key Big Data features of the new Revolution R Enterprise 6.1. Now you can train classification and regression trees on data sets of unlimited size, quickly and using the resources of multiple processors and clusters. (This white paper describes our implementation of tree models...

Read more »

Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

October 25, 2012

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches: Proc GENMOD in SAS; installing a Hadoop cluster; using open-source R (both on the full data...

Read more »
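Fitting a GLM to 150M records is feasible without holding the data in memory because iteratively reweighted least squares (IRLS) only needs running sums of X'WX and X'Wz on each iteration, and those sums can be accumulated chunk by chunk. The sketch below shows this for logistic regression in plain Python. It is a generic textbook construction, not a description of Allstate's code or of how SAS or any R package implements GLMs, and `read_chunks` is a hypothetical data-access callback.

```python
import math


def sigmoid(t):
    """Numerically stable logistic function."""
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def irls_logistic(read_chunks, p, max_iter=25, tol=1e-10):
    """Logistic regression by IRLS, reading the data one chunk at a time.

    read_chunks: zero-argument callable returning an iterable of chunks, each a
    list of (x, y) rows with x a length-p list (put 1.0 first for an intercept).
    Each iteration is one full pass that accumulates X'WX and X'Wz; only these
    p x p and length-p sums are held in memory, never the whole data set.
    """
    beta = [0.0] * p
    for _ in range(max_iter):
        xtwx = [[0.0] * p for _ in range(p)]
        xtwz = [0.0] * p
        for chunk in read_chunks():
            for x, y in chunk:
                eta = sum(b * xi for b, xi in zip(beta, x))
                mu = sigmoid(eta)
                w = mu * (1.0 - mu)
                # w * z, where z = eta + (y - mu) / w is the working response
                wz = w * eta + (y - mu)
                for i in range(p):
                    xtwz[i] += x[i] * wz
                    for j in range(p):
                        xtwx[i][j] += w * x[i] * x[j]
        new_beta = solve(xtwx, xtwz)
        step = max(abs(nb - ob) for nb, ob in zip(new_beta, beta))
        beta = new_beta
        if step < tol:
            break
    return beta
```

Because `read_chunks` is called again on every iteration, the same loop works whether chunks come from an in-memory list or from a reader over files on disk. The cost is one full pass over the data per IRLS iteration.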

Improving the integration between R and Hadoop: rmr 2.0 released

October 4, 2012

The RHadoop project, the open-source project supported by Revolution Analytics to integrate R and Hadoop, continues to evolve. Now available is version 2 of the rmr package, which makes it possible for R programmers to write map-reduce tasks in the R language, and have them run within the Hadoop cluster. This update is the "simplest and fastest rmr yet",...

Read more »