333 search results for "hadoop"

The Marriage of Hadoop and R: Revolution Analytics at Hadoop World

November 11, 2011
By
The Marriage of Hadoop and R: Revolution Analytics at Hadoop World

Revolution Analytics CTO David Champagne visited Hadoop World 2011 this week, and delivered a presentation on "The Powerful Marriage of R and Hadoop" to a standing-room-only crowd of R and Hadoop enthusiasts. I've included David's slides below: The talk also generated praise on Twitter, for example: David was also interviewed by The Cube during the conference. In the video...

Read more »

"Anyone planning to work with Big Data ought to learn Hadoop and R"

October 25, 2011
By

Dan Woods at Forbes interviewed LinkedIn's Daniel Tunkelang about the rise of data science and on building data science teams. When asked how students today should prepare themselves to be data scientists, Tunkelang gives some good advice: When we built the data science team at LinkedIn a few years ago, we looked for raw talent, assuming that smart people...

Read more »

Implementing K-means clustering for Hadoop in R and Java

October 14, 2011
By
Implementing K-means clustering for Hadoop in R and Java

At the Bay Area R User Group meeting this week, Antonio Piccolboni gave an overview of the design goals and implementation of the RHadoop Project packages that connect Hadoop and R: rhdfs, rhbase and rmr: (The image above was captured from Antionio's slides.) The most revealing part of the talk for me was the comparison of implementing the K-means...

Read more »

Slides and replay from "R and Hadoop" webinar

September 21, 2011
By
Slides and replay from "R and Hadoop" webinar

So ... there's clearly a lot of interest in integrating R and Hadoop. Today's webinar was a record-setter for Revolution Analytics, with more than 1000 people signing up to learn how to access Hadoop data from R with the packages from the open-source RHadoop project. If you didn't catch the live webinar, don't fret: the slides and replay are...

Read more »

How to program MapReduce jobs in Hadoop with R

September 13, 2011
By

MapReduce is a powerful programming framework for efficiently processing very large amounts of data stored in the Hadoop distributed filesystem. But while several programming frameworks for Hadoop exist, few are tuned to the needs of data analysts who typically work in the R environment as opposed to general-purpose languages like Java. That's why the dev team at Revolution Analytics...

Read more »

Webinar: Leveraging R in Hadoop Environments

September 6, 2011
By
Webinar: Leveraging R in Hadoop Environments

On Wednesday September 21, Revolution Analytics' CTO David Champagne will give a live webinar introducing three new open-source packages for R and Hadoop, which make it possible to work with Hadoop data in R, and bring in-database R analytics to Hadoop. Here are the details: Date: Wednesday, September 21st Time: 10:00AM - 10:30AM Pacific Time Presenter: David Champagne, Chief...

Read more »

Using R for Map-Reduce applications in Hadoop

May 4, 2011
By

Data Scientist Antonio Piccolboni recently published this comparison of the various language and interfaces available for programming Big Data analysis tasks in the map-reduce framework. The interfaces he reviewed included: Java Hadoop (mature and efficient, but verbose and difficult to program) Cascading (brings an SQL-like flavor to Java programming with Hadoop) Pipes/C++ (a C++ interface to programming on Hadoop)...

Read more »

RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

February 16, 2011
By
RHIPE: An Interface Between Hadoop and R for Large and Complex Data Analysis

RHIPE: An Interface Between Hadoop and R Presented by Saptarshi Guha About the Video: I filmed the event using LectureMaker’s live event recording technique. One special feature I add to my R video recordings is the addition of my own R source code … Continue reading →

Read more »

Run R in parallel on a Hadoop cluster with AWS in 15 minutes

January 10, 2011
By

If you're looking to apply massively parallel resources to an R problem, one of the most time-consuming aspects of the problem might not be the computations themselves, but the task of setting up the cluster in the first place. You can use Amazon Web Services to set up the cluster in the cloud, but even that take some time,...

Read more »

Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

January 10, 2011
By
Abusing Amazon’s Elastic MapReduce Hadoop service… easily, from R

JD Long's experimental segue package makes it easy to use Amazon's Elastic MapReduce service to fire up a Hadoop cluster and use it for non-Big Data, computationally-intensive tasks. The package provides a cluster-aware version of lapply() which "just works".

Read more »