328 search results for "hadoop"

Getting Started with R and Hadoop

August 20, 2012

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, and R is his "go-to tool for anything data-related"....

Faster R in Hadoop: rmr 1.3 now available

July 23, 2012

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: an optional vectorized API for efficient R programming when dealing with small records; fast C implementations for serialization and deserialization from...

Data distillation with Hadoop and R

June 11, 2012

We're definitely in the age of Big Data: today, there are many more sources of data readily available to us to analyze than there were even a couple of years ago. But what about extracting useful information from novel data streams that are often noisy and minutely transactional ... aye, there's the rub. One of the great things about...

Facebook-class social network analysis with R and Hadoop

May 25, 2012

In computing, social networks are traditionally represented as graphs: a collection of nodes (people), pairs of which may be connected by edges (friend relationships). Visually, the social networks can then be represented like this: Social network analysis often amounts to calculating the statistics on a graph like this: the number of edges (friends) connected to a particular node (person),...

Big Data Analytics with R and Hadoop

May 3, 2012

The open-source RHadoop project makes it easier to extract data from Hadoop for analysis with R, and to run R within the nodes of the Hadoop cluster -- essentially, to transform Hadoop into a massively parallel statistical computing cluster based on R. In yesterday's webinar (the replay of which is embedded below), data scientist and RHadoop project lead Antonio Piccolboni...

Introduction to Oracle R Connector for Hadoop

April 23, 2012

MapReduce, the heart of Hadoop, is a programming framework that enables massive scalability across servers using data stored in the Hadoop Distributed File System (HDFS). The Oracle R Connector for Hadoop (ORCH) provides access to a Hadoop cluster from R, enabling manipulation of HDFS-resident data and the execution of MapReduce jobs. Conceptually, MapReduce is similar...

R and Hadoop: Step-by-step tutorials

March 14, 2012

At the recent Big Data Workshop held by the Boston Predictive Analytics group, airline analyst and R user Jeffrey Breen gave a step-by-step guide to setting up an R and Hadoop infrastructure. Firstly, as a local virtual instance of Hadoop with R, using VMWare and Cloudera's Hadoop Demo VM. (This is a great way to get familiar with Hadoop.)...

Slides from today’s Big Data Step-by-Step Tutorials: Infrastructure series and Intro to R+Hadoop with RHadoop’s rmr

March 10, 2012

Slides from the Boston Predictive Analytics Big Data Workshop tutorials: Big Data Step-by-Step: Infrastructure 1/3: Local VM; Big Data Step-by-Step: Infrastructure 2/3: Running R and RStudio on EC2; Big Data Step-by-Step: Infrastructure 3/3: Taking it to the cloud... easily... with Whirr; Big Data Step-by-Step: Using R & Hadoop (with RHadoop's rmr package)

RHadoop updated: improved performance and more control

February 27, 2012

Revolution Analytics' open-source RHadoop project, which provides integration between R and Hadoop, has been updated with the release of version 1.2 of the "rmr" package. New in this version: support for binary I/O formats, which improves on the text-only interface by allowing use of faster and more space-efficient data formats like R's native serialization format. This version also improves...

RHadoop update: new tools for Hadoop map-reduce tasks in R

December 13, 2011

The open-source RHadoop project to integrate R and Hadoop continues apace, with a new version of the rmr package released this week. Changes in this version improve performance when storing and retrieving R objects from Hadoop with a native serialization process, add support for equijoins (a MapReduce-style merge), and add some new higher-level R functions to make writing map-reduce tasks simpler....
