Posts Tagged ‘Big Data’

Nine lightning talks on R

October 12, 2012

At Tuesday's Bay Area R User Group meetup, nine speakers gave five-minute talks on various aspects of R. Revolution Analytics' Luba Gloukhov was one of the presenters, and also provides the summary of the talks below. Links to the slides are included where available for you to check out. Ariel Faigon: Chrestomathy with R Ariel walked us through his...

Read more »

Improving the integration between R and Hadoop: rmr 2.0 released

October 4, 2012

The RHadoop project, the open-source project supported by Revolution Analytics to integrate R and Hadoop, continues to evolve. Now available is version 2 of the rmr package, which makes it possible for R programmers to write map-reduce tasks in the R language, and have them run within the Hadoop cluster. This update is the "simplest and fastest rmr yet",...

Read more »

Tips on accessing data from various sources with R

October 3, 2012

Jeffrey Breen (the man behind the Twitter airline sentiment analysis example) recently posted a collection of slides with some great tips for accessing data from R. "Tapping the Data Deluge" includes information on: using the XLConnect package to read data from Excel spreadsheets; using the foreign package to read SPSS, SAS, Stata and dBase data files; using SQL queries...

Read more »

Using R in production: industry experts share their experiences

September 26, 2012

I had a great time yesterday moderating the "R in Action" panel discussion at the DataWeek conference in San Francisco. Each of the panelists represented a company that is actively using R and/or Revolution R Enterprise. Here (from memory, since I couldn't take notes) are some of the things they shared: Jesse Bridgewater from eBay talked about how R is...

Read more »

Population health management with RevoScaleR

September 10, 2012

This guest post is by Douglas McNair MD PhD, Engineering Fellow & President, Cerner Math Inc. -- ed. RevoScaleR scaling big-data modeling performance for real-time health data analysis at Cerner The size of data sets is increasing much more rapidly than the speed of cores, of RAM, and of disk drives. This is particularly true of electronic health records...

Read more »

Getting Started with R and Hadoop

August 20, 2012

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, and sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, and R is his "go-to tool for anything data-related"....

Read more »

Ryan Rosario on Parallel programming in R

August 17, 2012

Earlier this year data scientist Ryan Rosario gave a talk on parallel computing with R to the Los Angeles R User Group, and he recently made the slides from the talk available online. They're a great resource for anyone looking to make use of multi-processor systems or a Hadoop-based architecture to speed computations with big data. Ryan's talk was...

Read more »

Cheat sheet for prediction and classification models in R

August 9, 2012

Ricky Ho has created a 6-page PDF reference card on Big Data Machine Learning, with examples implemented in the R language. (A free registration to DZone Refcardz is required to download the PDF.) The examples cover: predictive modeling overview (how to set up test and training sets in R); linear regression (using lm); logistic regression (using glm)...
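As a rough illustration of the first three topics (this is not taken from the reference card), here is a minimal sketch that splits a data set into training and test sets and fits models with lm and glm. The built-in mtcars data, the 70/30 split, the seed, and the choice of predictor are all assumptions made only for this example.

```r
set.seed(42)                                  # arbitrary seed, for reproducibility

# Hold out roughly 30% of the rows as a test set (assumed split ratio)
n <- nrow(mtcars)
test_idx <- sample(n, size = round(0.3 * n))
train <- mtcars[-test_idx, ]
test  <- mtcars[test_idx, ]

# Linear regression: predict fuel economy (mpg) from weight (wt)
fit_lm   <- lm(mpg ~ wt, data = train)
pred_mpg <- predict(fit_lm, newdata = test)
rmse     <- sqrt(mean((test$mpg - pred_mpg)^2))

# Logistic regression: predict transmission type (am is coded 0/1) from weight
fit_glm   <- glm(am ~ wt, data = train, family = binomial)
pred_prob <- predict(fit_glm, newdata = test, type = "response")
pred_am   <- as.integer(pred_prob > 0.5)      # 0.5 cutoff is an assumption
accuracy  <- mean(pred_am == test$am)
```

Evaluating on the held-out rows (rmse and accuracy above) rather than on the training rows is what makes the split useful: it gives an estimate of how the models behave on data they have not seen.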

Read more »

Big vectors coming to R

July 26, 2012

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type (the vector) has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 - 1 (just over 2.1 billion). Now, at 8 bytes per element that's 16GB of data, so...
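The arithmetic behind those figures can be checked directly in R. This is a small sketch, assuming the limit corresponds to the largest 32-bit signed integer (2^31 - 1) and that each element is an 8-byte double.

```r
# Largest value a 32-bit signed integer index can hold
max_len <- .Machine$integer.max        # 2147483647, i.e. 2^31 - 1
stopifnot(max_len == 2^31 - 1)

# Memory needed for a numeric (double) vector of that length.
# Multiplying by the double 8 promotes the result to double, so no integer overflow.
bytes <- max_len * 8
gib   <- bytes / 2^30                  # just under 16 (binary gigabytes)
gib
```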

Read more »

Faster R in Hadoop: rmr 1.3 now available

July 23, 2012

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: an optional vectorized API for efficient R programming when dealing with small records; fast C implementations for serialization and deserialization from...

Read more »