Big Data

Benchmarking bigglm

November 13, 2012 | 0 Comments

By Joseph Rickert. In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a ... [Read more...]

What’s new in Revolution R Enterprise 6.1

November 8, 2012 | 0 Comments

We're pleased to announce that the latest update to Revolution R Enterprise is available today! Existing subscribers will soon receive an email with update instructions, and the free academic distribution will be updated later today. Version 6.1 adds a frequently requested big-data statistical modeling algorithm, adds a new connectivity option for Hadoop, improves ... [Read more...]

Webinar Thursday: How R is used to optimize tractor production at John Deere

November 6, 2012 | 0 Comments

I just sat in on the rehearsal for Thursday's webinar by John Deere's Derek Hoffman, Order Fulfillment Forecasting at John Deere: How R Facilitates Creativity and Flexibility. Derek will give a spirited argument for why R is critical for the farming equipment manufacturer's operations: from forecasting demand for equipment, forecasting ... [Read more...]

Slides and replay for "The Rise of Data Science"

November 2, 2012 | 0 Comments

I had a great time presenting my new webinar yesterday, thanks to everyone who attended "The Rise of Data Science in the Age of Big Data Analytics" and especially those who submitted questions. Sorry I didn't have time to get to them all, but feel free to ask here in ... [Read more...]

R among TechCrunch’s 5 Trendy Open-Source Techs for Big Data

October 30, 2012 | 0 Comments

Tim Gasper (Product Manager at Big Data platform Infochimps) has an informative article at TechCrunch that provides an overview of five open-source technologies trending now for Big Data applications. They are: Storm and Kafka (for processing stream data); Drill and Dremel (for ad-hoc queries of big data); R (for data ... [Read more...]

Allstate compares SAS, Hadoop and R for Big-Data Insurance Models

October 25, 2012 | 0 Comments

At the Strata conference in New York today, Steve Yun (Principal Predictive Modeler at Allstate's Research and Planning Center) described the various ways he tackled the problem of fitting a generalized linear model to 150M records of insurance data. He evaluated several approaches: Proc GENMOD in SAS; installing a Hadoop ... [Read more...]

Quick notes from Strata NYC 2012

October 24, 2012 | 0 Comments

The O'Reilly Strata conferences are always great fun to attend, and this latest installment in New York City is no exception. This one is super-busy though; the conference has been sold out for weeks -- and not just marketing-sold-out, it's fire-department-sold out. It's non-stop conversations and presentations, and it's tough ... [Read more...]

Two Talks on Data Science, Big Data and R

October 23, 2012 | 0 Comments

On Thursday next week (November 1), I'll be giving a new webinar on the topic of Big Data, Data Science and R. Titled "The Rise of Data Science in the Age of Big Data Analytics: Why Data Distillation and Machine Learning Aren’t Enough", this is a provocative look at why ... [Read more...]

Nine lightning talks on R

October 12, 2012 | 0 Comments

At Tuesday's Bay Area R User Group meetup, nine speakers gave five-minute talks on various aspects of R. Revolution Analytics' Luba Gloukhov was one of the presenters, and also provides the summary of the talks below. Links to the slides are included where available for you to check out. Ariel ... [Read more...]

Improving the integration between R and Hadoop: rmr 2.0 released

October 4, 2012 | 0 Comments

The RHadoop project, the open-source project supported by Revolution Analytics to integrate R and Hadoop, continues to evolve. Now available is version 2 of the rmr package, which makes it possible for R programmers to write map-reduce tasks in the R language, and have them run within the Hadoop cluster. This ... [Read more...]

Tips on accessing data from various sources with R

October 3, 2012 | 0 Comments

Jeffrey Breen (the man behind the Twitter airline sentiment analysis example) recently posted a collection of slides with some great tips for accessing data from R. "Tapping the Data Deluge" includes information on: using the XLConnect package to read data from Excel spreadsheets; using the foreign package to read SPSS, ... [Read more...]

Using R in production: industry experts share their experiences

September 26, 2012 | 0 Comments

I had a great time yesterday moderating the "R in Action" panel discussion at the DataWeek conference in San Francisco. Each of the panelists represented a company that is actively using R and/or Revolution R Enterprise. Here (from memory, since I couldn't take notes) are some of the things they ... [Read more...]

Population health management with RevoScaleR

September 10, 2012 | 0 Comments

This guest post is by Douglas McNair MD PhD, Engineering Fellow & President, Cerner Math Inc. -- ed. RevoScaleR scaling big-data modeling performance for real-time health data analysis at Cerner The size of data sets is increasing much more rapidly than the speed of cores, of RAM, and of disk drives. ... [Read more...]

Getting Started with R and Hadoop

August 20, 2012 | 0 Comments

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, ... [Read more...]

Ryan Rosario on Parallel programming in R

August 17, 2012 | 0 Comments

Earlier this year, data scientist Ryan Rosario gave a talk on parallel computing with R to the Los Angeles R User Group, and he recently made the slides from the talk available online. They're a great resource for anyone looking to make use of multi-processor systems or a Hadoop-based architecture ... [Read more...]

Cheat sheet for prediction and classification models in R

August 9, 2012 | 0 Comments

Ricky Ho has created a 6-page PDF reference card on Big Data Machine Learning, with examples implemented in the R language. (A free registration to DZone Refcardz is required to download the PDF.) The examples cover: Predictive modeling overview (how to set up test and training sets in ... [Read more...]

Big vectors coming to R

July 26, 2012 | 0 Comments

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 (or just over 2.1 billion). Now, at 8 bytes per element that's 16... [Read more...]

Faster R in Hadoop: rmr 1.3 now available

July 23, 2012 | 0 Comments

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealing with small records. ... [Read more...]
