Big Data

Benchmarking bigglm

November 13, 2012 | Joseph Rickert

In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a ... [Read more...]

What’s new in Revolution R Enterprise 6.1

November 8, 2012 | David Smith

We're pleased to announce that the latest update to Revolution R Enterprise is available today! Existing subscribers will soon receive an email with update instructions, and the free academic distribution will be updated later today. Version 6.1 adds a frequently-requested big-data statistical modeling algorithm, adds a new connectivity option for Hadoop, improves ... [Read more...]

Slides and replay for "The Rise of Data Science"

November 2, 2012 | David Smith

I had a great time presenting my new webinar yesterday. Thanks to everyone who attended "The Rise of Data Science in the Age of Big Data Analytics", and especially to those who submitted questions. Sorry I didn't have time to get to them all, but feel free to ask here in ... [Read more...]

Quick notes from Strata NYC 2012

October 24, 2012 | David Smith

The O'Reilly Strata conferences are always great fun to attend, and this latest installment in New York City is no exception. This one is super-busy though; the conference has been sold out for weeks -- and not just marketing-sold-out, it's fire-department-sold-out. It's non-stop conversations and presentations, and it's tough ... [Read more...]

Two Talks on Data Science, Big Data and R

October 23, 2012 | David Smith

On Thursday next week (November 1), I'll be giving a new webinar on the topic of Big Data, Data Science and R. Titled "The Rise of Data Science in the Age of Big Data Analytics: Why Data Distillation and Machine Learning Aren’t Enough", this is a provocative look at why ... [Read more...]

Nine lightning talks on R

October 12, 2012 | David Smith

At Tuesday's Bay Area R User Group meetup, nine speakers gave five-minute talks on various aspects of R. Revolution Analytics' Luba Gloukhov was one of the presenters, and also provides the summary of the talks below. Links to the slides are included where available for you to check out. Ariel ... [Read more...]

Tips on accessing data from various sources with R

October 3, 2012 | David Smith

Jeffrey Breen (the man behind the Twitter airline sentiment analysis example) recently posted a collection of slides with some great tips for accessing data from R. "Tapping the Data Deluge" includes information on: using the XLConnect package to read data from Excel spreadsheets; using the foreign package to read SPSS, ... [Read more...]

Population health management with RevoScaleR

September 10, 2012 | David Smith

This guest post is by Douglas McNair MD PhD, Engineering Fellow & President, Cerner Math Inc. -- ed. RevoScaleR scaling big-data modeling performance for real-time health data analysis at Cerner. The size of data sets is increasing much more rapidly than the speed of cores, of RAM, and of disk drives. ... [Read more...]

Getting Started with R and Hadoop

August 20, 2012 | David Smith

Last week's meeting of the Chicago area Hadoop User Group (a joint meeting with the Chicago R User Group, sponsored by Revolution Analytics) focused on crunching Big Data with R and Hadoop. Jeffrey Breen, president of Atmosphere Research Group, frequently deals with large data sets in his airline consulting work, ... [Read more...]

Ryan Rosario on Parallel programming in R

August 17, 2012 | David Smith

Earlier this year, data scientist Ryan Rosario gave a talk on parallel computing with R to the Los Angeles R User Group, and he recently made the slides from the talk available online. They're a great resource for anyone looking to make use of multi-processor systems or a Hadoop-based architecture ... [Read more...]

Big vectors coming to R

July 26, 2012 | David Smith

R has been available as a 64-bit application since its earliest days. But the internal representation of R's fundamental data type, the vector, has long been subject to a 32-bit limitation: the maximum number of elements is capped at 2^31 - 1 (just over 2.1 billion). Now, at 8 bytes per element that's 16... [Read more...]
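
For readers who want to check the arithmetic behind that limit, here is a rough sketch. It is written in Python rather than R purely for illustration, and it assumes a signed 32-bit index (so the largest length is 2^31 - 1) and 8-byte double-precision elements, as the teaser describes. The variable names are made up for this example.

```python
# Back-of-the-envelope check of the 32-bit vector length limit.
# Assumptions: lengths are held in a signed 32-bit integer, and each
# element is an 8-byte double, as in the teaser above.

MAX_ELEMENTS = 2**31 - 1   # largest value a signed 32-bit integer can hold
BYTES_PER_ELEMENT = 8      # one double-precision number

total_bytes = MAX_ELEMENTS * BYTES_PER_ELEMENT
total_gib = total_bytes / 2**30   # convert bytes to GiB

print(f"max elements: {MAX_ELEMENTS:,}")
print(f"bytes at {BYTES_PER_ELEMENT} bytes each: {total_bytes:,}")
print(f"roughly {total_gib:.1f} GiB")
```

Under these assumptions, a single full-length numeric vector comes to just under 16 GiB, which is why the cap only starts to bite on machines with a lot of memory.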

Faster R in Hadoop: rmr 1.3 now available

July 23, 2012 | David Smith

The RHadoop project continues the Big Data integration of R and Hadoop, with a new update to its rmr package. Version 1.3 of rmr improves the performance of map-reduce jobs for Hadoop written in R. New features include: An optional vectorized API for efficient R programming when dealing with small records. ... [Read more...]