332 search results for "hadoop"

Google analytics data extraction in R

December 3, 2012

Unlike other posts on this blog, this one focuses on coding in R, so readers with a developer mindset will probably enjoy it more than pure business analysts. My goal is to describe an alternative method for extracting data from Google Analytics into R via the API. I have been using...

bigglm on your big data set in open source R, it just works – similar as in SAS

In a recent post by Revolution Analytics, in which Revolution benchmarked its closed source generalized linear model approach against SAS, Hadoop and open source R, they seemed to be pointing out that no 'easy' open source R solution exists for building a Poisson regression model on large datasets. This post is about...

Revolution Newsletter: November 2012

November 16, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full November edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Now Available: Revolution R Enterprise 6.1. The latest release of Revolution Analytics' enterprise-ready data...

Big Data ETL and Big Data Analysis

November 14, 2012

I was at Strata New York 2012 last month. Great conference! Thanks to O'Reilly Media for assembling the industry leaders and running it well. I understand it was too crowded for some of my out-of-town friends. Stepping out to the streets of mid-town Manhat...

Benchmarking bigglm

November 13, 2012

By Joseph Rickert

In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at Strata in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a five-node cluster outperformed a MapReduce/Hadoop implementation...

What’s new in Revolution R Enterprise 6.1

November 8, 2012

We're pleased to announce that the latest update to Revolution R Enterprise is available today! Existing subscribers will soon receive an email with update instructions, and the free academic distribution will be updated later today. Version 6.1 adds a frequently requested big-data statistical modeling algorithm, adds a new connectivity option for Hadoop, improves performance, and provides new security and installation options...

In case you missed it: October 2012 Roundup

November 7, 2012

In case you missed them, here are some articles from October of particular interest to R users. Sponsorships for local R user groups from Revolution Analytics are now open to applicants worldwide. During the landfall of Hurricane Sandy in the US, several R-based apps used public weather and social media data to document its impact, like this timeline of...

R among TechCrunch’s 5 Trendy Open-Source Techs for Big Data

October 30, 2012

Tim Gasper (Product Manager at Big Data platform Infochimps) has an informative article at TechCrunch that provides an overview of five open-source technologies trending now for Big Data applications. They are: Storm and Kafka (for processing stream data); Drill and Dremel (for ad-hoc queries of big data); R (for data science with big data); Gremlin and Giraph (for graph...

Quick notes from Strata NYC 2012

October 24, 2012

The O'Reilly Strata conferences are always great fun to attend, and this latest installment in New York City is no exception. This one is super-busy though; the conference has been sold out for weeks -- and not just marketing-sold-out, it's fire-department-sold out. It's non-stop conversations and presentations, and it's tough to move through the hallways in between. Nonetheless, I...

Two Talks on Data Science, Big Data and R

October 23, 2012

On Thursday next week (November 1), I'll be giving a new webinar on the topic of Big Data, Data Science and R. Titled "The Rise of Data Science in the Age of Big Data Analytics: Why Data Distillation and Machine Learning Aren’t Enough", this is a provocative look at why data scientists cannot be replaced by technology, and why...
