306 search results for "hadoop"

In case you missed it: November 2012 Roundup

December 12, 2012

In case you missed them, here are some articles from November of particular interest to R users. In the webinar "Real-Time Predictive Analytics with Big Data", I showed how R fits into a real-time production system. R package developer Yihui Xie shares his favorite software and hardware in an interview with The Setup. Hadley Wickham created a handy tutorial...

Four years of the Revolutions Blog

December 10, 2012

Yesterday was the fourth anniversary of the Revolutions blog. Our first post was way back on December 9, 2008, and in the four years since we've been regularly posting about R, open source, statistics, big data, data science and other random things that happened to catch our eye. In fact, there have been 1488 posts published in the last...

Please stop using Excel-like formats to exchange data

December 7, 2012

I know “officially” data scientists always work in “big data” environments with data in a remote database, streaming store or key-value system. But in day-to-day work, Excel files and Excel export files get used a lot and cause a disproportionate amount of pain. I would like to make a plea to my

Importing Data Into R from Different Sources

December 6, 2012

I have found that I get data from many different sources. These sources range from simple .csv files, to more complex relational databases, to structured XML or JSON files. I have compiled the different approaches that one can use to easily access these datasets. Local Column Delimited Files This is probably the most common and

The surprisingly weak case for global warming

December 3, 2012

I welcome your thoughts on this post, but please read through to the end before commenting. Also, you’ll find the related code (in R) at the end. For those new to this blog, you may be taken aback (though hopefully not bored or shocked!) by how I expose my full process and reasoning. This is

Google analytics data extraction in R

December 3, 2012

Unlike other posts on this blog, this post focuses on coding in R, so readers with a developer mindset may like it more than pure business analysts will. My goal is to describe an alternative method for extracting data from Google Analytics into R via the API. I have been using

bigglm on your big data set in open source R, it just works – similar as in SAS

In a recent post (link & link), Revolution Analytics benchmarked its closed-source generalized linear model approach against SAS, Hadoop and open-source R, and seemed to suggest that no 'easy' open-source R solution exists for building a Poisson regression model on large datasets. This post is about...

Revolution Newsletter: November 2012

November 16, 2012

The most recent edition of the Revolution Newsletter is out. The news section is below, and you can read the full November edition (with highlights from this blog and community events) online. You can subscribe to the Revolution Newsletter to get it monthly via email. Now Available: Revolution R Enterprise 6.1 The latest release of Revolution Analytics' enterprise-ready data...

Big Data ETL and Big Data Analysis

November 14, 2012

I was at Strata New York 2012 last month. Great conference! Thanks to O'Reilly Media for assembling the industry leaders and running it well. I understand it was too crowded for some of my out-of-town friends. Stepping out to the streets of mid-town Manhat...

Benchmarking bigglm

November 13, 2012
By Joseph Rickert

In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a five node cluster outperformed a MapReduce/Hadoop implementation...
