Blog Archives

ACM Data Mining Camp 2011: Report

October 18, 2011

(By Joseph Rickert.) In San Jose, topics like big data, MapReduce, predictive models, mobile analytics and crowdsourcing draw a crowd even on a Saturday. So it turned out that the ACM Data Mining Camp and "un-conference" was a very "happening" way to spend a Saturday. Over 500 people attended the event at the eBay "Town Hall" on North...

Read more »

Where to find data to use with R

October 11, 2011

(Contributing blogger Joe Rickert has put together a fantastic list of data sources suitable for use with R. If you're looking for data to use in the Applications of R Contest -- entries close October 31 -- this is a great resource for you -- Ed.) Hardly a day goes by without someone or something reminding me that we...

Read more »

A Work of Art: Efron on Bayesian Inference

October 6, 2011

(Contributing blogger Joseph Rickert reports from the Stanford University Statistics Seminar series -- Ed.) Stanford University is very gracious about letting the general public attend many university events. Yesterday, it caught my eye that Bradley Efron was going to speak on Bayesian inference and the parametric bootstrap at the weekly Statistics seminar. So, since the free shuttle that goes...

Read more »

K-Means Clustering on Big Data

June 7, 2011

In this post Joseph Rickert demonstrates how to build a clustering model on a large data set with the RevoScaleR package. A script file for use with Revolution R Enterprise to recreate the analysis below is at the end of the post, and can also be downloaded here -- Ed. The k-means (Lloyd) algorithm, an intuitive way to explore...

Read more »

New functions for linear model inference in Revolution R Enterprise 4.3

April 26, 2011

The latest release of Revolution R Enterprise shows how Revolution Analytics’ package for big data, RevoScaleR, is continuing to add new capabilities for big data statistics. RevoScaleR removes the limits on the size of the data that can be processed in R through the use of the highly efficient .xdf binary file format. Xdf stores data by rows within columns...

Read more »

Baseball, T-tests and statistical surprises

March 31, 2011

Are MLB players better hitters now than they were 20 years ago? Revolution Analytics' Joseph Rickert uses R to take a look at the data, and offers an instructive lesson in checking your assumptions for statistical tests in the process -- Ed. Data are everywhere – but, even for simple things, I still seem to spend too much...

Read more »

Predicting R models with PMML: Revolution R Enterprise and ADAPA

March 24, 2011

The recently announced Revolution Analytics / Zementis partnership goes a long way towards demonstrating how R fits into big-league production environments. A frequent complaint against R is that although it is a fine prototyping tool, it is not able to handle production environments. Well, that’s just not true. In fact, it is straightforward to build a model in R, translate...

Read more »

ACM Data Mining Camp

November 16, 2010

By guest blogger Joseph Rickert. I was very happy to be a part of the ACM Data Mining Camp held last Saturday (November 13th) at eBay. It was a big day for discussing hot topics in data mining, Mahout, parallel SVMs, etc., and also a pretty big day for R. Because Revolution Analytics was a sponsor for the camp,...

Read more »

Making sense of MapReduce

September 24, 2010

From guest blogger Joseph Rickert. Last night I went to hear Ken Krugler of Bixolabs talk about Hadoop at the monthly meeting of the Software Developers Forum. Maybe because Ken is an unusually lucid speaker, or maybe because I just reached some sort of cumulative tipping point through the prep work of all those patient people who have tried...

Read more »

Why Learn R? It’s the language of Statistics

June 24, 2010

In the Introduction to his book “R for SAS and SPSS Users” (Springer 2009) Robert Muenchen offers ten reasons for learning R if you already know SAS or SPSS. All ten reasons say something important about R. However, his fourth reason: “R’s language is more powerful than SAS or SPSS. R developers write most of their analytic methods using...

Read more »