Blog Archives

Data Science Education gets personal

March 14, 2013

By Joseph B. Rickert. It is difficult to imagine that there is anyone on the planet with an internet connection and a desire to learn something new who has not at least looked into taking a massive open online course (MOOC). Last fall, in an 11/4/12 article, the New York Times declared the Year of the MOOC and quoted...

A Review of the R Graphics Cookbook

February 11, 2013

A common criticism of R, especially from data scientists who are new to R but proficient in multiple programming languages, is that R is “quirky” and annoying because there is almost always more than one way to do simple things. I usually counter that they are trying to say that R is “flexible” and “rich”, but by the time...

Benchmarking bigglm

November 13, 2012

By Joseph Rickert. In a recent blog post, David Smith reported on a talk that Steve Yun and I gave at STRATA in NYC about building and benchmarking Poisson GLM models on various platforms. The results presented showed that the rxGlm function from Revolution Analytics’ RevoScaleR package running on a five-node cluster outperformed a MapReduce/Hadoop implementation...

Simulating the Birthday Problem with data derived probabilities

June 6, 2012

You've probably heard of the Birthday Paradox: it only takes a small gathering of people before it's quite likely that two of them share the same birthday. You can solve the problem analytically or with simulation, but in either case simplifying assumptions are usually made (no one born on February 29, for example). Joe Rickert uses Revolution R Enterprise 6...
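
As a rough illustration of the two approaches mentioned above (analytic and simulation), here is a minimal Python sketch. It is not the method from the post: it deliberately makes the usual simplifying assumptions (365 equally likely birthdays, no February 29), and the function names `birthday_exact` and `birthday_simulated` are hypothetical, chosen only for this example.

```python
import random


def birthday_exact(n, days=365):
    """Probability that at least two of n people share a birthday.

    Assumes every day is equally likely (a simplification, not
    data-derived probabilities).
    """
    p_all_distinct = 1.0
    for i in range(n):
        # The (i+1)-th person must avoid the i birthdays already taken.
        p_all_distinct *= (days - i) / days
    return 1.0 - p_all_distinct


def birthday_simulated(n, trials=20000, days=365, seed=0):
    """Monte Carlo estimate of the same probability under the same assumption."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        # A duplicate exists when the set is smaller than the group.
        if len(set(birthdays)) < n:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    for group_size in (10, 23, 40):
        print(group_size, birthday_exact(group_size), birthday_simulated(group_size))
```

With 23 people the exact probability under these assumptions already exceeds one half, and the simulated estimate should land close to it. Swapping the uniform draw for empirical day-of-year frequencies would be the natural next step toward data-derived probabilities.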

Simple tools for building a recommendation engine

April 19, 2012

By Joseph Rickert. Revolution’s resident economist, Saar Golde, is very fond of saying that “90% of what you might want from a recommendation engine can be achieved with simple techniques”. To illustrate this point (without doing a lot of work), we downloaded the million-row movie dataset from www.grouplens.org with the idea of just taking the first obvious exploratory step:...

Coefplot: New Package for Plotting Model Coefficients

January 3, 2012

By Joseph Rickert. Even to the practiced eye, looking at coefficients in R model summaries can be tedious. And capturing information about the significance of coefficients from scores or maybe even hundreds of models, in a way that makes writing the final report a bit easier, is a time-consuming and thankless task. Of course, once you know what...

Review of ‘R in Action’ by Robert I. Kabacoff

December 20, 2011

By Joseph Rickert. Yesterday, the cosmic randomizer placed me next to a newly minted lawyer in a crowded Los Gatos coffee shop. In three minutes of conversation I learned that the fellow was interested in corporate law, was about to take a job that would give him a seat in the great VC/start-up game and that he had...

The Bay Area R User Group Meeting on Data Mining with R

December 16, 2011

By Joseph Rickert. Put up a poster that says something like “Data Mining with R” anywhere in the Bay Area and you will surely draw a crowd. But it was still a bit of a surprise that the monthly meeting of the Bay Area R User Group was so well attended. At one point there were 160 people on...

Review of "The Art of R Programming" by Norman Matloff

November 29, 2011

By Joseph Rickert. Anyone seeking to learn R faces two major challenges: (1) learning how to swim in the sea of information (R packages, books, websites, blog posts, message boards, etc.) that threatens to drown a newbie and (2) coming to grips with the structure, syntax and features of the language itself. Having some idea of what one...

ACM Data Mining Camp 2011: Report

October 18, 2011

(By Joseph Rickert.) In San Jose, topics like big data, MapReduce, predictive models, mobile analytics and crowdsourcing draw a crowd even on a Saturday. So it turned out that the ACM Data Mining Camp and "un-conference" was a very "happening" way to spend a Saturday. Over 500 people attended the event at the eBay "Town Hall" on North...
