“The next big thing”, R, and Statistics in the cloud

April 14, 2010

(This article was first published on R-statistics blog » R, and kindly contributed to R-bloggers)

A friend just e-mailed me about a blog post by Dr. AnnMaria De Mars titled “The Next Big Thing”.

In it, Dr. De Mars wrote (I took the liberty of emphasizing some parts of the text):

Contrary to what some people seem to think, R is definitely not the next big thing, either. I am always surprised when people ask me why I think that, because to my mind it is obvious. [...]
for me personally and for most users, both individual and organizational, the much greater cost of software is the time it takes to install it, maintain it, learn it and document it. On that, R is an epic fail. It does NOT fit with the way the vast majority of people in the world use computers. The vast majority of people are NOT programmers. They are used to looking at things and clicking on things.

Here are my two cents on the subject:

First, I agree with Dr. De Mars that R (out of the box) is not very friendly to non-programmers: there are (almost) no point-and-click capabilities.  And while there are several projects offering a GUI layer for R (a good list of them can be found here), none of them is yet at the level of refinement that software like SPSS, JMP or SAS offers users today.

But is traditional “point and click” the next “big thing”?  My suspicion is that the answer is no.
Dr. De Mars does not think so either, since her predictions for the next big thing are “Data visualization” and “Analyzing enormous quantities of unstructured data”. R offers quite powerful solutions for both (assuming that you are willing to go through the learning curve).

Dr. De Mars's question is a fascinating one: what IS going to be the next big thing?

I think that the next BIG thing is (or is becoming) “Statistics in the Cloud”.  This intuition came from (among other things) my review of the “Future of Open Source” Survey (see “conclusion 3”).

In the near future, I believe, we will see more statisticians and data analysts tapping into the opportunities that cloud computing offers them.  Here are some examples of what I have come across (or covered) lately on the topic of cloud computing and R:

  1. Easy online data collection (via Google Forms)
  2. High-performance computing - Running a statistical software package in the cloud, either to access a powerful computer or to run jobs in parallel.  The former can be done through services like the Amazon cloud and Elastic R, and more recently R-Node, which combines running R and Protovis on a server.  The latter I have no experience with, but I understand there are various solutions in R (a well-known company in the field is, of course, REvolution Computing)
  3. Online statistical analysis/visualization of data - Having a web interface to a statistical analysis.  One wonderful example of that is Jeroen Ooms's (beautiful) web interface to ggplot2.  Such projects offer “point and click” capabilities through the internet (/cloud)
  4. Online interactive visualization of data.  I came across three people offering to develop solutions for doing this with R in this year’s Google Summer of Code, and I hope something will come out of it

All of these are closely connected to the emerging trend of the “web of data”/“linked data web” that some are talking about. For example, here is a good TED talk by Tim Berners-Lee (the inventor of the World Wide Web), in which he talks about building a web for open, linked data that could do for numbers what the Web did for words, pictures and video: unlock our data and reframe the way we use it together.

The same plea is made by Hans Rosling in his famous TED talk showing GapMinder.
Although, at the same time, some R users are saying: “You don’t have to bother linking the data. I’ll do with just the data, really, just release it…”

In conclusion, I don’t know what capabilities other projects/products offer for doing statistics in the cloud.  But it is clear to me that the R community is (not surprisingly) bringing very diverse and innovative solutions to the world.

Is R the next big thing?  I don’t think so.  But I do think that some of the next big things will be built with R.
* * *
I would love to know your thoughts about Dr. De Mars's post, and also about what the “next big thing” is going to be (and what role R will have in it).
