A competition to recommend “relevant” R packages – and the future of R

October 9, 2010

(This article was first published on R-statistics blog, and kindly contributed to R-bloggers)

Update: the competition was just launched (http://www.johnmyleswhite.com/notebook/2010/10/10/r-recommendation-contest-launches-on-kaggle/).

* * *

What is the competition about?

Drew Conway (http://www.drewconway.com/zia/?p=2415) and John Myles White (http://www.johnmyleswhite.com/notebook/2010/10/07/build-a-recommendation-system-for-r-packages/) have collected data from 52 R users about the packages they have installed. The data is now available on GitHub for download (http://github.com/johnmyleswhite/r_recommendation_system), and the contest will be run on the Kaggle platform (http://kaggle.com/About-Us/how-it-works).

For more details, head over to dataists (http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/).

And for fun, here is the dependency graph for R packages they have assembled so far:

[Figure: A graphical visualization of packages’ “suggestion” relationships, affectionately referred to as the R Flying Spaghetti Monster (http://en.wikipedia.org/wiki/Flying_Spaghetti_Monster). More info below.]

A tiny bit more on R bloggers virality

Since I started getting involved in the R bloggers community (http://www.r-bloggers.com/), I can recall two major discussions that have attracted more than two bloggers writing about them.

The first was people in the R community arguing against Dr. AnnMaria De Mars’s post “The Next Big Thing”, where she wrote that “R is an epic fail.” My response at the time was the post “‘The next big thing’, R, and Statistics in the cloud” (http://www.r-statistics.com/2010/04/r-the-next-big-thing-and-statistics-in-the-cloud/).

The second was tackling the question “Is R ‘that bad’ that it should be rewritten from scratch?” Many of the responses addressed the post by Ross Ihaka, who argued for the need to rewrite R from scratch. A very wide spectrum of replies can be viewed in the Stack Overflow discussion I started on the topic (http://stackoverflow.com/questions/3706990/is-r-that-bad-that-it-should-be-rewritten-from-scratch).

And in the past few days I noticed the start of a cascade of posts (http://blog.revolutionanalytics.com/2010/10/kaggle-competition.html, http://www.drewconway.com/zia/?p=2415, http://www.stat.columbia.edu/~cook/movabletype/archives/2010/10/contest_for_dev.html, http://www.johnmyleswhite.com/notebook/2010/10/07/build-a-recommendation-system-for-r-packages/), all promoting the post at “dataists” (http://www.dataists.com/2010/10/using-data-tools-to-find-data-tools-the-yo-dawg-of-data-hacking/).

This leads me to three simple statements:

1) I think it is beautiful that the R community has advocates who defend R’s role in the future of statistics.
2) I think it is important that the R community has so many (smart) people (beyond the amazing R core team) who reflect on how R is doing, and on the challenges that the R language and environment will face in the future.
3) I think it is fascinating that the R community is a community of researchers who have the skills to research themselves. Each community of a discipline can use its skills on itself: psychologists may psychoanalyze themselves, WordPress bloggers may write about WordPress, and R users can plan studies and analyse data about themselves. This potential is only beginning to be tapped, and I am excited to see where it might lead in the years to come.

