Monthly Archives: December 2012

An R wish list for 2013

December 29, 2012
By
An R wish list for 2013

First go and read An R wish list for 2012. None of the wishes came through in 2012. Fix the R website? No, it is the same this year. In fact, it is the same as in 2005. Easy to find help? Sorry, next year. Consistency and sane defaults? Coming soon to a theater near

Read more »

UEFA, is that it ?

December 29, 2012
By
UEFA, is that it ?

Following my previous post, a few more things. As mentioned by Frédéric, it is – indeed – possible to compute the probability of all pairs. More precisely, all pairs are not as likely to occur: some teams can play against (almost) eveyone, while others cannot. From the previous table, it is possible to compute probability that the last team plays...

Read more »

Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X

December 29, 2012
By
Integration of R, RStudio and Hadoop in a VirtualBox Cloudera Demo VM on Mac OS X

MotivationI was inspired by Revolution's blog and step-by-step tutorial from Jeffrey Breen on the set up of a local virtual instance of Hadoop with R. However, this tutorial describes the implementation using VMware's application. One downside to using VMware is that it's not free. I know most of the people including me like to hear the words open-source and free,...

Read more »

Row-wise summary curves in faceted ggplot2 figures

December 29, 2012
By
Row-wise summary curves in faceted ggplot2 figures

I really enjoy reading the Junk Charts blog.  A recent post made me wonder how easy it would be to add summary curves for small-multiple type plots, assuming the “small multiples” to summarize were the X component of a ggplot2::facet_grid(Y ~ X) … Continue reading →

Read more »

RcppExamples 0.1.5 and RcppClassicExamples 0.1.0

December 29, 2012
By

The recent releases of Rcpp 0.10.2 and RcppClassic 0.9.3 had one more repercussion. On that dreaded OS, the linker no longer wanted to instantiate a symbol present in both packages; seems to me that the linker in the other two OSs is a little smarter...

Read more »

High-Dimensional Microarray Data Sets in R for Machine Learning

December 29, 2012
By

Much of my research in machine learning is aimed at small-sample, high-dimensional bioinformatics data sets. For instance, here is a paper of mine on the topic. A large number of papers proposing new machine-learning methods that target high-dimensional data use the same two data sets and consider few others. These data sets are the 1) Alon colon cancer...

Read more »

Speed skating 10 km

December 29, 2012
By
Speed skating 10 km

It is winter which makes it time for one of Netherlands beloved sports: speed skating. Speed skating is done over various distances, but for me, the most beautiful is the 10 km. The top men do this in about 13 minutes. In this post I try to u...

Read more »

Men who stare at needles

December 29, 2012
By

Buffon's needle problem is a question first posed in the 18th century by Georges-Louis Leclerc, Comte de Buffon:What is the probability that a needle thrown at a lined sheet of paper will cross a line?This problem can be used to estimate π. If we set the nail size and the line distance = 1, the estimator can be calculated...

Read more »

STL transform + remove_copy for subsetting

December 29, 2012
By

We have seen the use of the STL transform functions in the posts STL transform and Transforming a matrix. We use the same logic in conjuction with a logical (ie boolean) vector in order subset an initial vector. #include <Rcpp.h> using namespace...

Read more »

Surprising Performance of data.table in Data Aggregation

December 28, 2012
By
Surprising Performance of data.table in Data Aggregation

data.table (http://datatable.r-forge.r-project.org/) inherits from data.frame and provides functionality in fast subset, fast grouping, and fast joins. In previous posts, it is shown that the shortest CPU time to aggregate a data.frame with 13,444 rows and 14 columns for 10 times is 0.236 seconds with summarize() in Hmisc package. However, after the conversion from data.frame to

Read more »