It is Sunday, it's raining, and I have a few hours to spare before I am due for lunch at my parents' place. Hence, I thought I'd use the time to produce another post. It has been a while since …

Here is today's announcement from Peter Dalgaard, for the R Core Team: The build system rolled up R-2.15.3.tar.gz (codename “Security Blanket”) at 9:00 this morning. This is intended to be the final round-up release of the 2.15 series, and in fact of the entire 2.x.y series, which started 2004-10-04. The list below details the changes in this release. You can get...

On Revolution Analytics partner Cloudera's blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He provides the complete map-reduce code in the R...
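
For readers who have not met the resampling step behind bagging, here is a minimal single-machine sketch in Python. It is not the RHadoop map-reduce code from Uri's guide. It only illustrates the idea of drawing bootstrap samples (same size as the data, with replacement), and the function names `bootstrap_sample` and `bootstrap_samples` are made up for this illustration.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items from rows, uniformly and with replacement."""
    n = len(rows)
    return [rows[rng.randrange(n)] for _ in range(n)]


def bootstrap_samples(rows, n_samples, seed=0):
    """Return n_samples independent bootstrap samples of rows.

    The seed is an assumption added here so that runs are reproducible.
    """
    rng = random.Random(seed)
    return [bootstrap_sample(rows, rng) for _ in range(n_samples)]


if __name__ == "__main__":
    data = list(range(10))  # toy stand-in for a large data set
    for sample in bootstrap_samples(data, n_samples=3):
        # Each sample typically repeats some rows and omits others.
        print(sorted(sample))
```

In an ensemble method, each such sample would be used to fit one model, for example one tree of a random forest.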

(My colleague Jean-Louis Fouley, now at I3M, Montpellier, kindly agreed to write a review on the BUGS book for CHANCE. Here is the review, en avant-première! Watch out, it is fairly long and exhaustive! References will be available in the published version. The additions of book covers with BUGS in the title and of the corresponding

Today’s blog post is about a problem familiar to most people who apply clustering algorithms to datasets without known true labels (unsupervised learning). The challenge here is the “freedom of choice” among a broad range of clustering algorithms, and the question of how to determine the right parameter values. The difficulty is the following: Every clustering algorithm and even...

As I am working with large gene expression matrices (microarray data) in my job, it is sometimes important to look at the correlation in gene expression of different genes. It has been shown that by calculating the Pearson correlation between genes, one can identify (by high values, i.e. > 0.9) genes that share a common
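
To make the thresholding idea concrete, here is a small Python sketch that computes the Pearson correlation for every pair of genes and keeps the pairs above 0.9. The expression values and gene names are invented toy data, and `pearson` and `correlated_pairs` are hypothetical helper names, not functions from the original post.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(expr, threshold=0.9):
    """Return (gene1, gene2, r) for every gene pair with r > threshold."""
    pairs = []
    for g1, g2 in combinations(sorted(expr), 2):
        r = pearson(expr[g1], expr[g2])
        if r > threshold:
            pairs.append((g1, g2, r))
    return pairs


# Invented toy data: one expression profile (five samples) per gene.
TOY_EXPR = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
}

if __name__ == "__main__":
    for g1, g2, r in correlated_pairs(TOY_EXPR):
        print(g1, g2, round(r, 3))
```

The double loop visits every pair of genes, so for a real microarray matrix with thousands of genes a vectorised correlation routine would be the more practical choice.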

The Cauchy distribution (?dcauchy in R) nails a flashlight over the number line and swings it at a constant speed from 9 o’clock down to 6 o’clock over to 3 o’clock. (Or the other direction, from 3→6→9.) Then counts …
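
One way to play with the flashlight picture is to simulate it. The Python sketch below assumes the flashlight hangs one unit above zero on the number line and points at an angle drawn uniformly between 9 o'clock and 3 o'clock; the beam then hits the line at the tangent of that angle, measured from straight down. Those geometric details and the helper names `flashlight_hits` and `quantile` are assumptions made for this illustration. Under them the hits follow a standard Cauchy distribution, whose median is 0 and whose quartiles are -1 and 1.

```python
import math
import random


def flashlight_hits(n, seed=0):
    """Simulate n flashlight angles and return where the beam hits the line.

    The angle is measured from straight down (6 o'clock) and drawn
    uniformly from [-pi/2, pi/2); the hit position is tan(angle).
    """
    rng = random.Random(seed)
    return [math.tan(math.pi * (rng.random() - 0.5)) for _ in range(n)]


def quantile(sorted_xs, p):
    """Crude empirical quantile of an already sorted list."""
    return sorted_xs[int(p * (len(sorted_xs) - 1))]


if __name__ == "__main__":
    hits = sorted(flashlight_hits(100_000))
    # Quartiles should land near -1, 0 and 1 for a standard Cauchy.
    print([round(quantile(hits, p), 2) for p in (0.25, 0.5, 0.75)])
    # The extremes are huge: the heavy tails in action.
    print(hits[0], hits[-1])
```

The quartiles settle down quickly, while the sample mean never does, which is the usual warning that comes with this distribution.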

I spend a decent percentage of my working time in R looping over chromosomes, transcription factors or tissues, usually in parallel. To get the stuff to run simultaneously I use the foreach function (from the foreach package) with the doMC backend, and for monitoring of ...

Version 1.0 of multilevelPSA has been released to CRAN. The multilevelPSA package provides functions to estimate and visualize propensity score models with multilevel, or clustered, data. The graphics are an extension of the PSAgraphics package by Helmreich and Pruzek. The example below will investigate the differences between private and public schools internationally using the Programme for International Student Assessment...