518 search results for "parallel"

Resampling data in Hadoop with RHadoop

February 27, 2013

On Revolution Analytics partner Cloudera's blog, Uri Laserson has posted an excellent guide to resampling from a large data set in Hadoop. Resampling is an important step in fitting ensemble models (including random forests and other bagging techniques), and Uri provides a step-by-step guide to implementing resampling methods using RHadoop. He provides the complete map-reduce code in the R...

Read more »

the BUGS Book [guest post]

February 24, 2013

(My colleague Jean-Louis Foulley, now at I3M, Montpellier, kindly agreed to write a review of the BUGS book for CHANCE. Here is the review, en avant-première! Watch out, it is fairly long and exhaustive! References will be available in the published version. The additions of book covers with BUGS in the title and of the corresponding

Read more »

Large correlation in parallel

February 24, 2013

As a small improvement to the bigcor function proposed on Rmazing for computing huge correlation matrices in R, I made the function work in parallel, using all the CPU cores available on the machine. The code is here. Here is a benchmark of the 2 func...

Read more »
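To make the idea in the excerpt above concrete, here is a minimal sketch of a block-wise correlation matrix computed on several cores. It is not the bigcor code from either post: the helper name `block_cor_parallel` and its parameters `block_size` and `cores` are hypothetical, and the sketch relies only on `mclapply` from the base `parallel` package.

```r
library(parallel)

# Hypothetical helper, not the bigcor code from the posts: computes cor(x)
# block by block and spreads the block pairs over several cores.
block_cor_parallel <- function(x, block_size = 50, cores = 2) {
  p <- ncol(x)
  # Assign each column index to a block of at most `block_size` columns.
  blocks <- split(seq_len(p), ceiling(seq_len(p) / block_size))
  nb <- length(blocks)
  # All block pairs (i, j) with i <= j; the other half follows by symmetry.
  pairs <- which(upper.tri(matrix(0, nb, nb), diag = TRUE), arr.ind = TRUE)

  # mclapply relies on forking, which Windows lacks, so fall back to one core.
  if (.Platform$OS.type == "windows") cores <- 1
  pieces <- mclapply(seq_len(nrow(pairs)), function(k) {
    cor(x[, blocks[[pairs[k, 1]]], drop = FALSE],
        x[, blocks[[pairs[k, 2]]], drop = FALSE])
  }, mc.cores = cores)

  # Stitch the block results back into the full p x p matrix.
  out <- matrix(NA_real_, p, p)
  for (k in seq_len(nrow(pairs))) {
    i <- blocks[[pairs[k, 1]]]
    j <- blocks[[pairs[k, 2]]]
    out[i, j] <- pieces[[k]]
    out[j, i] <- t(pieces[[k]])
  }
  out
}

set.seed(1)
x <- matrix(rnorm(200 * 120), nrow = 200)
res <- block_cor_parallel(x, block_size = 50, cores = 2)
# Compare against the single-call result.
all.equal(res, cor(x), check.attributes = FALSE)
```

For matrices too large to hold in memory, the output would need a disk-backed store in place of the plain `matrix` used here; that part is left out of the sketch.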

The Wisdom of Crowds – Clustering Using Evidence Accumulation Clustering (EAC)

February 24, 2013

Today’s blog post is about a problem familiar to most people who use clustering algorithms on datasets without known true labels (unsupervised learning). The challenge here is the “freedom of choice” over a broad range of different clustering algorithms and how to determine the right parameter values. The difficulty is the following: Every clustering algorithm and even...

Read more »

bigcor: Large correlation matrices in R

February 22, 2013

As I am working with large gene expression matrices (microarray data) in my job, it is sometimes important to look at the correlation in gene expression of different genes. It has been shown that by calculating the Pearson correlation between genes, one can identify (by high values, i.e. > 0.9) genes that share a common

Read more »

±∞

February 21, 2013

The Cauchy distribution (?dcauchy in R) nails a flashlight over the number line (http://upload.wikimedia.org/wikipedia/commons/thumb/9/93/Number-line.svg/1000px-Number-line.svg.png) and swings it at a constant speed from 9 o’clock down to 6 o’clock over to 3 o’clock. (Or the other direction, from 3→6→9.) Then counts ...

Read more »
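The flashlight picture can be checked with a short simulation. The excerpt is cut off, so the sketch below assumes the standard construction: the flashlight sits one unit above the origin, the beam angle is uniform between the two horizontal directions, and the recorded value is the point where the beam hits the number line. Under that assumption the hit position is `tan(theta)`, which follows a standard Cauchy distribution. The variable names are illustrative only.

```r
set.seed(42)
n <- 1e5

# Angle of the beam measured from straight down (6 o'clock), uniform between
# the two horizontal directions (9 o'clock and 3 o'clock).
theta <- runif(n, min = -pi / 2, max = pi / 2)

# With the flashlight one unit above 0, the beam hits the line at tan(theta).
hits <- tan(theta)

# Compare simulated quantiles with those of the standard Cauchy distribution.
probs <- c(0.1, 0.25, 0.5, 0.75, 0.9)
round(rbind(simulated   = quantile(hits, probs),
            theoretical = qcauchy(probs)), 2)
```

Quantiles are used for the comparison because the Cauchy distribution has no mean or variance, so sample averages of `hits` do not settle down as `n` grows.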

Progress bar in R

February 20, 2013

I spend a decent percentage of my working time in R looping over chromosomes, transcription factors or tissues, usually using parallelization. To get the stuff to run simultaneously I use the foreach function with the doMC backend, and for monitoring of ...

Read more »
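As a rough illustration of monitoring a parallel loop, the sketch below processes the work in chunks and advances a text progress bar after each chunk. It is not the approach from the post, whose monitoring details are cut off in the excerpt: it swaps in `mclapply` from the base `parallel` package in place of foreach and doMC so that it runs without extra packages, and `slow_task`, the chunk size of 8 and the item count are hypothetical.

```r
library(parallel)

# Hypothetical stand-in for the per-chromosome or per-tissue work.
slow_task <- function(i) {
  Sys.sleep(0.01)
  i^2
}

items <- 1:40
# Forking is unavailable on Windows, so use a single core there.
cores <- if (.Platform$OS.type == "windows") 1 else 2

# Split the work into chunks of 8 items; each chunk runs in parallel and the
# bar is advanced from the parent process between chunks.
chunks <- split(items, ceiling(seq_along(items) / 8))

pb <- txtProgressBar(min = 0, max = length(items), style = 3)
done <- 0
results <- list()
for (chunk in chunks) {
  results <- c(results, mclapply(chunk, slow_task, mc.cores = cores))
  done <- done + length(chunk)
  setTxtProgressBar(pb, done)
}
close(pb)

results <- unlist(results)
```

Updating the bar only between chunks keeps all console output in the parent process, at the cost of coarser progress steps; smaller chunks give finer updates but more scheduling overhead.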

Version 1.0 of multilevelPSA Available on CRAN

February 14, 2013

Version 1.0 of multilevelPSA has been released to CRAN. The multilevelPSA package provides functions to estimate and visualize propensity score models with multilevel, or clustered, data. The graphics are an extension of the PSAgraphics package by Helmreich and Pruzek. The example below will investigate the differences between private and public schools internationally using the Programme for International Student Assessment...

Read more »

Quantile Autoregression in R

February 9, 2013

In the past, I wrote about robust regression. This is an important tool which handles outliers in the data. Roger Koenker is a substantial contributor in this area. His website is full of useful information and code so visit when …

Read more »

MCMSki IV, Jan. 6-8 (9?), 2014, Chamonix (news #3)

February 5, 2013

In case you have not been constantly tracking the changes on the MCMSki IV webpage, here is some news: the number of invited and accepted contributed sessions in the program has considerably increased, to the point of almost filling two parallel sessions for the whole duration of the meeting. This includes an exciting round-table on

Read more »