Computing evidence

November 28, 2010

The book Random effects and latent variable model selection, edited by David Dunson in 2008 as a Springer Lecture Note, contains several chapters dealing with evidence approximation in mixed effect models. (Incidentally, I would be interested in the story behind the Lecture Note, as I found no explanation on the back cover or in the preface.)

Analyst First – SURF

November 28, 2010

This presentation is aimed at everyone working in commercial and government analytics, irrespective of the tools they use, and at students intending to pursue such a career. R and other open source tools play a powerful, unique and …

Random variable generation (Pt 1 of 3)

November 28, 2010

As I mentioned in a recent post, I’ve just received a copy of Advanced Markov Chain Monte Carlo Methods. Chapter 1.4 of the book (very quickly) covers random variable generation. Inverse CDF method: a standard algorithm for generating random numbers is the inverse CDF method. The continuous version of the algorithm is as follows: 1.
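
The excerpt above is cut off before the algorithm's steps, so the following is only a minimal sketch of the general inverse CDF idea, not the post's or the book's own code. It assumes an exponential target distribution, chosen because its CDF can be inverted in closed form; the function name `sample_exponential` and the rate of 2.0 are illustrative choices.

```python
import math
import random


def sample_exponential(rate, rng=random):
    """Draw one Exponential(rate) variate by inverting its CDF.

    The CDF is F(x) = 1 - exp(-rate * x), so F^{-1}(u) = -log(1 - u) / rate.
    """
    u = rng.random()  # uniform draw on [0, 1), so 1 - u is never zero
    return -math.log(1.0 - u) / rate


# Seeded generator so the run is reproducible.
rng = random.Random(42)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]

# The sample mean should land near the theoretical mean 1 / rate = 0.5.
sample_mean = sum(draws) / len(draws)
print(round(sample_mean, 3))
```

The same two steps (draw a uniform number, push it through the inverse CDF) apply to any continuous distribution whose inverse CDF is available.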

parser 0.0-12

November 28, 2010

I've pushed a new version of the parser package to CRAN. This is the first release that depends on Rcpp, which allowed me to reduce the code size and increase its maintainability. This also features a faster version of nlines, a function that r...

Rcpp 0.8.9

November 28, 2010

Rcpp 0.8.9 was pushed to CRAN recently. Apart from minor bug fixes, this release concentrates on modules, with lots of new features to expose C++ functions and classes through R reference classes. The Rcpp-modules vignette has all the details, a...

Computational efficiency of great-circle distance calculations in R

November 28, 2010

An obvious omission from my previous post on great-circle distance calculations in R was any discussion of the computational efficiency of the various methods, in particular a comparison of different implementations of the same method. One of the comments …

LaTeX Typesetting – Basics

November 28, 2010

The LaTeX typesetting system is used to create professional-looking documents on a home computer. It may have a steeper learning curve than a word processor, but the initial effort often pays off reasonably quickly. The system is almost a necessity for anyone writing documents with a large amount of mathematics as most alternatives

Rcpp 0.8.9

November 28, 2010

A new release 0.8.9 of Rcpp is now available at CRAN and has just been uploaded to Debian. As always, sources are also available from my local directory here. This release comes a few weeks after the preceding 0.8.8 release and continues with a ...

Advanced Markov Chain Monte Carlo Methods (AMCMC)

November 27, 2010

I’ve just received my copy of Advanced Markov Chain Monte Carlo Methods, by Liang, Liu, & Carroll. Although my PhD didn’t really involve any Bayesian methodology (and my undergrad was devoid of any Bayesian influence), I’ve found that the sort of problems I’m now tackling in systems biology demand a Bayesian/MCMC approach. There are a

November 26, 2010

Conrad Sanderson released version 1.0.0 of Armadillo, his templated C++ library for linear algebra, earlier this week. So congratulations to Conrad on reaching 1.0.0! I folded his version 1.0.0 into a new release 0.2.10 of RcppArmadillo, our Rcpp-base...

Yet another inferno

November 26, 2010

Many in the R world will know The R Inferno. Abstract: If you are using R and you think you’re in hell, this is a map for you. A newly minted inferno is The 9 circles of scientific hell. Most amusing to me is Circle 4: p-value fishing, the punishment for which is brilliant. As …

Computational tools for Bayesian analysis

November 26, 2010

The increasing number of R-oriented Bayesian computational tools, such as MCMCpack, MCMCglmm, DPpackage, R-INLA, and spBayes, has made BUGS less and less crucial for day-to-day Bayesian computation. Honestly, I cannot think of a single analysis that BUGS...

Sweave Tutorial 1: Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions

November 26, 2010

In this post I present an example of using Sweave to prepare a PDF of formatted multiple choice questions. More broadly, the example shows how to use Sweave to incorporate elements of a database into a formatted LaTeX document. It aims to be useful to anyone wanting to learn more about the almost magical powers of make, Sweave,...

Hierarchical Cluster Analysis

November 25, 2010

With the distance matrix found in the previous tutorial, we can use various cluster analysis techniques for relationship discovery. For example, for the data set mtcars, we can pass the distance matrix to hclust and plot a dendrogram that displays a ...

Happy Thanksgiving!

November 25, 2010

It's holiday time here in the US, so we're taking a break at Revolutions. We'll be back with more R goodness on Monday, but in the meantime, think of the turkeys. R-chart: Don't be a Turkey

Benchmarking feature selection with Boruta and caret

November 25, 2010

Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an important step in the analysis process. And since we often work on...

Random graphs with fixed numbers of neighbours

November 24, 2010

In connection with Le Monde puzzle #46, I eventually managed to write an R program that generates graphs with a given number n of nodes and a given number k of edges leaving each of those nodes. (My early attempt was simply too myopic to achieve any level of success when n was larger than
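
The author's R program is not shown in this excerpt, so the sketch below is not that program. It illustrates one common way to generate such graphs, the pairing (configuration) model with rejection, under the assumption that the graph is undirected and simple (no self-loops, no repeated edges). The name `random_regular_graph` and the `max_tries` cap are hypothetical choices made for this example.

```python
import random


def random_regular_graph(n, k, rng=random, max_tries=10_000):
    """Return a set of edges (i, j) with i < j in which every node has degree k.

    Pairing model: give each node k "stubs", shuffle all stubs, pair them up,
    and start again whenever a self-loop or a repeated edge appears.
    """
    if (n * k) % 2 != 0 or k >= n:
        raise ValueError("need n * k even and k < n")
    for _ in range(max_tries):
        stubs = [node for node in range(n) for _ in range(k)]
        rng.shuffle(stubs)
        edges = set()
        valid = True
        for a, b in zip(stubs[::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                valid = False  # self-loop or duplicate edge: reject this pairing
                break
            edges.add(edge)
        if valid:
            return edges
    raise RuntimeError("no simple graph found within max_tries")


# Example: 10 nodes, 3 neighbours each, with a seeded generator.
edges = random_regular_graph(10, 3, random.Random(1))
degree = {node: 0 for node in range(10)}
for a, b in edges:
    degree[a] += 1
    degree[b] += 1
print(sorted(degree.values()))
```

Rejection becomes slow as k grows, because the chance that a random pairing is simple drops quickly; the sketch is meant for small k only.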

R preferred by Kaggle competitors

November 24, 2010

Kaggle, the predictive-analytics competition site, has analyzed the preferences of the 2,500 data scientists who participate in its competitions, and R was the most-preferred software of the competitors at 22.5%. The next-nearest alternative was Matlab, at 16%. On a related note, the premier of the Australian state of New South Wales has just launched a competition on Kaggle to...

Life Is Short, Use Python

November 24, 2010

I started to play with Python two weeks ago because of R's limitations in handling large data. A friend of mine suggested that I try Python since I frequently had to massage data: "Python is the best choice, trust me", he...

The joys of teaching R

November 23, 2010

Just read a funny but very much to the point blog entry on the difficulties of teaching proper programming skills to first-year students! I will certainly make use of the style file, as grading 180 exams is indeed a recurrent nightmare…

Great-circle distance calculations in R

November 23, 2010

Recently I found myself needing to calculate the distances between a large number of longitude and latitude locations. As it turns out, because the earth is a three-dimensional object, you cannot simply pretend that you are in Flatland, albeit some …
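
The excerpt does not say which formulas the post uses, so as an illustration only, here is a minimal sketch of one widely used great-circle method, the haversine formula. It assumes a spherical earth with a mean radius of 6371 km; the function name `haversine_km` and the argument order (longitude first) are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean radius of a spherical earth


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# A quarter of the equator: from (0E, 0N) to (90E, 0N).
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
print(round(quarter, 1))
```

A flat-plane (Euclidean) distance on raw degrees would ignore the fact that a degree of longitude shrinks towards the poles, which is the "Flatland" mistake the excerpt warns about.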

Principal Component Analysis: Which variables contribute most to principal components?

November 23, 2010

Principal component analysis (PCA) is a mathematical transformation of (possibly correlated) variables into a number of uncorrelated variables called principal components. The components resulting from this transformation are defined in such a way that t...