# Monthly Archives: November 2010

## LaTeX Typesetting – Basics

November 28, 2010
The LaTeX typesetting system is used to create professional-looking documents on a home computer. It may have a steeper learning curve than a word processor, but this initial effort will often pay off reasonably quickly. The system is almost a necessity for anyone writing documents with a large amount of mathematics as most alternatives

## Advanced Markov Chain Monte Carlo Methods (AMCMC)

November 27, 2010
I’ve just received my copy of Advanced Markov Chain Monte Carlo Methods, by Liang, Liu, & Carroll. Although my PhD didn’t really involve any Bayesian methodology (and my undergrad was devoid of any Bayesian influence), I’ve found that the sort of problems I’m now tackling in systems biology demand a Bayesian/MCMC approach. There are a

November 26, 2010
Conrad Sanderson released version 1.0.0 of Armadillo, his templated C++ library for linear algebra, earlier this week. So congratulations to Conrad on reaching 1.0.0! I folded his version 1.0.0 into a new release 0.2.10 of RcppArmadillo, our Rcpp-base...

## Yet another inferno

November 26, 2010
Many from the R world will know The R Inferno. Abstract: If you are using R and you think you’re in hell, this is a map for you. A newly minted inferno is The 9 circles of scientific hell. Most amusing to me is Circle 4: p-value fishing, the punishment of which is brilliant. As …

## Computational tools for Bayesian analysis

November 26, 2010
The increasing number of R-oriented Bayesian computational tools, such as MCMCpack, MCMCglmm, DPpackage, R-INLA, and spBayes, has made BUGS less and less crucial for day-to-day Bayesian computation. Honestly, I cannot figure out a single analysis that BUGS...

## Sweave Tutorial 1: Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions

November 26, 2010
In this post I present an example of using Sweave to prepare a PDF of formatted multiple choice questions. More broadly the example shows how to use Sweave to incorporate elements of a database into a formatted LaTeX document. It aims to be useful to any...

## Hierarchical Cluster Analysis

November 25, 2010
With the distance matrix found in the previous tutorial, we can use various techniques of cluster analysis for relationship discovery. For example, in the data set mtcars, we can pass the distance matrix to hclust, and plot a dendrogram that displays a ...
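
As a rough illustration of what agglomerative clustering does with a distance matrix, here is a minimal Python sketch (not the post's R code). It assumes complete linkage, which is also the default method of R's `hclust`. The function name `complete_linkage` and the five points are made up for the example and are not taken from mtcars.

```python
def complete_linkage(dist):
    """Agglomerative clustering on a symmetric distance matrix (list of lists).

    Returns the merges in order as (cluster_a, cluster_b, height), where each
    cluster is a tuple of row indices and height is the merge distance.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: the distance between two clusters is the
                # largest distance between any pair of their members.
                d = max(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical data: five points on a line, with absolute differences as distances.
points = [0, 1, 5, 6, 20]
dist = [[abs(p - q) for q in points] for p in points]
merges = complete_linkage(dist)
for left, right, height in merges:
    print(left, right, height)
```

The merge heights are what a dendrogram draws on its vertical axis: clusters joined at a low height are close together, and the last merge sits at the top of the tree.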

## Benchmarking feature selection with Boruta and caret

November 25, 2010
Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an important step in the analysis process. And since we often work on...

## Random graphs with fixed numbers of neighbours

November 24, 2010
In connection with Le Monde puzzle #46, I eventually managed to write an R program that generates graphs with a given number n of nodes and a given number k of edges leaving each of those nodes. (My early attempt was simply too myopic to achieve any level of success when n was larger than
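
One common way to generate such graphs is the pairing (configuration) model with rejection. The Python sketch below illustrates that technique and is not the R program from the post. It assumes the graph is undirected and simple (no self-loops, no repeated edges); the function name `random_regular_graph` and the `max_tries` parameter are hypothetical.

```python
import random


def random_regular_graph(n, k, rng=None, max_tries=10000):
    """Return a set of edges (i, j) with i < j such that every node has degree k.

    Pairing model with rejection: give each node k "stubs", pair the stubs
    at random, and start again if a self-loop or a repeated edge appears.
    """
    if k < 0 or k >= n or (n * k) % 2 != 0:
        raise ValueError("need 0 <= k < n and n * k even")
    rng = rng or random.Random()
    for _ in range(max_tries):
        stubs = [node for node in range(n) for _ in range(k)]
        rng.shuffle(stubs)
        edges = set()
        simple = True
        # Consecutive stubs in the shuffled list form the candidate edges.
        for a, b in zip(stubs[::2], stubs[1::2]):
            edge = (min(a, b), max(a, b))
            if a == b or edge in edges:
                simple = False
                break
            edges.add(edge)
        if simple:
            return edges
    raise RuntimeError("no simple graph found within max_tries attempts")


edges = random_regular_graph(10, 3, random.Random(1))
print(sorted(edges))
```

Rejection works well for small k, but the chance that a random pairing is simple drops quickly as k grows, so larger degrees need many more attempts or a different algorithm.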

## R preferred by Kaggle competitors

November 24, 2010
Kaggle, the predictive-analytics competition site, has analyzed the preferences of the 2,500 data scientists who participate in its competitions, and R was the most-preferred software of the competitors at 22.5%. The next-nearest alternative was Matlab, at 16%. On a related note, the premier of the Australian state of New South Wales has just launched a competition on Kaggle to...