Analyst First – SURF

November 28, 2010

This presentation is aimed at everyone working in commercial and government analytics, irrespective of the tools they use, and at students intending to pursue such a career. R and other open source tools play a powerful, unique and …

Read more »

Random variable generation (Pt 1 of 3)

November 28, 2010

As I mentioned in a recent post, I’ve just received a copy of Advanced Markov Chain Monte Carlo Methods. Chapter 1.4 in the book (very quickly) covers random variable generation. Inverse CDF Method: a standard algorithm for generating random numbers is the inverse CDF method. The continuous version of the algorithm is as follows: 1. …

Read more »
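
The excerpt above is cut off before the algorithm's steps, so the following is only a minimal Python sketch of the general inverse CDF idea, not the post's own code: draw U from Uniform(0, 1) and return F^{-1}(U). The exponential target distribution, the function name `sample_exponential`, and the rate and seed values are all assumptions chosen for illustration.

```python
import math
import random


def sample_exponential(rate, rng):
    """Draw one Exponential(rate) variate by inverting its CDF.

    For F(x) = 1 - exp(-rate * x), the inverse is F^{-1}(u) = -ln(1 - u) / rate.
    """
    u = rng.random()  # U ~ Uniform[0, 1), so 1 - u lies in (0, 1] and log is defined
    return -math.log(1.0 - u) / rate


# Illustrative values only: rate 2.0 and seed 42 are arbitrary choices.
rng = random.Random(42)
draws = [sample_exponential(2.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)  # should be close to 1 / rate = 0.5
```

The same pattern applies to any continuous distribution whose CDF can be inverted in closed form; when it cannot, a numerical root finder or a different sampling method is usually needed.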

parser 0.0-12

November 28, 2010

I've pushed a new version of the parser package to CRAN. This is the first release that depends on Rcpp, which allowed me to reduce the code size and increase its maintainability. This also features a faster version of nlines, a function that r...

Read more »

Rcpp 0.8.9

November 28, 2010

Rcpp 0.8.9 was pushed to CRAN recently. Apart from minor bug fixes, this release concentrates on modules, with lots of new features to expose C++ functions and classes through R reference classes. The Rcpp-modules vignette has all the details, a...

Read more »

Computational efficiency of great-circle distance calculations in R

November 28, 2010

An obvious omission from my previous post on great-circle distance calculations in R was any discussion of the computational efficiency of the various methods, in particular a comparison of different implementations of the same method. One of the comments …

Read more »

LaTeX Typesetting – Basics

November 28, 2010

LaTeX is a typesetting system used to create professional-looking documents on a home computer. It may have a steeper learning curve than a word processor, but this initial effort will often pay off reasonably quickly. The system is almost a necessity for anyone writing documents with a large amount of mathematics, as most alternatives ...

Read more »

Rcpp 0.8.9

November 28, 2010

A new release 0.8.9 of Rcpp is now available at CRAN and has just been uploaded to Debian. As always, sources are also available from my local directory here. This release comes a few weeks after the preceding 0.8.8 release and continues with a ...

Read more »

Advanced Markov Chain Monte Carlo Methods (AMCMC)

November 27, 2010

I’ve just received my copy of Advanced Markov Chain Monte Carlo Methods, by Liang, Liu, & Carroll. Although my PhD didn’t really involve any Bayesian methodology (and my undergrad was devoid of any Bayesian influence), I’ve found that the sort of problems I’m now tackling in systems biology demand a Bayesian/MCMC approach. There are a

Read more »

RcppArmadillo 0.2.10

November 26, 2010

Conrad Sanderson released version 1.0.0 of Armadillo, his templated C++ library for linear algebra, earlier this week. So congratulations to Conrad on reaching 1.0.0! I folded his version 1.0.0 into a new release 0.2.10 of RcppArmadillo, our Rcpp-base...

Read more »

Yet another inferno

November 26, 2010

Many from the R world will know The R Inferno. Abstract: If you are using R and you think you’re in hell, this is a map for you. A newly minted inferno is The 9 circles of scientific hell. Most amusing to me is Circle 4: p-value fishing, the punishment for which is brilliant. As …

Read more »

Computational tools for Bayesian analysis

November 26, 2010

The increasing number of R-oriented Bayesian computational tools, such as MCMCpack, MCMCglmm, DPpackage, R-INLA and spBayes, has made BUGS less and less crucial for day-to-day Bayesian computation. Honestly, I cannot think of a single analysis that BUGS...

Read more »

Sweave Tutorial 1: Using Sweave, R, and Make to Generate a PDF of Multiple Choice Questions

November 26, 2010

In this post I present an example of using Sweave to prepare a PDF of formatted multiple choice questions. More broadly the example shows how to use Sweave to incorporate elements of a database into a formatted LaTeX document. It aims to be useful to anyone wanting to learn more about the almost magical powers of make, Sweave,...

Read more »

Hierarchical Cluster Analysis

November 25, 2010

With the distance matrix found in the previous tutorial, we can use various techniques of cluster analysis for relationship discovery. For example, in the data set mtcars, we can pass the distance matrix to hclust and plot a dendrogram that displays a ...

Read more »

Happy Thanksgiving!

November 25, 2010

It's holiday time here in the US, so we're taking a break at Revolutions. We'll be back with more R goodness on Monday, but in the meantime, think of the turkeys. R-chart: Don't be a Turkey

Read more »

Benchmarking feature selection with Boruta and caret

November 25, 2010

Feature selection is the data mining process of selecting the variables from our data set that may have an impact on the outcome we are considering. For commercial data mining, which is often characterised by having too many variables for model building, this is an important step in the analysis process. And since we often work on...

Read more »

Random graphs with fixed numbers of neighbours

November 24, 2010

In connection with Le Monde puzzle #46, I eventually managed to write an R program that generates graphs with a given number n of nodes and a given number k of edges leaving each of those nodes. (My early attempt was simply too myopic to achieve any level of success when n was larger than

Read more »

R preferred by Kaggle competitors

November 24, 2010

Kaggle, the predictive-analytics competition site, has analyzed the preferences of the 2,500 data scientists who participate in its competitions, and R was the most-preferred software of the competitors at 22.5%. The next-nearest alternative was Matlab, at 16%. On a related note, the premier of the Australian state of New South Wales has just launched a competition on Kaggle to...

Read more »

Life Is Short, Use Python

November 24, 2010

I started to play with Python two weeks ago because of R's limitations in handling large data. A friend of mine suggested I try Python since I frequently have to massage data: "Python is the best choice, trust me", he...

Read more »

The joys of teaching R

November 23, 2010

Just read a funny but very much to-the-point blog entry on the difficulties of teaching proper programming skills to first-year students! I will certainly make use of the style file, as grading 180 exams is indeed a recurrent nightmare…

Read more »

Great-circle distance calculations in R

November 23, 2010

Recently I found myself needing to calculate the distance between a large number of longitude and latitude locations. As it turns out, because the earth is a three-dimensional object, you cannot simply pretend that you are in Flatland, albeit some …

Read more »
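
The excerpt does not say which formulas the post compares, so as a hedged illustration only, here is a minimal Python sketch of one common choice, the haversine formula, which treats the earth as a sphere. The function name `haversine_km` and the mean radius of 6371 km are assumptions made for this sketch, not values taken from the post.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius for a spherical model


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two points given in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (lon1, lat1, lon2, lat2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


# Two points on the equator a quarter of the way around the globe
quarter = haversine_km(0.0, 0.0, 90.0, 0.0)
```

A spherical model ignores the earth's flattening, so an ellipsoidal method would be the more accurate choice where small errors matter.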

Principal Component Analysis: Which variables contribute most to principal components?

November 23, 2010

Principal component analysis (PCA) is a mathematical transformation of (possibly correlated) variables into a number of uncorrelated variables called principal components. The resulting components from this transformation are defined in such a way that t...

Read more »

Slides from first Utah.edu & R.P. RUG meeting

November 23, 2010

Here are the slides from the first University of Utah and Research Park R Users Group meeting. They discuss getting help and finding packages.

Read more »