# Monthly Archives: September 2013

## Regression on variables, or on categories?

September 30, 2013

I admit it, the title sounds weird. This evening I want to address a problem related to the use of the stepwise procedure on a regression model, and to discuss the use of categorical variables (and possible misinterpretations). Consider the following dataset: `db = read.table("http://freakonometrics.free.fr/db2.txt",header=TRUE,sep=";")`. First, let us change the reference in our categorical variable (just to...
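Changing the reference level of a categorical variable, as mentioned in the excerpt above, can be sketched as follows. This is only an illustration on simulated data: the file `db2.txt` is not loaded here, and the names `df`, `x`, `y`, `grp` and `grp_c` are hypothetical, not taken from the original post.

```r
# Simulated data; column names are illustrative, not those of db2.txt
set.seed(1)
n   <- 200
grp <- factor(sample(c("A", "B", "C"), n, replace = TRUE))
x   <- rnorm(n)
y   <- 1 + 2 * x + ifelse(grp == "C", 1.5, 0) + rnorm(n)
df  <- data.frame(y, x, grp)

# By default the reference level is the first level ("A")
fit_a <- lm(y ~ x + grp, data = df)

# Make "C" the reference: coefficients become contrasts against "C"
df$grp_c <- relevel(df$grp, ref = "C")
fit_c <- lm(y ~ x + grp_c, data = df)

# Same fitted values, different parameterisation
all.equal(fitted(fit_a), fitted(fit_c))
names(coef(fit_a))
names(coef(fit_c))
```

Note that `step()` adds or drops a factor as a single term; selecting individual categories would require building the indicator variables by hand, which is where the choice of reference level starts to matter for interpretation.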

## Which is the best "Flyover" state?

September 30, 2013

If you were to hop into your personal aircraft and plot a straight-line course, taking off in one state and landing in the SAME state, how many other states might you fly over? In other words, what's the best state for "flyovers" of other states? Todd Schneider from the Rap Genius engineering team answered that question using the R...

## How heavy is the Siberut macaque? A Bayesian phylogenetic approach

September 30, 2013

Among-species comparisons can include phylogenetic information to account for non-independence arising from shared evolutionary history. Often, phylogenetic topologies and branch lengths are not known exactly, but are estimated with uncertainty. This uncertainty can be accounted for using methods recently described in a neat paper called *Bayesian models for comparative analysis integrating phylogenetic uncertainty* by Villemereuil et...

September 30, 2013

This post is a follow-up to my latest Things I Forget post on reading in shapefiles. That post assumed that you already had access to all the relevant files (e.g. .shp, .shx, .prj, .dbf, etc.). A task that I routinely need to do is locate shapefiles on a website, grab those files, and read...

## ROC curves and classification

September 30, 2013

To get back to a question asked after the last course (still on non-life insurance), I will spend some time discussing ROC curve construction and interpretation. Consider the dataset we used last week: `db = read.table("http://freakonometrics.free.fr/db.txt",header=TRUE,sep=";")`, followed by `attach(db)`. The first step is to get a model, for instance a logistic regression, where some factors were merged...
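As a rough sketch of the construction described above, a ROC curve can be built by hand from the scores of a logistic regression. The example below uses simulated data rather than `db.txt`, and the names `x`, `y`, `score` and `roc_point` are hypothetical; it is not the code from the original post. The area under the curve is only approximated, on a grid of thresholds.

```r
# Simulated binary outcome; this is not the db.txt dataset
set.seed(123)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 1.5 * x))

# Step 1: fit a model and get a score in (0, 1) for each observation
fit   <- glm(y ~ x, family = binomial)
score <- predict(fit, type = "response")

# Step 2: for a threshold s, predict 1 when score > s, then compute
# the false positive rate and the true positive rate
roc_point <- function(s) {
  pred <- as.integer(score > s)
  c(fpr = sum(pred == 1 & y == 0) / sum(y == 0),
    tpr = sum(pred == 1 & y == 1) / sum(y == 1))
}
thresholds <- seq(0, 1, by = 0.01)
roc <- t(sapply(thresholds, roc_point))

# Step 3: order the points by increasing FPR and approximate the
# area under the curve with the trapezoidal rule
o   <- order(roc[, "fpr"], roc[, "tpr"])
fpr <- roc[o, "fpr"]
tpr <- roc[o, "tpr"]
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)

plot(fpr, tpr, type = "l",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, lty = 2)  # diagonal: a classifier no better than chance
auc
```

Each threshold gives one point of the curve; a threshold of 0 classifies everything as 1 (top-right corner), and a threshold of 1 classifies everything as 0 (bottom-left corner).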

## eoda’s R-academy celebrates its third anniversary and is extending the extensive training program for R

September 30, 2013

The free statistical programming language R is becoming more and more popular, even in German-speaking areas. This is happening for various reasons. Besides its performance, its quality and its open source character, R scores with its various possibilities for integration. Beyond that, R offers a multitude of possibilities in the field of multivariate analysis methods.

September 30, 2013

Along with the Rcpp 0.10.5 release yesterday, a new minor release 0.3.920.1 of RcppArmadillo came out. It is based on Conrad's Armadillo 3.920.0 plus a minor fix, and uses some of the new Rcpp features. Both packages are now on CRAN and also in Debian....

## Rcpp 0.10.5

September 29, 2013

A new version of Rcpp is now on the CRAN network for GNU R; binaries for Debian have been uploaded as well. Once more, this release brings a large number of exciting changes to Rcpp. Some concern usability, some bring new features, some increase pe...

## Classification with O-PLS-DA

September 29, 2013

Partial least squares (PLS) is a versatile algorithm which can be used to predict either continuous or discrete/categorical variables. Classification with PLS is termed PLS-DA, where the DA stands for discriminant analysis. The PLS-DA algorithm has many favorable properties for dealing with multivariate data; one of the most important of which is how variable collinearity is...