## Really useful bits of code that are missing from R

January 10, 2011
There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere. The geometric mean and standard deviation are a staple for anyone who deals with lognormally distributed data: `geomean <- function(x, na.rm = FALSE, trim = 0, ...) { exp(mean(log(x, ...), na.rm = na.rm,` ...
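
A minimal sketch of both helpers, assuming natural logs throughout and strictly positive `x`. `geosd` is a hypothetical name. Unlike the excerpt, this version does not forward `...` to `log()`, because passing a `base` there would break the round trip through `exp()`.

```r
# Geometric mean: exponentiate the (optionally trimmed) mean of the logs.
# Assumes x > 0; zeros or negatives give -Inf or NaN from log().
geomean <- function(x, na.rm = FALSE, trim = 0) {
  exp(mean(log(x), trim = trim, na.rm = na.rm))
}

# Geometric standard deviation (hypothetical helper name):
# exponentiate the standard deviation of the logs.
geosd <- function(x, na.rm = FALSE) {
  exp(sd(log(x), na.rm = na.rm))
}

x <- c(1, 10, 100)
geomean(x)  # 10
geosd(x)    # 10
```

The geometric SD is a multiplicative factor. For lognormal data, roughly two thirds of values fall between `geomean / geosd` and `geomean * geosd`.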

## R interface to Google Chart Tools

January 10, 2011
Hans Rosling, eat your heart out! It is now possible to interface the R statistics software with Google's Gapminder-inspired Chart Tools. The plots below were produced using the googleVis R package and three datasets from the Gapminder website. The first shows the relationship between income, life expectancy and population for 20 countries with the highest ...

## EmEditor R code macro – almost interactive R development for EmEditor

January 10, 2011
Get the new macro, now hosted on GitHub. Edit, 18th Jan 2011: the text below refers to the old version of the macro and is no longer relevant. A new post will describe the new macro, which is also documented on the GitHub site. As a follow ...

## Using R for Introductory Statistics, Chapter 4, Model Formulae

January 10, 2011
Several R functions take model formulae as parameters. Model formulae are symbolic expressions. They define a relationship between variables rather than an arithmetic expression to be evaluated immediately. Model formulae are defined with the tilde ope...
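
A short illustration, using the built-in `mtcars` data as an example of our own choosing rather than the post's. It shows that writing a formula only builds a symbolic object. The variables are looked up when a modelling function such as `lm()` evaluates the formula against a data frame.

```r
# Building a formula evaluates nothing: mpg, wt and hp need not exist yet.
f <- mpg ~ wt + hp
class(f)     # "formula"
all.vars(f)  # "mpg" "wt" "hp"

# Operators inside a formula have model meaning, not arithmetic meaning:
# a * b expands to main effects plus their interaction a:b.
attr(terms(y ~ a * b), "term.labels")  # "a" "b" "a:b"

# The relationship is evaluated only when lm() finds the variables in mtcars.
fit <- lm(f, data = mtcars)
coef(fit)  # intercept plus one slope each for wt and hp
```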

## Batting and Bowling performance in Ashes 2010 – 2011

January 9, 2011
English cricket is strong once again, and it is great to see that (after all, they invented the gentleman's game). In sharp contrast to previous tours of Australia, England outplayed Australia on their home ground in the recently concluded Ashes 2...

## From one extreme (0) to another (1): challenge failed, but who cares…

January 9, 2011
Just after arriving in Montréal, at the beginning of September, I discussed my blog's statistics and said that it was possible, or even likely, that by New Year's Eve over a million pages would have been viewed on my blog (from Google's count...

## R and Google Visualization API

January 8, 2011
R interfaces with the powerful Google Visualization API through the package googleVis (see here). It's relatively easy to convert your R graphics into interactive graphics for a web browser. And the graphics are quite nice, as seen below in a simple graph of some of my data collected this summer on seed predation to...

January 8, 2011
Just before Xmas, Conrad Sanderson released version 1.1.0 of Armadillo, his templated C++ library for linear algebra. I only noticed this week, so here comes version 0.2.11 of RcppArmadillo, our Rcpp-based integration into R. The only other ...

## The Automatic Millionaire & Amortization

January 8, 2011
Dan Byrne of Vanderbilt University gave me a book back in October titled The Automatic Millionaire by David Bach. The book is an easy read and full of sound advice that I intend to take. Bach espouses a plan for retirement built on the principles of “paying yourself first” (i.e. before taxation, 401(k), 403(b)), making

## An example of linear discriminant analysis

January 8, 2011
The following example was shown in an advanced statistics seminar held in Tel Aviv. The material for the presentation comes from C. M. Bishop's book Pattern Recognition and Machine Learning (Springer, 2006). One way of separating two categories using linear subspaces of the input space (e.g. planes for 3D inputs, lines for 2D inputs) ...
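
A base-R sketch of Fisher's linear discriminant, one standard way of finding such a separating direction, which Bishop discusses. The excerpt does not show whether this is the seminar's exact example. The `iris` petal measurements are a stand-in dataset, and `fisher_lda` is a hypothetical helper name. The direction is $w \propto S_W^{-1}(m_2 - m_1)$, where $S_W$ is the within-class scatter matrix.

```r
# Fisher's linear discriminant for two classes (hypothetical helper).
fisher_lda <- function(X1, X2) {
  m1 <- colMeans(X1)
  m2 <- colMeans(X2)
  # Within-class scatter: sum of centred cross-products for each class
  S_W <- crossprod(sweep(X1, 2, m1)) + crossprod(sweep(X2, 2, m2))
  # Projection direction w, proportional to S_W^{-1} (m2 - m1)
  w <- solve(S_W, m2 - m1)
  # Simple decision rule: threshold at the midpoint of the projected means
  threshold <- sum(w * (m1 + m2) / 2)
  list(w = w, threshold = threshold)
}

vars <- c("Petal.Length", "Petal.Width")
X1 <- as.matrix(iris[iris$Species == "setosa", vars])
X2 <- as.matrix(iris[iris$Species == "versicolor", vars])

fit <- fisher_lda(X1, X2)
p1 <- X1 %*% fit$w  # 1-D projections of class 1
p2 <- X2 %*% fit$w  # 1-D projections of class 2
# The separating line in 2D is {x : sum(w * x) = threshold}
```

Projecting onto `w` reduces the problem to one dimension. Points with `sum(w * x) > threshold` are assigned to the second class.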

## Building a fact-based world view

January 7, 2011
Gapminder is an independent foundation based in Stockholm, Sweden. Its mission is “to debunk devastating myths about the world by offering free access to a fact-based world view”. They provide free online tools, data (more than 400 datasets freely available!) and videos “to better understand the changing world”. The initial development of Gapminder was the

## Arrogance sampling

January 7, 2011
A new posting on arXiv by Benedict Escoto on a simulation method for approximating normalising constants (i.e. evidence), with an eye-catching name! Here is the abstract: This paper describes a method for estimating the marginal likelihood or Bayes factors of Bayesian models using non-parametric importance sampling (“arrogance sampling”). This method can also be used to
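
For context, here is the basic idea that such methods build on: plain importance sampling of the evidence, $Z = \mathbb{E}_q[p(y \mid \theta)\,p(\theta)/q(\theta)]$. This is not Escoto's non-parametric arrogance sampling. It is a sketch on an assumed conjugate toy model ($y \mid \theta \sim N(\theta, 1)$, $\theta \sim N(0, 1)$, a single hypothetical observation) with a fixed normal proposal. In this model the exact evidence $y \sim N(0, 2)$ is known, so the estimate can be checked.

```r
set.seed(1)
y <- 1.5  # a single hypothetical observation

# Exact evidence for the toy model: marginally y ~ N(0, 2)
exact <- dnorm(y, mean = 0, sd = sqrt(2))

# Proposal q = N(y/2, 1), centred on the posterior N(y/2, 1/2) but wider,
# so the importance weights stay bounded.
n <- 1e5
theta <- rnorm(n, mean = y / 2, sd = 1)

# Log importance weights: log likelihood + log prior - log proposal
log_w <- dnorm(y, mean = theta, sd = 1, log = TRUE) +
  dnorm(theta, mean = 0, sd = 1, log = TRUE) -
  dnorm(theta, mean = y / 2, sd = 1, log = TRUE)

# Evidence estimate: the average weight
estimate <- mean(exp(log_w))
c(estimate = estimate, exact = exact)
```

A proposal wider than the posterior is the usual safe choice. A narrower one can give occasional huge weights and an unstable estimate.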

## Heatmap tables

January 7, 2011
I blogged earlier (http://socialdatablog.com/what-is-wrong-with-this-graph) about the well-known risks of implying a continuous data scale in a graph where there isn't one. I have just produced this alternative in the form of a heatmap table, i.e. a heatmap in which the numbers themselves are also shown. It is perhaps not quite as intuitive, but it is less misleading. It uses the
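
The excerpt is cut off before saying which tool the post used. As an assumption, here is a base-R sketch of a heatmap table that draws coloured cells with `image()` and writes each value with `text()`. `heatmap_table` is a hypothetical helper name, and the built-in `VADeaths` matrix stands in for the post's data.

```r
# Heatmap table: coloured cells with the numbers printed on top.
heatmap_table <- function(m, digits = 1, ...) {
  nr <- nrow(m)
  nc <- ncol(m)
  # image() puts z[i, j] at (x[i], y[j]); transposing and reversing the rows
  # puts row 1 of m at the top, as in a printed table.
  image(x = seq_len(nc), y = seq_len(nr), z = t(m[nr:1, , drop = FALSE]),
        col = hcl.colors(20, "YlOrRd", rev = TRUE),
        axes = FALSE, xlab = "", ylab = "", ...)
  axis(1, at = seq_len(nc), labels = colnames(m))
  axis(2, at = seq_len(nr), labels = rev(rownames(m)), las = 1)
  # as.vector() is column-major, so x cycles slowly and y quickly
  text(x = rep(seq_len(nc), each = nr), y = rep(nr:1, times = nc),
       labels = format(round(as.vector(m), digits)))
  invisible(NULL)
}

out <- tempfile(fileext = ".pdf")
pdf(out, width = 7, height = 4)
heatmap_table(VADeaths, main = "Virginia death rates per 1000")
invisible(dev.off())
```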

## survival curves for Leonid

January 7, 2011
Leonid asked me to do a quick survival analysis of two different types of mouse (m430 and m210) with surgically implanted tumours (or something like that). The data were in the wrong format, but after transforming they looked like this: In my opini...
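
Leonid's data are not shown, so this sketch uses the `lung` dataset from the survival package as a stand-in. It compares two groups (`sex`) the way one might compare the two mouse types. The post's actual code may differ.

```r
library(survival)  # a recommended package, shipped with standard R installs

# Kaplan-Meier estimate for each group.
# status is coded 1 = censored, 2 = dead, a coding that Surv() accepts.
fit <- survfit(Surv(time, status) ~ sex, data = lung)
print(fit)  # n, events and median survival per group

# Log-rank test for a difference between the two curves
lr <- survdiff(Surv(time, status) ~ sex, data = lung)
lr

# Survival curves, written to a temporary PDF
out <- tempfile(fileext = ".pdf")
pdf(out)
plot(fit, lty = 1:2, xlab = "Days", ylab = "Survival probability")
legend("topright", legend = c("sex = 1", "sex = 2"), lty = 1:2)
invisible(dev.off())
```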

## Boris Bikes/Barclays Cycle Hire Average Journey Times

January 6, 2011
The visualisation above shows the average relative duration of Boris Bikers’ weekday journeys over a 4-month period at hourly intervals. For each time step, the average journey time (in seconds) from each docking station has been calculated. This information is interesting because it shows the preference for short journeys around the City of London, whilst ...

## a survey on ABC

January 6, 2011
With Jean-Michel Marin, Pierre Pudlo and Robin Ryder, we have just completed a survey of the ABC methodology. It is now both arXived and submitted to Statistics and Computing. Rather interestingly, our first draft was written in Jean-Michel’s office in Montpellier by collating the ‘Og posts surveying new ABC papers! (Interestingly because this means that my
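
For readers new to ABC, here is a sketch of its most basic variant, rejection ABC. It is not taken from the survey. The setup is an assumption: hypothetical normal data with unknown mean, a Uniform(-5, 5) prior, the sample mean as summary statistic, and a fixed tolerance `eps`. `abc_rejection` is a hypothetical helper name.

```r
set.seed(42)
# Hypothetical observed data: 50 draws from N(2, 1); pretend the mean is unknown
y_obs <- rnorm(50, mean = 2, sd = 1)

abc_rejection <- function(y_obs, n_sims = 20000, eps = 0.05) {
  n_obs <- length(y_obs)
  # 1. Draw candidate parameters from the prior
  theta <- runif(n_sims, min = -5, max = 5)
  # 2. Simulate one dataset per candidate; rnorm recycles the means
  #    column-wise, so row i is drawn with mean theta[i]
  sims <- matrix(rnorm(n_sims * n_obs, mean = theta), nrow = n_sims)
  # 3. Distance between simulated and observed summary statistics
  dist <- abs(rowMeans(sims) - mean(y_obs))
  # 4. Keep candidates whose simulated data look like the observed data
  theta[dist < eps]
}

post <- abc_rejection(y_obs)
length(post)                         # number of accepted draws
c(mean(post), quantile(post, c(0.025, 0.975)))
```

The likelihood is never evaluated, only simulated from, which is the point of ABC. Shrinking `eps` improves the approximation but lowers the acceptance rate.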

## formatR update (0.1-6)

January 6, 2011
A new version of the formatR package is now available on CRAN (binary packages are still on the way). There are three major updates: inline comments are now preserved in most cases (in earlier versions, only single lines of comments were preserved); tidy.source() gained a new argument 'text' to accept a character vector

## web content analyzer

January 6, 2011
Just developed a small crawler to check my online content at binfalse.de in terms of W3C validity and the availability of external links. Here is the code and some statistics...

## Gapminder

January 6, 2011
As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the programme The Joy of Stats. Among the great things about Hans Rosling are his presentations and the interactive graphics he uses to make his points. The gapminder software

## New R User Group in Kansas City

January 6, 2011
There's a new R User Group based in Kansas City, Kansas. Abraham Mathew just launched the group's website, and is looking for R users in the area to kick things off: This group was started to bring together R users in the Kansas City area to exchange knowledge and provide guidance to new R users. We hope to have...

## Some market predictions

January 6, 2011
We look at a few forecasts for the year 2011 that we’ve run across and compare them with the prediction distributions presented in Revised market prediction distributions. FTSE 100: there is a “range forecast” on an Interactive Investor page of 5350 to 6565. It isn’t clear (to me at least) what this means, but I ...

## sab-R-metrics: Basics of Vectors and Data Calling

January 6, 2011
On Wednesday, I began a new series called "sab-R-metrics". I hope it reduces the frustration that goes along with learning a new programming language and enhances others' ability to perform their own analyses in baseball or other sports. I also hope these tutorials will allow you to use these skills in other areas. ...
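
A minimal sketch of the kind of basics the series title points to: creating vectors, indexing them, and reading tabular data. The hit counts and player labels are hypothetical illustration values, not real statistics. `read.csv(text = ...)` stands in for reading from a file path.

```r
# Create vectors with c() and with sequences
hits <- c(150, 172, 98, 201)         # hypothetical hit totals
players <- c("A", "B", "C", "D")     # hypothetical player labels
1:5
seq(0, 1, by = 0.25)

# Index by position and by a logical condition
hits[2]              # 172
hits[hits > 150]     # 172 201

# Name the elements, then index by name
names(hits) <- players
hits["D"]            # 201, labelled D

# Read a small table; with a real file you would pass its path instead
dat <- read.csv(text = "player,hits\nA,150\nB,172")
dat$hits             # pull out a column by name
mean(dat$hits)       # 161
```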

## Ecological networks from abundance distributions

January 6, 2011
Another grad student and I recently tried to contribute to our understanding of the relationship between ecological network structure (e.g., nestedness) and community structure (e.g., evenness)... Alas, I had no luck gaining new insights. How...
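
The excerpt does not say which evenness measure was used. As an assumption, here is Pielou's evenness $J = H' / \ln S$, one common index, where $H' = -\sum_i p_i \ln p_i$ is Shannon diversity and $S$ is the number of species present. Nestedness is not covered here, and `pielou_evenness` is a hypothetical helper name.

```r
# Pielou's evenness J = H' / ln(S) for a vector of species abundances.
# Undefined (NaN) when only one species is present, since ln(1) = 0.
pielou_evenness <- function(abund) {
  abund <- abund[abund > 0]   # drop species with zero abundance
  p <- abund / sum(abund)     # relative abundances
  H <- -sum(p * log(p))       # Shannon diversity H'
  H / log(length(p))          # scale by the maximum possible H', ln(S)
}

pielou_evenness(c(25, 25, 25, 25))  # 1: perfectly even community
pielou_evenness(c(97, 1, 1, 1))     # about 0.12: one species dominates
```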

## Graph gallery in R

January 6, 2011
R is sometimes criticized for producing graphs that are less elaborate than those of Matlab or other software. Here is a link to a graph gallery by Romain François to “enhance your data visualization with R”. The corresponding R code is given. It might be useful for ENSAE students working on ‘statap’ projects. Below are four examples. The maps