Really useful bits of code that are missing from R

January 10, 2011

There are some pieces of code that are so simple and obvious that they really ought to be included in base R somewhere. Geometric mean and standard deviation – a staple for anyone who deals with lognormally distributed data.
geomean <- function(x, na.rm = FALSE, trim = 0, ...) { exp(mean(log(x, ...), na.rm = na.rm,
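
The definition in this excerpt is cut off, so the following is only a minimal sketch of the idea rather than the post's own code: the geometric mean is the exponential of the mean of the logs, and the geometric standard deviation is the exponential of the standard deviation of the logs. The names `geo_mean` and `geo_sd` are hypothetical, and the `trim` argument from the excerpt is deliberately left out.

```r
# Geometric mean: exponentiate the arithmetic mean of the logged values.
# (Hypothetical helper; assumes all values of x are positive.)
geo_mean <- function(x, na.rm = FALSE) {
  exp(mean(log(x), na.rm = na.rm))
}

# Geometric standard deviation: exponentiate the sd of the logged values.
geo_sd <- function(x, na.rm = FALSE) {
  exp(sd(log(x), na.rm = na.rm))
}

x <- c(1, 10, 100)
geo_mean(x)  # 10
geo_sd(x)    # 10
```

Both results are multiplicative: for lognormal data, roughly two thirds of values fall between `geo_mean(x) / geo_sd(x)` and `geo_mean(x) * geo_sd(x)`.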

Read more »

R interface to Google Chart Tools

January 10, 2011

Hans Rosling, eat your heart out! It is now possible to interface the R statistics software with Google’s Gapminder-inspired Chart Tools. The plots below were produced using the googleVis R package and three datasets from the Gapminder website. The first shows the relationship between income, life expectancy and population for 20 countries with the highest ...

Read more »

EmEditor R code macro – Almost interactive R development for Emeditor

January 10, 2011

Get the new macro, now hosted on GitHub. Edit 18th Jan 2011: The text below refers to the old version of the macro and is no longer relevant. A new post will describe the new macro, and it is also documented on the GitHub site. As a follow ...

Read more »

Using R for Introductory Statistics, Chapter 4, Model Formulae

January 10, 2011

Several R functions take model formulae as parameters. Model formulae are symbolic expressions. They define a relationship between variables rather than an arithmetic expression to be evaluated immediately. Model formulae are defined with the tilde ope...
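
As a small illustration of that point, the sketch below builds a formula, shows that it is stored as a symbolic object rather than evaluated, and then hands it to a modelling function. The built-in `mtcars` data and the variables `mpg` and `wt` are my own choice for the example, not taken from the post.

```r
# A formula is kept as an unevaluated symbolic object, not an arithmetic result.
f <- mpg ~ wt
class(f)      # "formula"
all.vars(f)   # "mpg" "wt"

# A modelling function interprets the formula against a data frame:
# here, mpg is modelled as a linear function of wt.
fit <- lm(f, data = mtcars)
coef(fit)
```

Nothing is computed when `f` is created; the variable names are only looked up once `lm()` evaluates the formula with `data = mtcars`.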

Read more »

Batting and Bowling performance in Ashes 2010 – 2011

January 9, 2011

English cricket is strong once again. And it is great to see that (after all, they invented the gentleman's game). In sharp contrast to previous tours of Australia, England outplayed Australia on their home ground in the recently concluded Ashes 2...

Read more »

From one extreme (0) to another (1): challenge failed, but who cares…

January 9, 2011

Just after arriving in Montréal, at the beginning of September, I discussed statistics of my blog, and said that it might be possible - or likely - that by New Year's Eve, over a million pages would have been viewed on my blog (from Google's count...

Read more »

R and Google Visualization API

January 8, 2011

R interfaces with the powerful Google Visualization API with the package googleVis (see here). It's relatively easy to convert your graphics in R to interactive graphics to post on a web browser. And the graphics are quite nice, as seen below in a simple graph of some of my data collected from this summer on seed predation to...

Read more »

RcppArmadillo 0.2.11

January 8, 2011

Just before Xmas, Conrad Sanderson released version 1.1.0 of Armadillo, his templated C++ library for linear algebra. Which I only noticed this week, so here comes version 0.2.11 of RcppArmadillo, our Rcpp-based integration into R. The only other ...

Read more »

The Automatic Millionaire & Amortization

January 8, 2011

Dan Byrne of Vanderbilt University gave me a book back in October titled The Automatic Millionaire by David Bach. The book is an easy read and full of sound advice that I intend to take. Bach espouses a plan for retirement built on the principles of “paying yourself first” (i.e. before taxation, 401(k), 403(b)), making

Read more »

An example of linear discriminant analysis

January 8, 2011

The following example was shown in an advanced statistics seminar held in Tel Aviv. The material for the presentation comes from C. M. Bishop’s book Pattern Recognition and Machine Learning (Springer, 2006). One way of separating 2 categories using linear subspaces of the input space (e.g. planes for 3D inputs, lines for 2D inputs, ...
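
To make the two-category, 2D case concrete, here is a hedged sketch using `lda()` from the MASS package, which ships with standard R distributions. The data (two species from the built-in `iris` set, described by two petal measurements) are my own choice for illustration and are not the seminar's example.

```r
library(MASS)  # recommended package bundled with R; provides lda()

# Keep two of the three iris species so there are exactly two categories.
two_class <- droplevels(subset(iris, Species != "setosa"))

# Fit a linear discriminant on two inputs: the boundary is a line in this plane.
fit <- lda(Species ~ Petal.Length + Petal.Width, data = two_class)

# Classify the training points and cross-tabulate predicted vs. actual labels.
pred <- predict(fit)$class
table(predicted = pred, actual = two_class$Species)
```

The coefficients in `fit$scaling` give the direction onto which the points are projected; the separating line is perpendicular to that direction.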

Read more »

Building a fact-based world view

January 7, 2011

Gapminder is an independent foundation based in Stockholm, Sweden. Its mission is “to debunk devastating myths about the world by offering free access to a fact-based world view”. They provide free online tools, data (more than 400 datasets freely available!) and videos “to better understand the changing world”. The initial development of Gapminder was the

Read more »

Arrogance sampling

January 7, 2011

A new posting on arXiv by Benedict Escoto on a simulation method for approximating normalising constants (i.e. evidence) with an eye-catching name! Here is the abstract: This paper describes a method for estimating the marginal likelihood or Bayes factors of Bayesian models using non-parametric importance sampling (“arrogance sampling”). This method can also be used to
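
For readers unfamiliar with the underlying idea, the sketch below estimates a marginal likelihood by ordinary importance sampling with a parametric normal proposal. It is not the non-parametric "arrogance sampling" method of the paper; the toy model (normal data with unit variance and a standard normal prior on the mean), the simulated data and all variable names are assumptions made for illustration. The toy model is chosen because its evidence has a closed form to compare against.

```r
set.seed(1)
y <- rnorm(20, mean = 0.5)  # simulated data, illustrative only
n <- length(y)

# Unnormalised log posterior: log likelihood N(y | theta, 1) plus log prior N(theta | 0, 1).
log_target <- function(theta) {
  sapply(theta, function(t) sum(dnorm(y, t, 1, log = TRUE))) +
    dnorm(theta, 0, 1, log = TRUE)
}

# Proposal: normal centred on the exact posterior mean, with twice the posterior sd
# so that its tails are heavier than the target's.
post_mean <- sum(y) / (n + 1)
post_sd <- sqrt(1 / (n + 1))
draws <- rnorm(10000, post_mean, 2 * post_sd)
log_w <- log_target(draws) - dnorm(draws, post_mean, 2 * post_sd, log = TRUE)

# Importance-sampling estimate of the log evidence, computed stably on the log scale.
log_evidence_is <- max(log_w) + log(mean(exp(log_w - max(log_w))))

# Closed-form log evidence for this conjugate model, for comparison.
log_evidence_exact <- -n / 2 * log(2 * pi) - 0.5 * log(n + 1) -
  0.5 * (sum(y^2) - sum(y)^2 / (n + 1))

c(estimate = log_evidence_is, exact = log_evidence_exact)
```

The estimate is the average of the importance weights, so its quality depends on how well the proposal covers the posterior.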

Read more »

Heatmap tables

January 7, 2011

I blogged earlier (http://socialdatablog.com/what-is-wrong-with-this-graph) about the well-known risks of implying a continuous data scale in a graph where there isn't one. I just produced this alternative in the form of a heatmap table, i.e. a heatmap in which the numbers themselves are also shown. Perhaps not quite as intuitive but less misleading. It uses the

Read more »

survival curves for Leonid

January 7, 2011

Leonid asked me to do a quick survival analysis of two different types of mouse (m430 and m210) with surgically implanted tumours (or something like that). The data was in the wrong format but after transforming it looked like this: In my opini...

Read more »

Boris Bikes/Barclays Cycle Hire Average Journey Times

January 6, 2011

The visualisation above shows the average relative duration of Boris Bikers’ weekday journeys over a 4 month period at hourly intervals. For each time step the average journey time (in seconds) from each docking station has been calculated. This information is interesting because it shows the preference for short journeys around the City of London, whilst ...

Read more »

a survey on ABC

January 6, 2011

With Jean-Michel Marin, Pierre Pudlo and Robin Ryder, we just completed a survey on the ABC methodology. It is now both arXived and submitted to Statistics and Computing. Rather interestingly, our first draft was written in Jean-Michel’s office in Montpellier by collating the ‘Og posts surveying new ABC papers! (Interestingly because this means that my

Read more »

formatR update (0.1-6)

January 6, 2011

A new version of the formatR package is available on CRAN now (binary packages are still on the way). There are three major updates: the inline comments will also be preserved in most cases (in earlier versions, only single lines of comments were preserved); tidy.source() gained a new argument 'text' to accept a character vector

Read more »

web content analyzer

January 6, 2011

Just developed a small crawler to check my online content at binfalse.de in terms of W3C validity and the availability of external links. Here is the code and some statistics...

Read more »

Gapminder

January 6, 2011

As many people are aware, Hans Rosling is an enthusiastic Swedish academic with a passion for statistics who recently presented the program The Joy of Stats. One of the great things about Hans Rosling is his presentations and the interactive graphics that he uses to make his points. The gapminder software

Read more »

New R User Group in Kansas City

January 6, 2011

There's a new R User Group based in Kansas City, Kansas. Abraham Mathew just launched the group's website, and is looking for R users in the area to kick things off: This group was started to bring together R users in the Kansas City area to exchange knowledge and provide guidance to new R users. We hope to have...

Read more »

Some market predictions

January 6, 2011

We look at a few forecasts for the year 2011 that we’ve run across, and compare them with the prediction distributions presented in Revised market prediction distributions. FTSE 100: There is a “range forecast” on an Interactive Investor page of 5350 to 6565. It isn’t clear (to me at least) what this means, but I ...

Read more »

sab-R-metrics: Basics of Vectors and Data Calling

January 6, 2011

Wednesday, I began a new series called "sab-R-metrics". My hope is that it reduces the frustration that goes along with learning a new programming language and enhances others' ability to perform their own analysis in baseball or other sports. However, these tutorials will hopefully allow you to use these skills in other areas as well. ...

Read more »

Ecological networks from abundance distributions

January 6, 2011

Another grad student and I tried recently to make a contribution to our understanding of the relationship between ecological network structure (e.g., nestedness) and community structure (e.g., evenness)... Alas, I had no luck making new insights. How...

Read more »

Graph gallery in R

January 6, 2011

R is sometimes criticized for producing graphs that are not as elaborate as those from Matlab or other software. Here is a link to a graph gallery by Romain François to “enhance your data visualization with R”. The corresponding R code is given. Might be useful for ENSAE students for ‘statap’ projects. Below are four examples. The maps

Read more »
