## Bayes factors and martingales

August 10, 2011

A surprising paper came out in the last issue of Statistical Science, linking martingales and Bayes factors. In the historical part, the authors (Shafer, Shen, Vereshchagin and Vovk) recall that martingales were popularised by Martin-Löf, who is also influential in the theory of algorithmic randomness. A property of test martingales (i.e., martingales that are non

## SNA: Visualising an email box with R

August 10, 2011

Are statistics sexy? Visualising social networks certainly is! I wrote a little function, which makes producing beautiful plots depicting a mailbox with R an extremely easy task. I find visualisations of ‘social graphs’ particularly appealing. They look like flowers. I …

## Dump MySQL to CSV using R

August 10, 2011

A related post on one of my favorite Python lists reminded me that I wrote a similar snippet some time ago. So if you want to dump your whole MySQL database to CSV files, you can recycle the following code (mysql2cvs.R): `require(RMySQL); m <- MySQL(); summary(m); con <- dbConnect(m, dbname`

## Using the google prediction API from R

August 10, 2011

Google has a "black box" prediction API that they provide for creating recommender systems or filtering spam. Furthermore, they provide an R package for interfacing with that API, but try as I might I cannot get it to work under Windows. Here are ...

## Plotting molecular properties for (sub)sets

August 10, 2011

For a toxicology paper we are writing up, I need to create a few plots showing how the toxic and non-toxic molecules differ (or not) with respect to a few molecular properties, such as logP or the molecular weight. The rcdk package provides all, of cou...

## A 60-second survey for R users

August 10, 2011

I'm doing a little research to validate estimates of the size of the R user community. If you're an R user, please take a minute to complete this three-question survey on R usage at your organization. Thanks in advance. Revolution Analytics: R user base survey

## Informational Easing: A Change In F.O.M.C. Expectations

August 10, 2011

Let's analyze the latest FOMC policy move. The FOMC met yesterday and changed up the communications strategy. How so? Well, until yesterday the statement had read, as of June 22, 2011: "The Committee continues to anticipat...

## Scraping web data in R

August 10, 2011

In my last post, I went through a lot of effort to scrape the PMI index off the ISM website. It turns out that was unnecessary effort, as commenter "senne" pointed out that this index is available from FRED, with the symbol NAPM. ...

## Using a “pure infographic” to explore differences between information visualization and statistical graphics

August 10, 2011

Our discussion on data visualization continues. On one side are three statisticians–Antony Unwin, Kaiser Fung, and myself. We have been writing about the different goals served by information visualization and statistical graphics. On the other side are graphics experts (sorry for the imprecision, I don’t know exactly what these people do in their day jobs ...

## Multiple cores in R, revisited

August 10, 2011

The bigmemory package in combination with doMC provides at least a partial solution for sharing a large data set across multiple cores in R. With this solution you can work on the same matrix using several threads. It is also a very scalable solution. ...

## Coding, GUIs and Statistical Rituals

August 10, 2011

I was recently inspired to comment on this blog post, asking if R is a cure for ‘mindless statistics’. Anyone who's familiar with statistics as used in applied fields like epidemiology, sociology, and the social sciences generally will be familiar with the idea of a ‘statistical ritual’. Rather than think about the proper statistical approach to every question,

## What do you want to see at useR 2012?

August 9, 2011

This year's useR! conference at Warwick University is less than a week away, but planning is already underway for useR! 2012, to be held at Vanderbilt University in Nashville. If you're planning to attend, conference organizer Frank Harrell is looking for your input: The 2012 R User Conference - useR! 2012 - will be held in Nashville Tennessee USA,...

## Amazon Machine Image Created With RTextTools Pre-installed

We recently created an AMI for Amazon's EC2 cloud computing service. Users with AWS accounts can access the public AMI by searching ami-817eb8e8. The AMI is based off of Drew Conway's excellent AMI, but with R 2.13 loaded and RTextTools and

## What makes a hockey Hall-of-Famer?

August 9, 2011

At the JSM conference last week, I stopped by a great poster by Steve Salaga and Brian Mills, graduate students at University of Michigan's Department of Sport Management. The guys were clearly hockey fans, and had channelled their enthusiasm for a sport into an interesting statistical analysis of game and player data from the NHL. One analysis, based on...

## Estimate decay of linkage disequilibrium with distance

August 9, 2011

It is well known that linkage disequilibrium (LD) decays with distance. Several functions have been proposed to estimate such decay. Among the most widely used are the Hill and Weir (1) formula for describing the decay of $r^2$ and a formula proposed by Abecasis (2) for describing the decay of D’. I wrote R functions
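For reference, the Hill and Weir expectation of $r^2$ is commonly written in the form below. This is a sketch of the standard form, not taken from the post's own code. The symbols are notational assumptions: $n$ is the number of sampled chromosomes and $C = 4 N_e c$ is the population recombination parameter between the two sites.

$$
E(r^2) = \left[\frac{10 + C}{(2 + C)(11 + C)}\right]\left[1 + \frac{(3 + C)(12 + 12C + C^2)}{n\,(2 + C)(11 + C)}\right]
$$

Distance enters through $C$: the recombination fraction $c$ grows with the separation between sites, so the expected $r^2$ falls as sites get further apart.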

## Forecasting recessions

August 9, 2011

John Hussman has a Recession Warning Composite that I am attempting to replicate/improve. The underlying data seems to be easy enough to get from FRED using the quantmod package in R. I don't quite understand the index Hussman is using for commercial...

## The indices understate the carnage

August 9, 2011

The first 6 trading days of August have been bad for the major indices, but how variable is that across portfolios? To answer that, two sets of random portfolios were generated from the constituents of the S&P 500. The trading days are 2011 August 1–5 and 8. The returns of the indices for …

## Blog planets are like conferences… (aka R-bloggers.com)

August 8, 2011

Blog planets are websites that aggregate blog feeds around a particular topic or project. The name probably comes from one of the first implementations, the Planet software. These planets are like conferences, rather than journals. Like conferences with...

## Installing Rmpi with OpenMPI on Mac OS X Lion

August 8, 2011

For whatever reason, Apple decided not to include OpenMPI in Mac OS X Lion (it was supported in Leopard and Snow Leopard). I found this out the hard way after doing a clean install of Lion. Here are steps to install OpenMPI and get it working with the Rmpi package in R. One benefit of

## How ANZ uses R for credit risk analysis

August 8, 2011

At last month's R user group meeting in Melbourne, the theme was "Experiences with using SAS and R in insurance and banking". There, Hong Ooi from ANZ (Australia and New Zealand Banking Group) gave a presentation on "Experiences with using R in credit risk". I didn't get to see the presentation myself, but the slides tell a great story...

August 8, 2011

My thirst for statistics has been increasing. IV had another requirement, which would eventually be useful to me as well. He currently downloads FII and DII buy and sell values and their impact on Nifty manually in Excel. He suggested that I try to autom...

## Power of running world records

August 8, 2011

Following a few entries on sports here and there, I was wondering what kind of law running records follow with respect to distance. The data are available on Wikipedia, or here for a tidied version. It covers 18 distances, from 100 meters to 100 kilometers. A log-log scale is in order: It is nice
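Why a log-log scale suits this question: if record time follows a power law in distance, the relation becomes a straight line once both axes are logged. A minimal sketch, with $T$ the record time, $d$ the distance, and $c$, $\alpha$ unknown constants (all notation assumed here, not taken from the post):

$$
T(d) = c\, d^{\alpha} \quad\Longrightarrow\quad \log T = \log c + \alpha \log d
$$

Under that assumption, the slope of a straight-line fit of $\log T$ against $\log d$ estimates the exponent $\alpha$, and any curvature in the log-log plot signals a departure from a pure power law.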

## Slides from Rocky Mtn SABR Meeting

August 8, 2011

Last Saturday I had the good fortune to present a talk on finding, gathering, and analyzing some sports-related data on the web at the local SABR group meeting. In case you’re not familiar with the “SABR” acronym, it stands for …

## Two-Way PERMANOVA (with Vegan-Function adonis) Using Customized Contrasts

August 8, 2011

...say you have a multivariate dataset and a two-way factorial design - you do a PERMANOVA and the aov-table (adonis is using ANOVA or "sum"-contrasts) tells you there is an interaction - how to proceed when you want to go deeper into the ana...

## The Open Governing Index: How open is the R project?

August 8, 2011

The Open Governance Index is a new measure developed by VisionMobile that rates open-source projects on their governance process. The index has four facets, described thoroughly in the "Open Governance Index" publication and briefly below. Access: these criteria assess the availability of source code, a permissive license, developer support mechanisms, a roadmap, and openness

## Win-Vector starts submitting content to r-bloggers.com

August 8, 2011

We have been consistently impressed by, and have enjoyed, the wealth of R wisdom available on the R-bloggers aggregation site. Therefore Win-Vector LLC is granting the right to reformat and redistribute (with attribution and link) our blog’s R content in the R-bloggers site and feeds. We hope to see our R content shared through this network.

## (#ESA11) rOpenSci: a collaborative effort to develop R-based tools for facilitating Open Science

August 8, 2011

Our development team would like to announce the launch of rOpenSci. As the title states, this project aims to create R packages to make open science more available to researchers. http://ropensci.org/ What this means is t...

## Trading volume forecast for an illiquid stock

August 8, 2011

When dealing with transaction cost analysis, a stock’s volume is assumed to be stable or foreseeable. However, the picture is different when we are dealing with an illiquid stock. It is relatively easy to forecast the volume of a liquid stock, because trading volume has high autocorrelation – the volumes at t and t+1 are correlated. For
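As a reminder of what "correlated at t and t+1" means in practice, the lag-1 sample autocorrelation of a volume series $V_1, \dots, V_T$ can be written as follows. The notation is assumed here for illustration and does not come from the post:

$$
\hat{\rho}_1 = \frac{\sum_{t=1}^{T-1} (V_t - \bar{V})(V_{t+1} - \bar{V})}{\sum_{t=1}^{T} (V_t - \bar{V})^2}, \qquad \bar{V} = \frac{1}{T}\sum_{t=1}^{T} V_t
$$

A value of $\hat{\rho}_1$ close to 1 means that today's volume is a good predictor of tomorrow's, which is the property the paragraph attributes to liquid stocks.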