Stratified sampling

June 9, 2010
The recently arXived paper of Goldstein, Rinott and Scarsini studies the impact of refining a partition on the precision of a stratified maximising/integration Monte Carlo approach. Quite naturally, if the partition gets improved, simulating points in each set of the partition can only improve the quality of the approximation, whether the problem is in maximising
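The claim that a finer partition can only help may be illustrated, for the integration case only, with a minimal sketch. It is not the construction studied in the paper. It assumes equal-width strata on [0, 1] that are refined by halving, the same number of uniform draws in every stratum, and an arbitrary test integrand x^2; the function names are hypothetical.

```python
import random
import statistics


def stratified_estimate(f, n_strata, n_total, rng):
    """Estimate the integral of f over [0, 1] using equal-width strata
    and the same number of uniform draws in each stratum."""
    per_stratum = n_total // n_strata
    width = 1.0 / n_strata
    total = 0.0
    for k in range(n_strata):
        left = k * width
        # Uniform draws restricted to the k-th stratum [left, left + width).
        draws = [f(left + width * rng.random()) for _ in range(per_stratum)]
        # Each stratum contributes its width times its sample mean.
        total += width * statistics.fmean(draws)
    return total


def estimator_variance(f, n_strata, n_total=64, n_repeats=2000, seed=0):
    """Empirical variance of the stratified estimator over repeated runs."""
    rng = random.Random(seed)
    estimates = [
        stratified_estimate(f, n_strata, n_total, rng) for _ in range(n_repeats)
    ]
    return statistics.variance(estimates)


def integrand(x):
    # Assumed test function; its exact integral over [0, 1] is 1/3.
    return x * x


# Same total budget of 64 draws, spread over successively halved strata.
variances = {k: estimator_variance(integrand, k) for k in (1, 2, 4, 8)}
for k, v in variances.items():
    print(f"{k} strata: empirical variance {v:.2e}")
```

With the total number of draws held fixed, the empirical variance should shrink each time the strata are halved, which is the behaviour the excerpt describes for a refined partition.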

biomaRt and GenomeGraphs: a worked example

June 6, 2010
As promised a few posts ago, another demonstration of the excellent biomaRt package, this time in conjunction with GenomeGraphs. Here’s what we’re going to do: grab some public microarray data; normalise and get a list of the most differentially-expressed probesets; use biomaRt to fetch the genes associated with those probesets; plot the data using GenomeGraphs.

On particle learning

June 4, 2010
In connection with the Valencia 9 meeting that started yesterday, and with Hedie’s talk there, we have posted on arXiv a set of comments on particle learning. The arXiv paper contains several discussions but they mostly focus on the inevitable degeneracy that accompanies particle systems. When Lopes et al. state that is not of interest

Making Data Work online conference

June 3, 2010
O'Reilly is hosting a conference on June 9 on the topic of the analysis of large data sets. The title of the conference is Making Data Work: Ever since Hal Varian proclaimed that data analysis is the sexy career for the coming decade, people have been talking about data. And big data. And even bigger data. This online conference,...

MLB Baseball Pitching Matchups ~ grabbing pitcher and/or batter codes by specifying game date using R XML

June 1, 2010
MLB Gameday stores its game data in XML format, with the players denoted by ID numbers. To find out who is who, the codes are stored in pitchers.xml or batters.xml of each game. My DownloadPitchFX.R script can download the ID numbers, but it doesn’t look to see who the ID is because of the extra

Vanilla Rao-Blackwellisation [re]revised

May 31, 2010
Although the revision is quite minor, it took us two months to complete from the time I received the news in the Atlanta airport lounge… The vanilla Rao-Blackwellisation paper with Randal Douc has thus been resubmitted to the Annals of Statistics. And rearXived. The only significant change is the inclusion of two tables detailing computing

MLB Baseball Pitching Matchups ~ manipulating pitch f/x data using the RMySQL package in R

May 31, 2010
After downloading some pitch f/x data using my R script, we can finally have some fun. But because the pitch f/x data is very elaborate, R can easily get overwhelmed by copying the dataset back and forth in memory, as you manipulate the data. So the natural progression is to use relational database systems. Here,

May 28, 2010
May 26, 2010
Ever wondered which Twitterers you and a friend share? Using R and the twitteR package, there's an easy way to find out. Cornelius Puschmann hacked together some R code to do just that for the Humanities and Technology Camp and it seems to work pretty well. Just replace 'coffee001' with your Twitter username, 'mypassword' with your Twitter password,...