Blog Archives

Text Mining the Complete Works of William Shakespeare

September 5, 2013

I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of business was getting my hands

Read more »

Presenting Conformance Statistics

August 27, 2013

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points. The Data The data came as a spreadsheet with multiple sheets. Each of the

Read more »

The Wonders of foreach

August 25, 2013

Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago but has really come to fruition over the last few

Read more »

Fitting a Model by Maximum Likelihood

August 18, 2013

Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It sets out to answer the question: what model parameters are most likely to characterise a given set of data? First you need to select a model for the data, and the model must have one or more (unknown) parameters. As the name

Read more »
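
As a rough illustration of the idea in the excerpt above (choose a model, then find the parameter value under which the data are most likely), here is a minimal Python sketch. It is not the code from the post, which uses R. The exponential model, the made-up data values, and the helper names `neg_log_likelihood` and `fit_rate` are all assumptions chosen only to keep the example small.

```python
import math

def neg_log_likelihood(rate, data):
    """Negative log-likelihood of i.i.d. exponential data for a given rate."""
    return -(len(data) * math.log(rate) - rate * sum(data))

def fit_rate(data, lo=1e-6, hi=100.0, tol=1e-8):
    """Minimise the negative log-likelihood over the rate by golden-section search.

    Assumes the optimum lies in [lo, hi]; for the exponential model the
    negative log-likelihood is convex in the rate, so the search is valid.
    """
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if neg_log_likelihood(c, data) < neg_log_likelihood(d, data):
            b = d  # minimum lies in [a, d]
        else:
            a = c  # minimum lies in [c, b]
    return (a + b) / 2

# Hypothetical observations, for illustration only.
data = [0.5, 1.2, 0.3, 2.1, 0.9]
rate_hat = fit_rate(data)

# For the exponential model the estimate has a closed form, 1 / mean(data),
# which gives a check on the numerical search.
closed_form = len(data) / sum(data)
print(rate_hat, closed_form)
```

For models without a closed-form estimate, the same pattern applies with a general-purpose optimiser in place of the one-dimensional search.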

Finding Correlations in Data with Uncertainty: Classical Solution

August 13, 2013

This post follows up on my previous one, prompted by an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation, with the weights being the inverse of the measurement variance. Let’s look at the synthetic data first. This is

Read more »
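
The weighted correlation mentioned in the excerpt above can be sketched as follows. This is an illustrative Python version, not the code from the post. It assumes a single measurement variance per data point, so the weight for point i is 1 / variance_i; the function name `weighted_corr` and the sample values are hypothetical.

```python
import math

def weighted_corr(x, y, w):
    """Weighted Pearson correlation of x and y with non-negative weights w."""
    total = sum(w)
    # Weighted means.
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    # Weighted covariance and variances (the common 1/total factor cancels).
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)

# Made-up measurements with per-point variances, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 9.0]           # the last point is an outlier...
variance = [0.1, 0.1, 0.1, 0.1, 10.0]   # ...but it is also very uncertain
weights = [1.0 / v for v in variance]

r_weighted = weighted_corr(x, y, weights)
r_unweighted = weighted_corr(x, y, [1.0] * len(x))
print(r_weighted, r_unweighted)
```

With inverse-variance weights, noisy points contribute less to the estimate, which is the motivation when the data are heteroscedastic.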

Finding Correlations in Data with Uncertainty

August 11, 2013

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases these uncertainties are ignored when

Read more »

Uncertainty in parameter estimates using multilevel models

August 3, 2013

David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models: what is an appropriate threshold for measuring parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, similar to your flight

Read more »

A Chart of Recent Comrades Marathon Winners

July 30, 2013

Continuing my quest to document the Comrades Marathon results, today I have put together a chart showing the winners of both the men's and ladies' races since 1980. Click on the image below to see a larger version. The analysis started off with the same data set that I was working with before, from

Read more »

Comrades Marathon Inference Trees

July 19, 2013

Following up on my previous posts regarding the results of the Comrades Marathon, I was planning on putting together a set of models which would predict likelihood to finish and probable finishing time. Along the way I got distracted by something else that is just as interesting and that produces results which readily yield to qualitative

Read more »

Optimising a Noisy Objective Function

July 16, 2013

I am busy with a project where I need to calibrate the Heston Model to some Asian options data. The model has been implemented as a function which executes a Monte Carlo (MC) simulation. As a result, the objective function is rather noisy. There are a number of algorithms for dealing with this sort of problem, and

Read more »