Articles by andrew

Clustering Lightning Discharges to Identify Storms

September 13, 2013 | andrew

A short talk that I gave at the LIGHTS 2013 Conference (Johannesburg, 12 September 2013). The slides are relatively devoid of text because I like the audience to hear the content rather than read it. The central message of the presentation is that clustering lightning discharges into storms is not a trivial task, ... [Read more...]

Clustering the Words of William Shakespeare

September 10, 2013 | andrew

In my previous post I used the tm package to do some simple text mining on the Complete Works of William Shakespeare. Today I am taking some of those results and using them to generate word clusters. Preparing the Data I will start with the Term Document Matrix (TDM) consisting ... [Read more...]

Text Mining the Complete Works of William Shakespeare

September 5, 2013 | andrew

I am starting a new project that will require some serious text mining. So, in the interests of bringing myself up to speed on the tm package, I thought I would apply it to the Complete Works of William Shakespeare and just see what falls out. The first order of ... [Read more...]

Presenting Conformance Statistics

August 27, 2013 | andrew

A client came to me with some conformance data. She was having a hard time making sense of it in a spreadsheet. I had a look at a couple of ways of presenting it that would bring out the important points. The Data The data came as a spreadsheet with ... [Read more...]

The Wonders of foreach

August 25, 2013 | andrew

Writing code from scratch to do parallel computations can be rather tricky. However, the packages providing parallel facilities in R make it remarkably easy. One such package is foreach. I am going to document my trail of discovery with foreach, which began some time ago, but has really come into ... [Read more...]

Fitting a Model by Maximum Likelihood

August 18, 2013 | andrew

Maximum-Likelihood Estimation (MLE) is a statistical technique for estimating model parameters. It basically sets out to answer the question: what model parameters are most likely to characterise a given set of data? First you need to select a model for the data. And the model must have one or more (... [Read more...]

Finding Correlations in Data with Uncertainty: Classical Solution

August 13, 2013 | andrew

Following up on my previous post as a result of an excellent suggestion from Andrej Spiess. The data are indeed very heteroscedastic! Andrej suggested that an alternative way to attack this problem would be to use weighted correlation with weights being the inverse of the measurement variance. Let’s look ... [Read more...]

Finding Correlations in Data with Uncertainty

August 11, 2013 | andrew

A week or so ago a colleague of mine asked if I knew how to calculate correlations for data with uncertainties. Now, if we are going to be honest, then all data should have some level of experimental or measurement error. However, I suspect that in the majority of cases ... [Read more...]

Uncertainty in parameter estimates using multilevel models

August 3, 2013 | andrew

David Hsu writes: I have a (perhaps) simple question about uncertainty in parameter estimates using multilevel models — what is an appropriate threshold for measure parameter uncertainty in a multilevel model? The reason why I ask is that I set out to do a crossed two-way model with two varying intercepts, ... [Read more...]

A Chart of Recent Comrades Marathon Winners

July 30, 2013 | andrew

Continuing on my quest to document the Comrades Marathon results, today I have put together a chart showing the winners of both the men's and ladies' races since 1980. Click on the image below to see a larger version. The analysis started off with the same data set that I was ... [Read more...]

Comrades Marathon Inference Trees

July 19, 2013 | andrew

Following up on my previous posts regarding the results of the Comrades Marathon, I was planning on putting together a set of models which would predict likelihood to finish and probable finishing time. Along the way I got distracted by something else that is just as interesting and which produces ... [Read more...]

Optimising a Noisy Objective Function

July 16, 2013 | andrew

I am busy with a project where I need to calibrate the Heston Model to some Asian options data. The model has been implemented as a function which executes a Monte Carlo (MC) simulation. As a result, the objective function is rather noisy. There are a number of algorithms for ... [Read more...]

Priors

July 16, 2013 | andrew

Nick Firoozye writes: While I am absolutely sympathetic to the Bayesian agenda I am often troubled by the requirement of having priors. We must have priors on the parameter of an infinite number of model we have never seen before and I find this troubling. There is a similarly troubling ... [Read more...]

Please send all comments to /dev/ripley

July 10, 2013 | andrew

Trey Causey asks, Has R-help gotten meaner over time?: I began by using Scrapy to download all the e-mails sent to R-help between April 1997 (the earliest available archive) and December 2012. . . . We each read 500 messages and coded them in the following categories: -2 Negative and unhelpful -1 Negative but helpful [...] The ... [Read more...]

Are Green Number Runners More Likely to Bail?

June 22, 2013 | andrew

Comrades Marathon runners are awarded a permanent green race number once they have completed 10 journeys between Durban and Pietermaritzburg. For many runners, once they have completed the race a few times, achieving a green number becomes a possibility. And once the idea takes hold, it can become something of a ... [Read more...]

Job openings at conservative political analytics firm!

June 21, 2013 | andrew

After posting that announcement about Civis Analytics, I wrote, “If a reconstituted Romney Analytics team is hiring, let me know and I’ll post that ad too.” Adam Schaeffer obliged: Not sure about Romney’s team, but Evolving Strategies is looking for sharp folks who lean right: Evolving Strategies is ... [Read more...]

The Green Number Effect

June 18, 2013 | andrew

Following up on a suggestion from my previous post, here are the statistics for medal count versus age. Every point on the plot is the number (see colour legend on right) of athletes who have achieved a given number of medals by a particular age. There is clear evidence of ... [Read more...]

Job opening! Come work with us!

June 18, 2013 | andrew

Postdoctoral position in statistical modeling of social networks A full-time postdoctoral position is available beginning Fall 2014 in the research group of Tian Zheng and Andrew Gelman working on statistical analysis and modeling of social network data, in close cooperation with our experimental collaborators. Four key papers of this project so ... [Read more...]

Medal Allocations at the Comrades Marathon

June 9, 2013 | andrew

Following up on my previous post regarding attrition rates at Comrades Marathon 2013, here are the statistics I have gathered for medal allocations. There is some interesting history behind the Comrades Marathon medals. For reference, the medals are allocated as follows: Gold medals to the first ten finishers in the men’... [Read more...]

Robust logistic regression

June 7, 2013 | andrew

Corey Yanofsky writes: In your work, you’ve robustificated logistic regression by having the logit function saturate at, e.g., 0.01 and 0.99, instead of 0 and 1. Do you have any thoughts on a sensible setting for the saturation values? My intuition suggests that it has something to do with proportion of outliers ... [Read more...]


R-bloggers

R news and tutorials contributed by hundreds of R bloggers
