Monthly Archives: June 2014

Making publicly available data publicly accessible: Belgium’s Hospital Minimal Data

June 13, 2014

Nowadays, data is everywhere. The big challenge, however, is making it not only available but also accessible to those who need it. An example of this is the Hospital Minimal Data (Belgium). This dataset is openly published by the Federal Public Service for Health, Food Chain Safety & Environment. From a perspective of...

trying to speed up Metropolis… and failing!

June 12, 2014

A while ago (but still after Iceland since I used the thorn rune as a math symbol!), I wrote the following post draft as a memo. Now that Marco Banterle, Clara Grazian and I have completed our delayed acceptance paper, it may be of interest to some readers to see how a first attempt proved...

Five Hard-Won Lessons Using Hive

June 12, 2014

I’ve been spending a ton of time lately on the data engineering side of ‘data science’, so I’ve been writing a lot of Hive queries. Hive is a great tool for querying large amounts of data without having to know very much about the underpinnings of Hadoop. Unfortunately, there are a lot of things about...

R User Groups June 2014

June 12, 2014

By Joseph Rickert. useR! 2014 is just about two weeks away, and I am very much looking forward to meeting R users from around the world. This is a great time to catch up with old friends, hopefully make some new friends, and talk about R and R user groups. The number of R user groups continues to...

Prediction model for the FIFA World Cup 2014

June 12, 2014

Like a last-minute goal, so to speak, Andreas Groll and Gunther Schauberger of Ludwig-Maximilians-University Munich announced their predictions for the FIFA World Cup 2014 in Brazil just hours before the opening game. Andreas Groll, already experienced in this field after his successful prediction of the European Championship 2012, and Gunther Schauberger set out to predict the 2014 World Cup champion based on statistical...

The Gilbreath’s Conjecture

June 12, 2014

"317 is a prime, not because we think so, or because our minds are shaped in one way rather than another, but because it is so, because mathematical reality is built that way" (G.H. Hardy). In 1958, the mathematician and magician Norman L. Gilbreath presented a disconcerting hypothesis conceived on the back of a napkin. Gilbreath...

Example 2014.6: Comparing medians and the Wilcoxon rank-sum test

June 12, 2014

A colleague recently contacted us with the following question: "My outcome is skewed. How can I compare medians across multiple categories?" What they were asking for was a generalization of the Wilcoxon rank-sum test (also known as the Mann-Whitney-Wilcoxon test, among other monikers) to more than two groups. For the record, the answer...

Twins, Tripods and Phantoms at the Comrades Marathon

June 12, 2014

Having picked up a viral infection days before this year’s Comrades Marathon, on 1 June I was left with time on my hands and somewhat desperate for any distraction. So I spent some time looking at my archive of Comrades data and considering some new questions. For example, what are the chances of two runners

Basketball Data Part III – BMI: Does it Matter?

June 11, 2014

For those of you who are just joining us, please refer back to the previous two posts on scraping XML data and length of NBA career by position. The next idea I wanted to explore was whether BMI had any...

More companies using R: Uber and CultureAmp

June 11, 2014

To follow on from this post, here are a couple more companies using R: Uber, the ride-share company, uses R for statistical analysis. According to econometric analysis of DUI incidents (driving under the influence of alcohol or drugs) in Seattle, the introduction of the Uber service corresponded with a 10% decrease in DUIs (about 5 a week). The analysis...
