## Art of Statistical Inference

November 20, 2013
(This article was first published on MATHEMATICS IN MEDICINE, and kindly contributed to R-bloggers) I wrote this post a few years ago, when I started learning the art and science of data analysis. It should be a good starting point for amateur data analysts. Introduction: What is statistics? There are about a dozen...

## Penalizing P Values

November 19, 2013
Ioannidis' paper suggesting that most published results in medical research are not true is now high-profile enough that even my dad, an artist who wouldn't know a test statistic if it hit him in the face, knows about it. It has even...

November 19, 2013
Hello everybody! Today I found something very cool: there is an R package for mining Facebook. There are several for Twitter, but this is the first one that really works well with Facebook. So I wanted to test it, and I was amazed at how easy it is to use. Setup: The first thing we need is a Facebook …

## Getting started with R, for Stata users

November 19, 2013
If you learned statistics using Stata but are interested in learning the R language, it's worth checking out R~Stata: Notes on Exploring Data by Princeton's Oscar Torres-Reyna. D-Lab's Laura Nelson provides an overview; in short, it's a collection of 30 PDF slides that introduces R to Stata users and provides translation tables like the one below...

## Calling R from the ERP – A dirty little hack

November 19, 2013
Since I joined SAP around 2 years ago, I simply stopped using ABAP…even though I used it for almost 11 years as a consultant…A week ago, I was thinking about writing a new blog…something nice…something hacky…something that would allow me to just rest and not blog for the rest of the year…I thought about ERP and R…while...

## Predicting claims with a bayesian network

November 19, 2013
Here is a little Bayesian Network to predict the claims for two different types of drivers over the next year, see also example 16.16 in . Let's assume there are good and bad drivers. The probabilities that a good driver will have 0, 1 or 2 claims in any given year are set to 70%, 20% and 10%,...
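A minimal Python sketch of the idea, assuming a simple network in which the driver type determines the yearly claim count and counts in different years are independent given the type. Only the good-driver probabilities (70%, 20%, 10%) come from the excerpt; the bad-driver probabilities and the share of good drivers below are hypothetical placeholders, not the post's values. The sketch updates the probability that a driver is good after one observed year, then predicts next year's claims.

```python
# Sketch of a two-level Bayesian network: driver type -> yearly claim count (0, 1, 2).
# Assumption: claim counts in different years are independent given the driver type.

GOOD_CLAIMS = [0.70, 0.20, 0.10]  # P(0, 1, 2 claims | good), from the excerpt
BAD_CLAIMS = [0.50, 0.30, 0.20]   # HYPOTHETICAL P(0, 1, 2 claims | bad)
PRIOR_GOOD = 0.75                 # HYPOTHETICAL share of good drivers


def posterior_good(observed_claims: int, prior_good: float = PRIOR_GOOD) -> float:
    """P(driver is good | claims observed this year), via Bayes' rule."""
    joint_good = GOOD_CLAIMS[observed_claims] * prior_good
    joint_bad = BAD_CLAIMS[observed_claims] * (1.0 - prior_good)
    return joint_good / (joint_good + joint_bad)


def predict_next_year(observed_claims: int, prior_good: float = PRIOR_GOOD) -> list[float]:
    """Predictive distribution of next year's claims, mixing over the updated driver type."""
    p_good = posterior_good(observed_claims, prior_good)
    return [p_good * g + (1.0 - p_good) * b for g, b in zip(GOOD_CLAIMS, BAD_CLAIMS)]


if __name__ == "__main__":
    for k in range(3):
        print(k, round(posterior_good(k), 3), [round(p, 3) for p in predict_next_year(k)])
```

Observing more claims this year lowers the probability that the driver is good, which shifts next year's predicted distribution toward the bad-driver profile.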

## Binomial regression model

November 18, 2013
$Y_i\sim\mathcal{B}(p(\boldsymbol{X_i}))$

Most of the time, when we introduce binomial models, such as the logistic or probit models, we discuss only Bernoulli variables, $Y_i\in\{0,1\}$. This year (actually also the year before), I discuss extensions to multinomial regressions, where $p(\boldsymbol{X_i})$ becomes a vector of class probabilities taking values on some simplex. The multinomial logistic model was mentioned in an earlier post. The idea is to consider, for instance with three possible classes, the following...
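As a rough illustration of how a multinomial logistic model maps covariates onto the simplex, here is a minimal Python sketch assuming the common baseline-category form with the last class as reference, $P(Y_i=j\mid\boldsymbol{X_i})=\exp(\boldsymbol{X_i}^\top\beta_j)/(1+\sum_k\exp(\boldsymbol{X_i}^\top\beta_k))$. The function name and the coefficient values are hypothetical, and the post's own parameterization may differ.

```python
import math


def multinomial_logit_probs(x: list[float], betas: list[list[float]]) -> list[float]:
    """Class probabilities for a baseline-category multinomial logit.

    `betas` holds one coefficient vector per non-reference class; the last
    class is the reference, with all its coefficients fixed at zero.
    """
    # Linear predictors x'beta_j for the non-reference classes.
    scores = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    # The reference class contributes exp(0) = 1 to the denominator.
    denom = 1.0 + sum(math.exp(s) for s in scores)
    probs = [math.exp(s) / denom for s in scores]
    probs.append(1.0 / denom)
    return probs  # non-negative and summing to 1: a point on the simplex


# HYPOTHETICAL example: intercept plus one covariate, three classes.
x_i = [1.0, 0.5]
betas = [[0.2, -1.0],   # class 1 vs reference
         [-0.5, 0.8]]   # class 2 vs reference
print(multinomial_logit_probs(x_i, betas))
```

With two classes this reduces to the ordinary logistic model, which is why the Bernoulli case is the natural starting point.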

## Success rates for EPSRC Fellowships

November 18, 2013
I was recently at a presentation where the success rates for EPSRC fellowships were given by theme. The message of the talk was that Engineering fellowships were under-subscribed and so we should all be preparing our applications. But just because a theme is under-subscribed doesn’t mean that you’ve got a better chance of getting

## Some Options for Testing Tables

November 18, 2013
Contingency tables are a very good way to summarize discrete data. They are quite easy to construct and reasonably easy to understand. However, tables have many nuances, and care should be taken when drawing conclusions from the data. Here are just a few thoughts on the topic. Dealing with sparse data: On

## Evaluating Quandl Data Quality

November 15, 2013
Quandl has indexed millions of time-series datasets from over 400 sources. All of Quandl’s datasets are open and free. This is great news, but before performing any backtest using Quandl data, I want to compare it with a trusted source, which for the purpose of this post is Bloomberg. I will focus only on daily futures data here
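A comparison like this usually starts by aligning the two series on trading dates. Here is a minimal Python sketch, assuming both series are already loaded as date-to-closing-price mappings; it makes no Quandl or Bloomberg API calls, and the function name and the prices below are hypothetical.

```python
from datetime import date


def compare_series(candidate: dict[date, float], reference: dict[date, float],
                   tol: float = 1e-6) -> dict:
    """Compare two daily price series keyed by trading date."""
    common = sorted(candidate.keys() & reference.keys())
    diffs = {d: candidate[d] - reference[d] for d in common}
    return {
        # Dates present in the trusted source but absent from the candidate, and vice versa.
        "missing_in_candidate": sorted(reference.keys() - candidate.keys()),
        "extra_in_candidate": sorted(candidate.keys() - reference.keys()),
        # Common dates where prices disagree by more than the tolerance.
        "mismatched_dates": [d for d, x in diffs.items() if abs(x) > tol],
        "max_abs_diff": max((abs(x) for x in diffs.values()), default=0.0),
    }


# HYPOTHETICAL closing prices, for illustration only.
quandl_close = {date(2013, 11, 11): 100.0, date(2013, 11, 12): 100.5, date(2013, 11, 13): 101.0}
bloomberg_close = {date(2013, 11, 11): 100.0, date(2013, 11, 12): 100.25, date(2013, 11, 14): 101.5}
print(compare_series(quandl_close, bloomberg_close))
```

Counting missing dates separately from price mismatches helps distinguish gaps in coverage from genuine data errors.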