Illustrated Guide to ROC and AUC

June 23, 2015 | Raffael Vogler

(In a past job interview I failed at explaining how to calculate and interpret ROC curves – so here goes my attempt to fill this knowledge gap.) Think of a regression model mapping a number of features onto a real number

Germans used to have more Sex in Summer!

January 1, 2015 | Raffael Vogler

Wow – what a headline … okay, I admit it's phrased quite sensationally, given that it anticipates just one possible interpretation of there being more births around summer / autumn than in spring … but I guess I just get

MongoDB – State of the R

August 31, 2014 | Raffael Vogler

Naturally there are two reasons why you might need to access MongoDB from R: (1) MongoDB is already used for whatever reason and you want to analyze the data stored therein; (2) you decide you want to store your data in MongoDB instead of

Impact of Dimensionality on Data in Pictures

April 16, 2014 | Raffael Vogler

I am excited to announce that this is supposed to be my first article published also on :) The processing of data needs to take dimensionality into account as usual metrics change their behaviour in subtle ways, which impacts the

The tf-idf-Statistic For Keyword Extraction

February 27, 2014 | Raffael Vogler

The tf-idf-statistic ("term frequency – inverse document frequency") is a common tool for the purpose of extracting keywords from a document by not just considering a single document but all documents from the corpus. In terms of tf-idf a word

“Digit Recognizer” Challenge on Kaggle using SVM Classification

February 14, 2014 | Raffael Vogler

This article is about the "Digit Recognizer" challenge on Kaggle. You are provided with two data sets: one for training, consisting of 42,000 labeled pixel vectors, and one for the final benchmark, consisting of 28,000 vectors while labels are not

Pivoting Data in R Excel-style

January 2, 2014 | Raffael Vogler

(This article refers to an initial proof-of-concept version of r-big-pivot.) I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting a quick insight into a data set's structure

An intuitive interpretation of the beta distribution

November 15, 2013 | Raffael Vogler

First of all, this text is not just about an intuitive perspective on the beta distribution but at least as much about the idea of looking behind a measured empirical probability and thinking of it as a product of chance itself.

