Articles by Raffael Vogler

Illustrated Guide to ROC and AUC

June 23, 2015 | Raffael Vogler

(In a past job interview I failed at explaining how to calculate and interpret ROC curves – so here goes my attempt to fill this knowledge gap.) Think of a regression model mapping a number of features onto a real number …

Germans used to have more Sex in Summer!

January 1, 2015 | Raffael Vogler

Wow – what a headline … okay, I admit it’s phrased quite sensationally, given that it anticipates just one possible interpretation of more births occurring around summer/autumn than in spring … but I guess I just get …

MongoDB – State of the R

August 31, 2014 | Raffael Vogler

Naturally there are two reasons why you might need to access MongoDB from R: (1) MongoDB is already in use for whatever reason and you want to analyze the data stored therein; (2) you decide you want to store your data in MongoDB instead of …

Impact of Dimensionality on Data in Pictures

April 16, 2014 | Raffael Vogler

I am excited to announce that this is supposed to be my first article also published on r-bloggers.com :) The processing of data needs to take dimensionality into account, as the usual metrics change their behaviour in subtle ways, which impacts the …

The tf-idf-Statistic For Keyword Extraction

February 27, 2014 | Raffael Vogler

The tf-idf-statistic (“term frequency – inverse document frequency”) is a common tool for the purpose of extracting keywords from a document by not just considering a single document but all documents from the corpus. In terms of tf-idf a word …

“Digit Recognizer” Challenge on Kaggle using SVM Classification

February 14, 2014 | Raffael Vogler

This article is about the “Digit Recognizer” challenge on Kaggle. You are provided with two data sets: one for training, consisting of 42,000 labeled pixel vectors, and one for the final benchmark, consisting of 28,000 vectors, while labels are not …

Pivoting Data in R Excel-style

January 2, 2014 | Raffael Vogler

(This article refers to an initial proof-of-concept version of r-big-pivot.) I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting a quick insight into a data set’s structure …

An intuitive interpretation of the beta distribution

November 15, 2013 | Raffael Vogler

First of all, this text is not just about an intuitive perspective on the beta distribution but at least as much about the idea of looking behind a measured empirical probability and thinking of it as a product of chance itself. …
