Articles by Raffael Vogler

Illustrated Guide to ROC and AUC

June 23, 2015 | Raffael Vogler

(In a past job interview I failed at explaining how to calculate and interpret ROC curves – so here goes my attempt to fill this knowledge gap.) Think of a regression model mapping a number of features onto a real number … Continue reading → [Read more...]

Introduction to OpenCPU for R on EC2 with Python

February 15, 2015 | Raffael Vogler

OpenCPU is (simply put) a server implementing a RESTful web API for remotely executing R functions and retrieving results. In this tutorial I am going to showcase how OpenCPU can be installed on an EC2 instance running Ubuntu 14.04. Python and … Continue reading → [Read more...]

As a Data Scientist it is my Obligation to support #nobagida, #nopegida and any other #no[a-z]{2}gida today :)

January 19, 2015 | Raffael Vogler

[Read more...]

Germans used to have more Sex in Summer!

January 1, 2015 | Raffael Vogler

Wow – what a headline … okay, I admit it’s phrased quite sensationally given that it anticipates just one possible interpretation of increasingly more births around summer / autumn compared to spring … but I guess I just get … Continue reading → [Read more...]

Hierarchical Clustering with R (feat. D3.js and Shiny)

December 14, 2014 | Raffael Vogler

Agglomerative hierarchical clustering is a simple, intuitive and well-understood method for clustering data points. I used it with good results in a project to estimate the true geographical position of objects based on measured estimates. With this tutorial I would … Continue reading → [Read more...]

Twitter’s REST API v1.1 with R (for Linux and Windows)

September 22, 2014 | Raffael Vogler

In this tutorial I am going to describe a straightforward way to use Twitter’s REST API v1.1. For that purpose I composed a little package (RTwitterAPI), so that requesting data just needs the API URL, the API parameters … Continue reading → [Read more...]

MongoDB – State of the R

August 31, 2014 | Raffael Vogler

Naturally there are two reasons why you need to access MongoDB from R: (1) MongoDB is already used for whatever reason and you want to analyze the data stored therein; (2) you decide you want to store your data in MongoDB instead of … Continue reading → [Read more...]

Reasonable Inheritance of Cluster Identities in Repetitive Clustering

August 15, 2014 | Raffael Vogler

… or Inferring Identity from Observations. Let’s assume the following application: A conservation organisation starts a project to geographically catalogue the remaining representatives of an endangered plant species. For that purpose hikers are encouraged to communicate the location of the plant … Continue reading → [Read more...]

Talking to Twitter’s REST API v1.1 with R

June 10, 2014 | Raffael Vogler

In this text I am going to describe a very straightforward way to use Twitter’s REST API v1.1. I put some code together for that purpose, so that requesting data just needs the API URL, the API … Continue reading → [Read more...]

FIR Filter Design and Digital Signal Processing in R

May 15, 2014 | Raffael Vogler

This article serves the purpose of illustrating that signal processing with R is possible – thanks to the signal package – and to keep a reference of some of the stuff that I learned at my last edX course. Anyway, I … Continue reading → [Read more...]

Relation of Word Order and Compression Ratio and Degree of Structure

May 7, 2014 | Raffael Vogler

Having a habit of compulsively wondering approximately every 34.765th day about how zip compression (bzip2 in this case) might be used to measure information contained in data – this time the question popped up in my head of whether or … Continue reading → [Read more...]

MapReduce with R on Hadoop and Amazon EMR

April 25, 2014 | Raffael Vogler

You all know why MapReduce is fancy – so let’s just jump right in. I like researching data and I like to see results fast – does that mean I enjoy the process of setting up a Hadoop cluster? No, … Continue reading → [Read more...]

Testing for Linear Separability with Linear Programming in R

April 19, 2014 | Raffael Vogler

For the previous article I needed a quick way to figure out if two sets of points are linearly separable. But for crying out loud I could not find a simple and efficient implementation for this task. Except for the perceptron and … Continue reading → [Read more...]

Impact of Dimensionality on Data in Pictures

April 16, 2014 | Raffael Vogler

I am excited to announce that this is supposed to be my first article published also on r-bloggers.com :) The processing of data needs to take dimensionality into account as usual metrics change their behaviour in subtle ways, which impacts the … Continue reading → [Read more...]

Titanic challenge on Kaggle with decision trees (party) and SVMs (kernlab)

March 28, 2014 | Raffael Vogler

The Titanic challenge on Kaggle is about inferring from a number of personal details whether a passenger survived the disaster or did not. I gave two algorithms a try, which are decision trees using R package party and SVMs using … Continue reading → [Read more...]

The tf-idf-Statistic For Keyword Extraction

February 27, 2014 | Raffael Vogler

The tf-idf-statistic (“term frequency – inverse document frequency”) is a common tool for the purpose of extracting keywords from a document by not just considering a single document but all documents from the corpus. In terms of tf-idf a word … Continue reading → [Read more...]

“Digit Recognizer” Challenge on Kaggle using SVM Classification

February 14, 2014 | Raffael Vogler

This article is about the “Digit Recognizer” challenge on Kaggle. You are provided with two data sets: one for training, consisting of 42,000 labeled pixel vectors, and one for the final benchmark, consisting of 28,000 vectors while labels are not … Continue reading → [Read more...]

Pivoting Data in R Excel-style

January 2, 2014 | Raffael Vogler

(This article is referring to an initial proof-of-concept version of r-big-pivot) I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting a quick insight into a data set’s structure … Continue reading → [Read more...]

An intuitive interpretation of the beta distribution

November 15, 2013 | Raffael Vogler

First of all this text is not just about an intuitive perspective on the beta distribution but at least as much about the idea of looking behind a measured empirical probability and thinking of it as a product of chance itself. … Continue reading → [Read more...]

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
