Blog Archives

Testing for Linear Separability with Linear Programming in R

April 19, 2014

For the previous article I needed a quick way to figure out whether two sets of points are linearly separable. But for crying out loud, I could not find a simple and efficient implementation for this task. Except for the perceptron and …

Impact of Dimensionality on Data in Pictures

April 16, 2014

I am excited to announce that this should be my first article also published on r-bloggers.com :) The processing of data needs to take dimensionality into account, because the usual metrics change their behaviour in subtle ways, which impacts the …

Titanic challenge on Kaggle with decision trees (party) and SVMs (kernlab)

March 28, 2014

The Titanic challenge on Kaggle is about inferring from a number of personal details whether or not a passenger survived the disaster. I tried two algorithms: decision trees using the R package party and SVMs using …

The tf-idf-Statistic For Keyword Extraction

February 27, 2014

The tf-idf-statistic (“term frequency – inverse document frequency”) is a common tool for extracting keywords from a document by considering not just that single document but all documents in the corpus. In terms of tf-idf a word …
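To make the idea concrete, here is a minimal sketch of one common tf-idf variant: relative term frequency multiplied by an unsmoothed natural-log inverse document frequency. This weighting is an assumption and may differ from the one used in the article itself. The function name `tf_idf` and the toy corpus are made up for illustration, and the sketch is in Python rather than R.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document of a tokenized corpus.

    Assumed weighting (one of several common variants):
      tf  = count of term in document / number of tokens in document
      idf = ln(number of documents / number of documents containing term)
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in corpus for term in set(doc))
    scores = []
    for doc in corpus:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, already split into lowercase tokens.
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]

scores = tf_idf(corpus)
# The highest-scoring term of the first document is its best keyword candidate.
top = max(scores[0], key=scores[0].get)
```

A word such as "the" occurs in every document, so its idf and therefore its score is zero, while a word that is frequent in one document and rare elsewhere rises to the top.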

“Digit Recognizer” Challenge on Kaggle using SVM Classification

February 14, 2014

This article is about the “Digit Recognizer” challenge on Kaggle. You are provided with two data sets: one for training, consisting of 42’000 labeled pixel vectors, and one for the final benchmark, consisting of 28’000 vectors while labels are not …

Pivoting Data in R Excel-style

January 2, 2014

(This article refers to an initial proof-of-concept version of r-big-pivot.) I have to admit that I very much enjoy pivoting through data using Excel. Its pivoting tool is great for getting quick insight into a data set’s structure …

An intuitive interpretation of the beta distribution

November 15, 2013

First of all, this text is not just about an intuitive perspective on the beta distribution but at least as much about the idea of looking behind a measured empirical probability and thinking of it as a product of chance itself. …
