Blog Archives

The tf-idf-Statistic For Keyword Extraction

February 27, 2014

The tf-idf statistic (“term frequency – inverse document frequency”) is a common tool for extracting keywords from a document that takes into account not just that single document but all documents in the corpus. In terms of tf-idf, a word …
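
The excerpt breaks off before giving a formula, so the following Python sketch assumes one common variant: the term frequency is a word's count divided by the document's length, and the inverse document frequency is log(N / df), where N is the number of documents and df is the number of documents containing the word. Words that occur in every document (such as "the") score zero, while words concentrated in a single document score highest. The helpers `tf_idf` and `top_keywords` are hypothetical, tokenization is naive whitespace splitting, and the original post may use a different weighting.

```python
import math
from collections import Counter


def tf_idf(corpus):
    """Return one {term: score} dict per document.

    Assumed variant: tf = count / document length,
    idf = log(N / number of documents containing the term).
    """
    n_docs = len(corpus)
    tokenized = [doc.lower().split() for doc in corpus]

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def top_keywords(doc_scores, k=3):
    """Rank a document's terms by tf-idf (highest first, ties alphabetical)."""
    ranked = sorted(doc_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the cat chased the dog",
]
scores = tf_idf(corpus)
print(top_keywords(scores[0]))  # ['mat', 'cat', 'on']
```

Here "mat" ranks first for the first document because it appears in no other document, while "the" gets a score of zero because it appears in all three.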

“Digit Recognizer” Challenge on Kaggle using SVM Classification

February 14, 2014

This article is about the “Digit Recognizer” challenge on Kaggle. Two data sets are provided: one for training, consisting of 42’000 labeled pixel vectors, and one for the final benchmark, consisting of 28’000 vectors whose labels are not …

Pivoting Data in R Excel-style

January 2, 2014

(This article refers to an initial proof-of-concept version of r-big-pivot.) I have to admit that I very much enjoy pivoting through data in Excel. Its pivoting tool is great for getting quick insight into a data set’s structure …

An intuitive interpretation of the beta distribution

November 15, 2013

First of all, this text is not just about an intuitive perspective on the beta distribution but at least as much about the idea of looking behind a measured empirical probability and thinking of it as a product of chance itself. …
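
One standard way to treat an empirical probability as a product of chance is to describe the unknown underlying probability with a beta distribution: after k successes in n trials, and assuming a uniform Beta(1, 1) prior, the plausible values follow Beta(k + 1, n - k + 1). The Python sketch below illustrates that idea under this assumption and is not necessarily the approach taken in the post. `beta_params` and `credible_interval` are hypothetical helpers, and the interval is approximated by sampling with the standard library's `random.betavariate`.

```python
import random


def beta_params(successes, trials):
    """Beta parameters for the unknown success probability after observing
    `successes` out of `trials`, assuming a uniform Beta(1, 1) prior."""
    return successes + 1, trials - successes + 1


def credible_interval(successes, trials, level=0.95, draws=20_000, seed=0):
    """Approximate a central credible interval by sampling from the beta."""
    a, b = beta_params(successes, trials)
    rng = random.Random(seed)  # fixed seed for reproducible output
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    lo = samples[round((1 - level) / 2 * draws)]
    hi = samples[round((1 + level) / 2 * draws) - 1]
    return lo, hi


# The same empirical rate of 0.7, measured from 10 and from 1000 trials.
for k, n in [(7, 10), (700, 1000)]:
    a, b = beta_params(k, n)
    mean = a / (a + b)  # equals (k + 1) / (n + 2)
    lo, hi = credible_interval(k, n)
    print(f"{k}/{n}: mean={mean:.3f}, 95% interval=({lo:.3f}, {hi:.3f})")
```

Both cases show the same measured rate, but the interval from 10 trials is far wider than the one from 1000, which is what looking behind the measured number reveals.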
