Blog Archives

Random Forest Almighty

February 6, 2014

Random Forests are awesome. They do not overfit, they are easy to tune, they tell you about important variables, they can be used for classification and regression, they are implemented in many programming languages and they are faster than their competitors (neural nets, boosting, support vector machines, ...). Let us take a moment to appreciate them: The...

From OpenOffice noob to control freak: A love story with R, LaTeX and knitr

March 8, 2013

Lately I had to write a seminar paper for a class and I decided to overdo it. But let's start at the very beginning. Here is the evolution of how I used to write stuff, and how I got from this to that. School: OpenOffice - I guess everyone has some...

Misusage of the new shiny package: A nerdy drink tracker for your next party

December 30, 2012

Currently a lot of people are talking about the new shiny package. So I got curious and built my own, more or less useful app: a drink tracker. This app can be used to track how much someone drank and therefore it is very useful for every party, especial...

Get the party started

December 22, 2012

Have you already used trees or random forests to model the relationship between a response and some covariates? Then you might like conditional trees, which are implemented in the party package. In contrast to the CART (Classification and Regression ...

Trees with the rpart package

November 13, 2012

What are trees? Trees (also called decision trees or recursive partitioning) are a simple yet powerful tool in predictive statistics. The idea is to split the covariate space into many partitions and to fit a constant model of the response variable in each partition. In the case of regression, the mean...

PCA or Polluting your Clever Analysis

August 31, 2012

When I learned about principal component analysis (PCA), I thought it would be really useful in big data analysis, but that's not true if you want to do prediction. I tried PCA in my first competition on Kaggle, but it delivered bad results. This post illustrates how PCA can pollute good predictors. When I started examining this problem,...
