# Blog Archives

## RuleFit: When disassembled trees meet Lasso

October 3, 2015
By

The RuleFit algorithm from Friedman and Propescu is an interesting regression and classification approach that uses decision rules in a linear model.RuleFit is not a completely new idea, but it combines a bunch of algorithms in a clever way. RuleFit consists of two components: The first component produces "rules" and the second component fits a linear model with these rules as input...

## Random Forest Almighty

February 6, 2014
By

Random Forests are awesome. They do not overfit, they are easy to tune, they tell you about important variables, they can be used for classification and regression, they are implemented in many programming languages and they are faster than their competitors (neural nets, boosting, support vector machines, ...)Let us take a moment to appreciate them: The...

## From OpenOffice noob to control freak: A love story with R, LaTeX and knitr

March 8, 2013
By

Lately I had to write a seminar paper for a class and I decided to overdo it.But let's start at the very beginning. Here is my evolution of how I used to write stuff and how I got from this:to that:School: OpenOffice - I guess everyone has some&nb...

## Misusage of the new shiny package: A nerdy drink tracker for your next party

December 30, 2012
By

Currently a lot of people are talking about the new shiny package. So I got curious and built an own, more or less useful app: A drink trackerThis app can be used to track how much someone drank and therefore it is very useful for every party, especial...

## Get the party started

December 22, 2012
By

Have you already used trees or random forests to model a relationship of a response and some covariates? Then you might like the condtional trees, which are implemented in the party package.In difference to the CART (Classification and Regression ...

## Trees with the rpart package

November 13, 2012
By

What are trees? Trees (also called decision trees, recursive partitioning) are a simple yet powerful tool in predictive statistics. The idea is to split the covariable space into many partitions and to fit a constant model of the response variable in each partition. In case of regression, the mean...

## PCA or Polluting your Clever Analysis

August 31, 2012
By

When I learned about principal component analysis (PCA), I thought it would be really useful in big data analysis, but that's not true if you want to do prediction. I tried PCA in my first competition at kaggle, but it delivered bad results. This post illustrates how PCA can pollute good predictors.When I started examining this problem,...