Random Forest Almighty
[This article was first published on Machine Master, and kindly contributed to R-bloggers.]
Random Forests are awesome. They do not overfit, they are easy to tune, they tell you about important variables, they can be used for classification and regression, they are implemented in many programming languages, and they are faster than their competitors (neural nets, boosting, support vector machines, …).
Let us take a moment to appreciate them:
The Random Forest™ is my shepherd; I shall not want.
He makes me watch the mean squared error decrease rapidly.
He leads me beside classification problems.
He restores my soul.
He leads me in paths of the power of ensembles
for his name’s sake.
Even though I walk through the valley of the curse of dimensionality,
I will fear no overfitting,
for you are with me;
your bootstrap and your randomness,
they comfort me.
You prepare a prediction before me
in the presence of complex interactions;
you anoint me data scientist;
my wallet overflows.
Surely goodness of fit and money shall follow me
all the days of my life,
and I shall use Random Forests™
forever.
One thing I learned the hard way is that you should not get too attached to one algorithm for prediction. This probably applies to other areas as well. When I participated in the Observing Dark Worlds challenge, I fell into this trap by sticking to Random Forests. My model performed poorly, but instead of considering another algorithm, I kept thinking about better features. The winner of this competition used a Bayesian approach.
You can find implementations in R (randomForest package) or in Python (scikit-learn library).
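If you want to try this yourself, here is a minimal sketch using scikit-learn. The synthetic dataset, the number of trees and the train/test split are my own assumptions for illustration, not settings from any particular project. It fits a forest, checks the out-of-bag and held-out accuracy, and looks at the variable importances mentioned above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): 10 features, 3 of them informative.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# oob_score=True scores each training sample using only the trees
# whose bootstrap sample did not contain it.
forest = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
forest.fit(X_train, y_train)

oob_accuracy = forest.oob_score_
test_accuracy = forest.score(X_test, y_test)
importances = forest.feature_importances_  # impurity-based, normalized to sum to 1

print(f"OOB accuracy:  {oob_accuracy:.3f}")
print(f"Test accuracy: {test_accuracy:.3f}")
for index in importances.argsort()[::-1][:3]:
    print(f"feature {index}: importance {importances[index]:.3f}")
```

For regression, RandomForestRegressor from the same module can be swapped in. It is still worth comparing the result against at least one other kind of model before settling on the forest.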