Does imputing model labels using the model’s predictions improve its performance?

December 21, 2018

In some scenarios a data scientist may want to train a model for which there is an abundance of observations, but only a small fraction of them are labeled, which makes the sample available for training rather small. Although there’s plenty of literature on the subject (e.g. “Active learning”, “Semi-supervised learning”, etc.), one may …
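Using a model’s own predictions to label unlabeled observations is commonly called self-training (or pseudo-labeling). The following is a minimal sketch of that idea, not the post’s method. It assumes a toy nearest-centroid classifier and synthetic 2-D Gaussian data, and all function names (`fit_centroids`, `predict`, `self_train`, `blob`) are hypothetical. Each round it fits the model, imputes labels for the points it is most confident about, and refits on the enlarged set. Whether this actually improves performance is the question the post asks, so the sketch makes no claim either way.

```python
import random


def fit_centroids(points, labels):
    """Fit a nearest-centroid classifier: the mean point of each class."""
    sums, counts = {}, {}
    for (px, py), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + px, sy + py)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab]) for lab, (sx, sy) in sums.items()}


def predict(centroids, p):
    """Return (label, confidence); confidence is the gap in squared distance
    between the closest and the second-closest centroid."""
    dists = sorted(((p[0] - cx) ** 2 + (p[1] - cy) ** 2, lab)
                   for lab, (cx, cy) in centroids.items())
    gap = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], gap


def self_train(X, y, pool, rounds=5, per_round=10):
    """Repeatedly impute labels for the most confidently predicted
    unlabeled points and refit on the enlarged training set."""
    X, y, pool = list(X), list(y), list(pool)
    for _ in range(rounds):
        if not pool:
            break
        model = fit_centroids(X, y)
        # Most confident unlabeled points first.
        scored = sorted(pool, key=lambda p: predict(model, p)[1], reverse=True)
        for p in scored[:per_round]:
            X.append(p)
            y.append(predict(model, p)[0])  # imputed (pseudo) label
        pool = scored[per_round:]
    return fit_centroids(X, y), len(X)


def blob(rng, center, n):
    """Toy 2-D Gaussian cluster (illustrative data, not from the post)."""
    return [(rng.gauss(center[0], 1.0), rng.gauss(center[1], 1.0)) for _ in range(n)]


rng = random.Random(42)
labeled = blob(rng, (0, 0), 3) + blob(rng, (4, 4), 3)   # only 6 labeled points
labels = ["a"] * 3 + ["b"] * 3
unlabeled = blob(rng, (0, 0), 100) + blob(rng, (4, 4), 100)
test_pts = blob(rng, (0, 0), 200) + blob(rng, (4, 4), 200)
test_lab = ["a"] * 200 + ["b"] * 200

baseline = fit_centroids(labeled, labels)
model, n_train = self_train(labeled, labels, unlabeled)


def accuracy(m):
    """Share of held-out toy points the model labels correctly."""
    return sum(predict(m, p)[0] == t for p, t in zip(test_pts, test_lab)) / len(test_lab)


print(n_train, accuracy(baseline), accuracy(model))
```

Only imputing the most confident predictions is one common design choice for limiting how much the model reinforces its own early mistakes. Imputing every unlabeled point in a single pass is the simpler alternative.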

Reproducible development with Rmarkdown and Github

September 21, 2018

I’m pretty sure most readers of this blog are already familiar with Rmarkdown and Github. In this post I don’t pretend to reinvent the wheel, but rather give a quick run-down of how I set up and use these tools to produce high-quality, scalable (in human time), reproducible data science development code. Github: While …

