Most Popular Learners in mlr

[This article was first published on mlr-org, and kindly contributed to R-bloggers.]

For the development of mlr, as well as for a “machine learning expert”, it can be handy to know which learners are used most.
Not necessarily to find the top-performing methods, but to see what is used “out there” in the real world.
Thanks to the nice little package cranlogs from metacran, you can at least get a rough estimate, as I will show in the following…

First we need to install the cranlogs package using devtools:

devtools::install_github("metacran/cranlogs")

Now let’s load all the packages we will need:

library(mlr)
library(stringi)
library(cranlogs)
library(data.table)

To obtain a neat table of all available learners in mlr we can call listLearners().
This table also contains a column with the packages each learner needs, separated by commas.

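# obtain used packages for all learners
# (converted to a data.table, as the steps below use data.table syntax)
lrns = as.data.table(listLearners())
all.pkgs = stri_split(lrns$package, fixed = ",")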

Note: You might get some warnings here because you likely did not install all packages that mlr suggests – which is totally fine.
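
To see what the split produces without installing every suggested package, here is a minimal sketch on made-up values. The vector pkg.col is hypothetical and only stands in for lrns$package, and base R's strsplit is used in place of stri_split so the snippet has no dependencies.

```r
# Hypothetical stand-in for lrns$package (values chosen for illustration)
pkg.col = c("e1071", "party,modeltools", "party,survival")

# Split each entry on the literal comma; the result is a list with one
# character vector of package names per learner
pkgs.toy = strsplit(pkg.col, ",", fixed = TRUE)

# Unique package names across all learners, i.e. what gets passed on
# to the download query
unique.pkgs = unique(unlist(pkgs.toy))
print(unique.pkgs)
```

Each list element keeps the packages of one learner together, which is what the minimum over packages relies on later.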

Now we can obtain the download counts from the RStudio CRAN mirror, here for the last month.
We use data.table to easily sum up the download counts of each day.

all.downloads = cran_downloads(packages = unique(unlist(all.pkgs)), when = "last-month")
all.downloads = as.data.table(all.downloads)
monthly.downloads = all.downloads[, list(monthly = sum(count)), by = package]
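
The aggregation can be tried offline on a small made-up table. The sketch below assumes the daily data has the columns date, count and package; the package names pkgA and pkgB, the dates and the counts are invented for illustration.

```r
library(data.table)

# Made-up daily counts: one row per package and day
daily.toy = data.table(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02")),
  count = c(10, 15, 3, 4),
  package = c("pkgA", "pkgA", "pkgB", "pkgB")
)

# Sum the daily counts within each package
monthly.toy = daily.toy[, list(monthly = sum(count)), by = package]
print(monthly.toy)
```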

As some learners need multiple packages, we will use the download count of the package with the fewest downloads.

lrn.downloads = sapply(all.pkgs, function(pkgs) {
  monthly.downloads[package %in% pkgs, min(monthly)]
})
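
The minimum rule can be checked on a few numbers. In this sketch the tables monthly.toy and pkgs.toy are hypothetical, and the count for modeltools is made up; only the mechanics mirror the step above.

```r
library(data.table)

# Hypothetical monthly counts (the modeltools value is invented)
monthly.toy = data.table(
  package = c("party", "modeltools", "survival"),
  monthly = c(36492, 50000, 153681)
)

# Two hypothetical learners: one needs two packages, one needs a single package
pkgs.toy = list(c("party", "modeltools"), "survival")

# For each learner, take the smallest count among its packages
min.toy = sapply(pkgs.toy, function(pkgs) {
  monthly.toy[package %in% pkgs, min(monthly)]
})
print(min.toy)
```

Taking the minimum treats the least downloaded dependency as the bottleneck. If none of a learner's packages appears in the table, min() is applied to an empty vector, which in R returns Inf with a warning, so such rows may deserve a separate check.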

Let’s put these numbers in our table:

lrns$downloads = lrn.downloads
lrns = lrns[order(downloads, decreasing = TRUE),]
lrns[, .(class, name, package, downloads)]

Here are the first 5 rows of the table:

| class | name | package | downloads |
|---|---|---|---|
| surv.coxph | Cox Proportional Hazard Model | survival | 153681 |
| classif.naiveBayes | Naive Bayes | e1071 | 102249 |
| classif.svm | Support Vector Machines (libsvm) | e1071 | 102249 |
| regr.svm | Support Vector Machines (libsvm) | e1071 | 102249 |
| classif.lda | Linear Discriminant Analysis | MASS | 55852 |

Now let’s get rid of the duplicates introduced by the distinction between the types classif and regr, and we already have our…

nearly final table

lrns.small = lrns[, .SD[1,], by = .(name, package)]
lrns.small[, .(class, name, package, downloads)]

The top 20 according to the RStudio CRAN mirror:

| class | name | package | downloads |
|---|---|---|---|
| surv.coxph | Cox Proportional Hazard Model | survival | 153681 |
| classif.naiveBayes | Naive Bayes | e1071 | 102249 |
| classif.svm | Support Vector Machines (libsvm) | e1071 | 102249 |
| classif.lda | Linear Discriminant Analysis | MASS | 55852 |
| classif.qda | Quadratic Discriminant Analysis | MASS | 55852 |
| classif.randomForest | Random Forest | randomForest | 52094 |
| classif.gausspr | Gaussian Processes | kernlab | 44812 |
| classif.ksvm | Support Vector Machines | kernlab | 44812 |
| classif.lssvm | Least Squares Support Vector Machine | kernlab | 44812 |
| cluster.kkmeans | Kernel K-Means | kernlab | 44812 |
| regr.rvm | Relevance Vector Machine | kernlab | 44812 |
| classif.cvglmnet | GLM with Lasso or Elasticnet Regularization (Cross Validated Lambda) | glmnet | 41179 |
| classif.glmnet | GLM with Lasso or Elasticnet Regularization | glmnet | 41179 |
| surv.cvglmnet | GLM with Regularization (Cross Validated Lambda) | glmnet | 41179 |
| surv.glmnet | GLM with Regularization | glmnet | 41179 |
| classif.cforest | Random forest based on conditional inference trees | party | 36492 |
| classif.ctree | Conditional Inference Trees | party | 36492 |
| regr.cforest | Random Forest Based on Conditional Inference Trees | party | 36492 |
| regr.mob | Model-based Recursive Partitioning Yielding a Tree with Fitted Models Associated with each Terminal Node | party,modeltools | 36492 |
| surv.cforest | Random Forest based on Conditional Inference Trees | party,survival | 36492 |
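
The de-duplication behind the nearly final table keeps the first row for each combination of name and package. Here is a small sketch on three rows taken from the table; the object names toy, first.sd and first.unique are hypothetical. It also shows unique() with a by argument, which is an alternative data.table idiom and not what the code in this post uses.

```r
library(data.table)

toy = data.table(
  class = c("classif.svm", "regr.svm", "classif.lda"),
  name = c("Support Vector Machines (libsvm)",
    "Support Vector Machines (libsvm)",
    "Linear Discriminant Analysis"),
  package = c("e1071", "e1071", "MASS"),
  downloads = c(102249, 102249, 55852)
)

# Approach used in this post: first row of each (name, package) group
first.sd = toy[, .SD[1, ], by = .(name, package)]

# Alternative idiom: keep the first occurrence of each (name, package) pair
first.unique = unique(toy, by = c("name", "package"))

print(first.sd[, .(class, name, package, downloads)])
```

Both keep classif.svm and drop regr.svm, because classif.svm comes first within its group. The column order of the two results differs, since the grouping columns are moved to the front in the first variant.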

As we are just looking for the packages let’s compress the table a bit further and come to our…

final table

lrns[, list(learners = paste(class, collapse = ",")), by = .(package, downloads)]

Here are the first 20 rows of the table:

| package | downloads | learners |
|---|---|---|
| e1071 | 102249 | classif.naiveBayes, classif.svm, regr.svm |
| MASS | 55852 | classif.lda, classif.qda |
| randomForest | 52094 | classif.randomForest, regr.randomForest |
| kernlab | 44812 | classif.gausspr, classif.ksvm, classif.lssvm, cluster.kkmeans, regr.gausspr, regr.ksvm, regr.rvm |
| glmnet | 41179 | classif.cvglmnet, classif.glmnet, regr.cvglmnet, regr.glmnet, surv.cvglmnet, surv.glmnet |
| party | 36492 | classif.cforest, classif.ctree, multilabel.cforest, regr.cforest, regr.ctree |
| rpart | 28609 | classif.rpart, regr.rpart, surv.rpart |
| RWeka | 20583 | classif.IBk, classif.J48, classif.JRip, classif.OneR, classif.PART, cluster.Cobweb, cluster.EM, cluster.FarthestFirst, cluster.SimpleKMeans, cluster.XMeans, regr.IBk |
| gbm | 19554 | classif.gbm, regr.gbm, surv.gbm |
| nnet | 19538 | classif.multinom, classif.nnet, regr.nnet |
| pls | 18106 | regr.pcr, regr.plsr |
| FNN | 16107 | classif.fnn, regr.fnn |
| class | 14493 | classif.knn, classif.lvq1 |
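
The grouping step that collapses learners per package can be reproduced on a few rows from the table above. This is a minimal sketch; the object names toy and by.pkg are hypothetical.

```r
library(data.table)

toy = data.table(
  class = c("classif.lda", "classif.qda", "classif.rpart", "regr.rpart"),
  package = c("MASS", "MASS", "rpart", "rpart"),
  downloads = c(55852, 55852, 28609, 28609)
)

# One row per (package, downloads) pair, with the learner classes
# pasted into a single comma-separated string
by.pkg = toy[, list(learners = paste(class, collapse = ",")),
  by = .(package, downloads)]
print(by.pkg)
```

Note that the grouping uses the package column as a whole string, so a learner listed under "party,modeltools" ends up in a different row than one listed under "party" alone.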


This is not really representative of how popular each learner is, as some packages serve multiple purposes (e.g. they provide multiple learners).
Furthermore, it would be great to have access to the trending list.
Also, the number of stars on GitHub gives a better view of what developers are interested in.
Looking for machine learning packages there, we see e.g. xgboost, h2o and tensorflow.
