mlr3-0.1.0

Posted on July 30, 2019 by r-bloggers on Machine Learning in R in R bloggers | 0 Comments

[This article was first published on r-bloggers on Machine Learning in R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

mlr3 – Initial release

The mlr-org team is very proud to present the initial release of the mlr3 machine-learning framework for R.

mlr3 comes with a clean object-oriented-design using the R6 class system. With this, it overcomes the limitations of R’s S3 classes. It is a rewrite of the well-known mlr package which provides a convenient way of accessing many algorithms in R through a consistent interface.

While mlr was one big package that included everything starting from preprocessing, tuning, feature-selection to visualization, mlr3 is a package framework consisting of many packages containing specific parts of the functionality required for a complete machine-learning workflow.

Background – why a rewrite?

The addition of many features to mlr has led to a “feature creep” which makes it hard to maintain and extend.

Due to the many tests in mlr, a full CI run of the package on Travis CI takes more than 30 minutes. This does not include the installation of dependencies, running R CMD check or building the pkgdown site (which includes more than 20 vignettes with executable code). The dependency installation alone takes 1 hour(!). This is due to the huge number of packages that mlr imports to be able to use all the algorithms for which it provides a unified interface. On a vanilla CI build without cache there woulöd be 326(!) packages to be installed.

mlr consists of roughly 40k lines of R code.

github.com/AlDanial/cloc v 1.82  T=0.98 s (1335.1 files/s, 226920.4 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
HTML                           346          16592           6276         125687
R                              878           5909          12566          40696
Rmd                             44           2390           5296           2500
Markdown                        11            196              0           1136
XML                              1              0              0            963
YAML                             4             14              5            244
CSS                              2             69             66            203
C                                3             16             33            106
JSON                             5              0              0             98
JavaScript                       2             21              9             80
Bourne Shell                     4              8              6             69
C/C++ Header                     1              7              0             20
SVG                              1              0              1             11
-------------------------------------------------------------------------------
SUM:                          1302          25222          24258         171813
-------------------------------------------------------------------------------

Adding any feature has become a huge pain since it requires passing down objects through many functions. In addition, a change to one part of the project might trigger unwanted side-effects at other places or even break functions. Even though most of the package is covered by tests, this creates a high wall of reluctance to extend the package further.

After mlr has become more prominent, many people started opening issues and pull requests (which is great!). While the package authors and its creator (Bernd Bischl) were able to cope with the amount of user input in the beginning, the number of open issues increased steadily in recent years. Since nobody has been specifically paid for mlr in the past, there was just not enough man power to be able to handle this amount of questions and feature requests in the past. The issue tracker exceeded 400 open issues recently and people often got no reply for their pull requests in a reasonable amount of time. At the same time, the core team could invest less time into mlr due to other new responsibilities in their professional lifes.

mlr was built using the S3 class system. It functionality relied partyl on packages that were also maintained by the mlr-team (e.g. BBmisc, parallelMap). Maintaining so many small packages for a little benefit took away a lot of time. However, back in the days these actions were needed since many convient packages and functions were missing at that point in time.

The mlr-team decided that is was time to change something. Since so many people like the mlr package and its philosophy, we decided to make a fresh start. Taking into account all the problematic points that we learned from in the last years:

The limitation of the S3 class system
The fear of a “feature-creep” in one single package
Long CI checking times
The burden of maintaining too many parts of a “framework” package yourself

The new mlr3 package framework

Uses the new R6 object-oriented class system
Splits functionalities into extension packages; only the core functionality is provided via package mlr3
Outsources parts of the functionality to external packages (parallelization -> future, measures -> Metrics)
Documentation is written in bookdown format which is searchable
Extension packages are assigned to specific persons, reducing reply-times to user requests
comes with a new package mlr3pipelines (the successor of mlrCPO) providing smart operators for creating a complete machine-learning workflow using the piping approach
uses data.table under the hood to speed up internal computations

mlr3 is openly developed on Github for around one year now. Not all functionality of the mlr package is already re-implemented. In fact, this release can be seen as a beta version. We’re happy to take user feedback to improve mlr3 in the next months.

The following list shows only a subset of the mlr3 package framework. For a full list, please visit our wiki. The scope of most packages should be clear simply by their name. For more information, visit the respective Github repos of the packages by clicking on the package name.

Package	Status
mlr3
mlr3book
mlr3db
mlr3filters
mlr3learners
mlr3misc
mlr3ordinal
mlr3pipelines
mlr3spatiotemporal
mlr3survival
mlr3tuning
mlr3viz
paradox

We also started to write a book about mlr3 which aims to serve both as a reference book for machine learning in general and as a manual for mlr3 specifically.

mlr3 at useR!2019

The mlr-org team gave two dedicated talks to present the new mlr3 at the useR! conference in Toulouse.

The recorded videos will be inserted here once they are available.

To leave a comment for the author, please follow the link and comment on their blog: r-bloggers on Machine Learning in R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

mlr3-0.1.0

mlr3 – Initial release

Background – why a rewrite?

The new mlr3 package framework

mlr3 at useR!2019

Related

mlr3 – Initial release

Background – why a rewrite?

The new mlr3 package framework

mlr3 at useR!2019

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)