mlr3-0.1.0
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
mlr3 – Initial release
The mlr-org team is very proud to present the initial release of the mlr3 machine-learning framework for R.
mlr3 comes with a clean object-oriented-design using the R6 class system. With this, it overcomes the limitations of R’s S3 classes. It is a rewrite of the well-known mlr package which provides a convenient way of accessing many algorithms in R through a consistent interface.
While mlr was one big package that included everything starting from preprocessing, tuning, feature-selection to visualization, mlr3 is a package framework consisting of many packages containing specific parts of the functionality required for a complete machine-learning workflow.
Background – why a rewrite?
The addition of many features to mlr has led to a “feature creep” which makes it hard to maintain and extend.
Due to the many tests in mlr, a full CI run of the package on Travis CI takes more than 30 minutes. This does not include the installation of dependencies, running R CMD check or building the pkgdown site (which includes more than 20 vignettes with executable code). The dependency installation alone takes 1 hour(!). This is due to the huge number of packages that mlr imports to be able to use all the algorithms for which it provides a unified interface. On a vanilla CI build without cache there woulöd be 326(!) packages to be installed.
mlr consists of roughly 40k lines of R code.
github.com/AlDanial/cloc v 1.82 T=0.98 s (1335.1 files/s, 226920.4 lines/s) ------------------------------------------------------------------------------- Language files blank comment code ------------------------------------------------------------------------------- HTML 346 16592 6276 125687 R 878 5909 12566 40696 Rmd 44 2390 5296 2500 Markdown 11 196 0 1136 XML 1 0 0 963 YAML 4 14 5 244 CSS 2 69 66 203 C 3 16 33 106 JSON 5 0 0 98 JavaScript 2 21 9 80 Bourne Shell 4 8 6 69 C/C++ Header 1 7 0 20 SVG 1 0 1 11 ------------------------------------------------------------------------------- SUM: 1302 25222 24258 171813 -------------------------------------------------------------------------------
Adding any feature has become a huge pain since it requires passing down objects through many functions. In addition, a change to one part of the project might trigger unwanted side-effects at other places or even break functions. Even though most of the package is covered by tests, this creates a high wall of reluctance to extend the package further.
After mlr has become more prominent, many people started opening issues and pull requests (which is great!). While the package authors and its creator (Bernd Bischl) were able to cope with the amount of user input in the beginning, the number of open issues increased steadily in recent years. Since nobody has been specifically paid for mlr in the past, there was just not enough man power to be able to handle this amount of questions and feature requests in the past. The issue tracker exceeded 400 open issues recently and people often got no reply for their pull requests in a reasonable amount of time. At the same time, the core team could invest less time into mlr due to other new responsibilities in their professional lifes.
mlr was built using the S3 class system. It functionality relied partyl on packages that were also maintained by the mlr-team (e.g. BBmisc, parallelMap). Maintaining so many small packages for a little benefit took away a lot of time. However, back in the days these actions were needed since many convient packages and functions were missing at that point in time.
The mlr-team decided that is was time to change something. Since so many people like the mlr package and its philosophy, we decided to make a fresh start. Taking into account all the problematic points that we learned from in the last years:
- The limitation of the S3 class system
- The fear of a “feature-creep” in one single package
- Long CI checking times
- The burden of maintaining too many parts of a “framework” package yourself
The new mlr3 package framework
- Uses the new R6 object-oriented class system
- Splits functionalities into extension packages; only the core functionality is provided via package mlr3
- Outsources parts of the functionality to external packages (parallelization -> future, measures -> Metrics)
- Documentation is written in bookdown format which is searchable
- Extension packages are assigned to specific persons, reducing reply-times to user requests
- comes with a new package mlr3pipelines (the successor of mlrCPO) providing smart operators for creating a complete machine-learning workflow using the piping approach
- uses data.table under the hood to speed up internal computations
mlr3 is openly developed on Github for around one year now. Not all functionality of the mlr package is already re-implemented. In fact, this release can be seen as a beta version. We’re happy to take user feedback to improve mlr3 in the next months.
The following list shows only a subset of the mlr3 package framework. For a full list, please visit our wiki. The scope of most packages should be clear simply by their name. For more information, visit the respective Github repos of the packages by clicking on the package name.
Package | Status |
---|---|
mlr3 | |
mlr3book | |
mlr3db | |
mlr3filters | |
mlr3learners | |
mlr3misc | |
mlr3ordinal | |
mlr3pipelines | |
mlr3spatiotemporal | |
mlr3survival | |
mlr3tuning | |
mlr3viz | |
paradox |
We also started to write a book about mlr3 which aims to serve both as a reference book for machine learning in general and as a manual for mlr3 specifically.
mlr3 at useR!2019
The mlr-org team gave two dedicated talks to present the new mlr3 at the useR! conference in Toulouse.
The recorded videos will be inserted here once they are available.
R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.