Have you ever spent a whole day applying different Machine Learning (ML) algorithms from several libraries, coping with a legion of conditional and unconditional hyperparameters, and juggling different data processing techniques? Undeniably, it is trial-and-error work that demands prior experience. These repeated trials can be time-consuming and arduous, even for skilled ML practitioners.
The forester package, an AutoML tool for R, has become a strong candidate for tackling these problems: it builds models quickly and offers a user-friendly interface that requires no advanced prior knowledge of Data Science.
Why are AutoML packages so important?
ML applications are now embedded in many corners of our lives, ranging from fraud detection in business and image recognition in social networks to self-driving cars and personalized medicine in healthcare. This ever-growing demand has led to the rise of ML systems that work quickly, effectively, and automatically with minimal human effort, known as Automated Machine Learning (AutoML).
Automated Machine Learning is the process of automating, end to end, the tasks involved in applying machine learning to real-world problems. Its high degree of automation allows non-experts to build and deploy models with little or no prior knowledge.
A number of existing libraries that cover stages of the ML workflow are available, such as mlr3, caret, and H2O. However, each has its own syntax and may require a specific data object, which can turn into another daunting process of reading lengthy documentation. In addition, combining several libraries can make it hard to keep the different parts of the ML workflow in sync.
To minimize the drawbacks mentioned above, and with the motto:
“Effectiveness, rapidity, user-friendliness and full coverage ML workflow”
we would like to introduce the forester package.
The forester package automatically encapsulates the important steps in the ML pipeline: data preprocessing, feature engineering, model creation, hyperparameter optimization, model evaluation, and, importantly, model explanation through integration with the DALEX package, which increases confidence when deploying the best models. The innovative ideas in the forester package are:
- No requirements for data – There is no need to create a particular object for each model. The package works with common data structures, such as data frames, matrices, and data tables. It performs feature engineering so that users do not have to.
- Simple user interface – One function with three mandatory parameters is all it takes to create the model.
- Automatic hyperparameter optimization – In addition to training the model, the package automatically optimizes and selects a set of hyperparameters.
- Comparing and selecting the best model – The forester package compares the models it has built and chooses the best one for a specified metric.
- Providing explanations – Explanations play a crucial role in reducing reluctance and increasing decision-makers' trust in the model's results. Through its integration with the DALEX package, forester enables users to create explanations at both local and global levels.
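To make the idea of comparing models on a specified metric concrete, here is a minimal base R sketch. It is not forester's internal code: the two candidate models, the `precision()` helper, and the train/test split on the built-in `mtcars` data are all assumptions chosen purely for illustration.

```r
# Illustrative sketch only: NOT forester's implementation.
# Two hypothetical candidate models are scored on held-out data and the
# one with the higher precision is selected.

set.seed(42)

# Binary target: am (1 = manual, 0 = automatic transmission)
train_idx <- sample(nrow(mtcars), size = 22)
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# precision = TP / (TP + FP); NA when no positive predictions are made
precision <- function(actual, predicted) {
  tp <- sum(predicted == 1 & actual == 1)
  fp <- sum(predicted == 1 & actual == 0)
  if (tp + fp == 0) return(NA_real_)
  tp / (tp + fp)
}

# Hypothetical candidates: two logistic regressions with different predictors
candidates <- list(
  glm_wt = glm(am ~ wt, data = train, family = binomial),
  glm_hp = glm(am ~ hp, data = train, family = binomial)
)

# Score every candidate on the held-out rows with the chosen metric
scores <- vapply(candidates, function(model) {
  prob <- predict(model, newdata = test, type = "response")
  precision(test$am, as.integer(prob > 0.5))
}, numeric(1))

# which.max() skips NA scores, so a model without positive predictions loses
best_name <- names(which.max(scores))

print(scores)
print(best_name)
```

An AutoML package automates this loop over many more model families and metrics; the sketch only shows the selection principle.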
An example of forester usage
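library(forester)

# Load the Titanic data set shipped with the DALEX package
data("titanic", package = "DALEX")

# Build models and return the best one according to precision
best_model <- forester(data = titanic,
                       target = "survived",
                       type = "classification",
                       metric = "precision",
                       tune = FALSE)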
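To give a feel for what global and local explanations look like, the following sketch uses DALEX directly on a plain logistic regression. The `glm` model here is only a stand-in chosen for illustration; how forester itself hands its best model to DALEX is not shown in this post, so treat the wiring as an assumption.

```r
# Illustrative sketch: explaining a stand-in model with DALEX.
# The glm below is an assumption, used in place of an AutoML result.
library(DALEX)

# titanic_imputed ships with DALEX; survived is coded as 0/1
model <- glm(survived ~ gender + age + class + fare,
             data = titanic_imputed, family = binomial)

# Wrap the model, the predictors, and the target in an explainer object
explainer <- explain(model,
                     data = titanic_imputed[, colnames(titanic_imputed) != "survived"],
                     y = titanic_imputed$survived,
                     label = "logistic regression",
                     verbose = FALSE)

# Global level: permutation-based variable importance
global_importance <- model_parts(explainer)
plot(global_importance)

# Local level: break-down of the prediction for a single passenger
passenger <- titanic_imputed[1, ]
local_breakdown <- predict_parts(explainer, new_observation = passenger)
plot(local_breakdown)
```

The global plot ranks variables by how much shuffling them hurts the model, while the local plot shows how each variable shifts the prediction for one passenger.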
For more advanced uses of the forester package, stay tuned for part 2 of this series: forester: An AutoML R package for Tree-based Models.
The source code and detailed description of our package are available at: https://github.com/ModelOriented/forester
If you are interested in other posts about explainable, fair, and responsible ML, follow #ResponsibleML on Medium.
forester: An AutoML R package for Tree-based Models was originally published in ResponsibleML on Medium, where people are continuing the conversation by highlighting and responding to this story.