This post gives a brief introduction to the workflow of a machine learning model and the most widely used R packages before diving into the details.
Given a problem to be solved, all machine learning (ML) models take the same input but produce different output. It is therefore useful to understand a workflow that is common to ML models. Since there is not a single workflow but a variety of them, we introduce one of them here.
Building an ML model starts with sample splitting. The most commonly used technique is K-fold cross-validation with random shuffling. For time-series or panel data, K-fold cross-validation without random shuffling is used to preserve the temporal sequence (future data cannot be used as a predictor of past data). This method is called K-fold forward chaining cross-validation, or forward chaining for short. The two cross-validation schemes are illustrated in the following figures.
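To make the two splitting schemes concrete, here is a minimal sketch in base R. It is only an illustration under stated assumptions: the sample size `n`, the number of folds `k`, and the object names `kfold` and `fwd` are hypothetical, and the forward chaining variant shown (train on the first i blocks, test on block i+1) is one common convention rather than the only one.

```r
# Illustrative values only (assumptions, not taken from the post)
set.seed(1)   # reproducible shuffle
n <- 12       # number of observations
k <- 4        # number of folds

# (1) K-fold cross-validation with random shuffling:
#     each observation is assigned to one of k folds at random
fold_id <- sample(rep(seq_len(k), length.out = n))
kfold <- lapply(seq_len(k), function(i) {
  list(train = which(fold_id != i),   # all folds except fold i
       test  = which(fold_id == i))   # fold i is held out
})

# (2) Forward chaining (no shuffling):
#     observations stay in time order and are cut into k contiguous blocks;
#     split i trains on blocks 1..i and tests on block i + 1
block_id <- sort(rep(seq_len(k), length.out = n))
fwd <- lapply(seq_len(k - 1), function(i) {
  list(train = which(block_id <= i),
       test  = which(block_id == i + 1))
})

# Inspect the index sets of the second forward-chaining split
str(fwd[[2]])
```

In the shuffled scheme every observation is tested exactly once, while in forward chaining every training index precedes every test index, so no future observation is used to predict the past.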
Workflow of Machine Learning
Although there are many alternatives for each step, most ML models share the following workflow.
Hyperparameters and R packages
R provides many ML packages, which are updated irregularly. For some selected ML models, we use representative, time-tested, and widely used R packages in the following way.
Based on this workflow, we are going to investigate each ML model and implement it step by step with R ML packages in a series of upcoming posts. \(\blacksquare\)