
This post gives a brief introduction to the workflow of a machine learning model and to the most widely used R packages before diving into the details.

Given a problem to be solved, all machine learning (ML) models take the same input but produce different output. It is therefore useful to understand a common workflow for ML models. Because there is not a single workflow but a variety of them, we introduce one of them here.

### Sample Splitting

Building an ML model starts with sample splitting. The most commonly used technique is K-fold cross-validation with random shuffling. For time-series or panel data, K-fold cross-validation without random shuffling is used to preserve the temporal sequence (future data cannot be used to predict past data). This method is called K-fold forward chaining cross-validation, or forward chaining for short. The two cross-validation schemes are illustrated in the following figures.
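
The two splitting schemes can also be sketched in a few lines of base R. This is a minimal illustration, not the implementation used later in this series: the sample size `n`, the number of folds `k`, the seed, and the object names are all assumptions chosen for the example, and the forward chaining shown here is one common variant (train on the first `i` consecutive blocks, test on block `i + 1`).

```r
# Minimal sketch: train/test indices for the two splitting schemes (base R only).
set.seed(1)  # assumption: fixed seed so the shuffling is reproducible
n <- 20      # hypothetical number of observations
k <- 5       # hypothetical number of folds

# K-fold with random shuffling: every observation gets a random fold label.
fold_id <- sample(rep(seq_len(k), length.out = n))
shuffled_splits <- lapply(seq_len(k), function(i) {
  list(train = which(fold_id != i), test = which(fold_id == i))
})

# Forward chaining: rows are assumed to be sorted in time order.
# Rows are cut into k + 1 consecutive blocks; step i trains on
# blocks 1..i and tests on block i + 1, so the test set is always later.
block_id <- sort(rep(seq_len(k + 1), length.out = n))
chained_splits <- lapply(seq_len(k), function(i) {
  list(train = which(block_id <= i), test = which(block_id == i + 1))
})
```

In the shuffled scheme every observation appears in exactly one test fold. In the chained scheme every training index precedes every test index, which is what keeps future data out of the predictors.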

### Workflow of Machine Learning

Although there are many alternatives for each step, most ML models have the following workflow in common.

### Hyperparameters and R packages

R provides many ML packages, which are updated irregularly. We use representative, time-tested, and widely used R packages for a selection of ML models in the following way.
The selected ML models are Logistic Regression (LR), Decision Tree (DT), Support Vector Machine (SVM), Random Forest (RF), Artificial Neural Network (ANN), Gradient Boosting (GBoost) and Extreme Gradient Boosting (XGBoost). The numerical values given for the hyperparameters of each ML model are only examples, not fixed settings.
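
As a rough illustration of how example hyperparameter values can be laid out before tuning, the sketch below builds a grid of candidate combinations with base R's `expand.grid()`. The column names `n_trees` and `n_vars_per_split` and their values are hypothetical placeholders for a Random Forest style model. They are not taken from any particular package and are not recommended settings.

```r
# Hypothetical candidate values; names and numbers are illustrative only.
rf_grid <- expand.grid(
  n_trees = c(100, 500),           # assumed candidates for the number of trees
  n_vars_per_split = c(2, 4, 6)    # assumed candidates for variables tried per split
)

# One row per combination; each row would be evaluated on the validation folds.
n_candidates <- nrow(rf_grid)
```

Each row of `rf_grid` is one candidate configuration. The usual design is to score every row on the same cross-validation splits and keep the best one.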

### Concluding Remarks

Based on this workflow, we will examine each ML model and implement it step by step with R ML packages in a series of upcoming posts. $$\blacksquare$$