Multi-Output Regression using Sklearn

[This article was first published on R – Hi! I am Nagdev, and kindly contributed to R-bloggers].
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Regression analysis is the process of building a linear or non-linear fit for one or more continuous target variables. That’s right! There can be more than one target variable. Multi-output machine learning problems are more common in classification than in regression. In classification, a categorical target variable is often encoded (for example, one-hot encoded) into several columns, which turns it into a multi-output problem. In my professional experience, about 90% of data science regression problems have a single target variable; the rest require fitting multiple target variables. Some applications of multi-output regression are forecasting and predictive maintenance.

In the next few sections, I will walk you through how to solve multi-output regression problems using sklearn.

1. Import packages

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import RandomForestRegressor

There are a few packages that we will be loading here:

  1. make_regression: to create a synthetic regression dataset
  2. train_test_split: to split the data into train and test sets
  3. MultiOutputRegressor: to wrap a single-target regressor so it can fit several targets
  4. RandomForestRegressor: to build a random forest regressor model

2. Create a multi-output regression dataset

x, y = make_regression(n_targets=3)

Here we create a random dataset for a regression problem with three target variables, keeping the rest of the parameters at their defaults. The code below shows the shape of our features and target variables. With the defaults (100 samples, 100 features), x has shape (100, 100) and y has shape (100, 3).

print(x.shape)
print(y.shape)
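To see where these shapes come from, here is a minimal sketch that spells out the make_regression defaults explicitly. The random_state=0 argument is our own addition for reproducibility; the original code does not set it, so its data will differ from run to run.

```python
from sklearn.datasets import make_regression

# make_regression's defaults, written out explicitly:
# 100 samples and 100 features, of which 10 actually drive the targets.
# random_state=0 is an assumption added here so runs are reproducible.
x, y = make_regression(
    n_samples=100,
    n_features=100,
    n_informative=10,
    n_targets=3,
    random_state=0,
)

print(x.shape)  # (100, 100): one row per sample, one column per feature
print(y.shape)  # (100, 3): one column per target variable
```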

3. Split data into train and test

The following block of code splits our features and target variables into train and test sets. The train set will contain 70% of the observations (rows) and the test set the remaining 30%.

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)
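A quick sanity check, as a sketch: the split is taken over observations, so every feature and target column survives in both halves. We again assume random_state=0 in make_regression (not in the original) only to make the example repeatable.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

# random_state=0 is an assumption added for reproducibility
x, y = make_regression(n_targets=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

# The split is over rows (observations); all 100 feature columns
# and all 3 target columns are kept on both sides.
print(x_train.shape, x_test.shape)  # (70, 100) (30, 100)
print(y_train.shape, y_test.shape)  # (70, 3) (30, 3)
```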

4. Model building

Next, we train our multi-output regression model by wrapping a RandomForestRegressor in a MultiOutputRegressor, as in the code below.

According to the sklearn documentation, “This strategy consists of fitting one regressor per target. This is a simple strategy for extending regressors that do not natively support multi-target regression.”

clf = MultiOutputRegressor(RandomForestRegressor(max_depth=2, random_state=0))
clf.fit(x_train, y_train)
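To make the “one regressor per target” strategy concrete, here is a sketch that fits one clone of the base forest per target column by hand and compares its predictions with MultiOutputRegressor. The loop is a hand-rolled illustration, not sklearn's own code, and random_state=0 in make_regression is our assumption for repeatability.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# random_state=0 is an assumption added for reproducibility
x, y = make_regression(n_targets=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

base = RandomForestRegressor(max_depth=2, random_state=0)

# Library wrapper: fits one clone of `base` per target column.
clf = MultiOutputRegressor(base).fit(x_train, y_train)
print(len(clf.estimators_))  # 3: one fitted forest per target

# Hand-rolled version of the same strategy (illustration only):
# fit a fresh copy of the forest on each target column separately.
manual_models = [clone(base).fit(x_train, y_train[:, j]) for j in range(y_train.shape[1])]
manual_pred = np.column_stack([m.predict(x_test) for m in manual_models])

# Both approaches should produce the same (30, 3) prediction matrix.
print(np.allclose(manual_pred, clf.predict(x_test)))
```

Because the per-target models are independent, MultiOutputRegressor can also fit them in parallel through its n_jobs parameter; the flip side is that this strategy does not model any correlation between the targets.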

5. Prediction and scoring

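The following block of code predicts the three targets for the first test observation and then computes the coefficient of determination (R²) on the whole test set, averaged over the three targets. Because the training set is small (70 observations with 100 features, only 10 of which are informative) and the trees are shallow (max_depth=2), we should not expect a high R² value.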

print(clf.predict(x_test[[0]]))  # x_test[[0]] keeps the 2-D shape (1, n_features)
print(clf.score(x_test, y_test, sample_weight=None))
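The single number from clf.score can hide a weak target. Here is a sketch, assuming the same setup as above (plus random_state=0 in make_regression for repeatability), that breaks the score down per target with sklearn.metrics.r2_score:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# random_state=0 is an assumption added for reproducibility
x, y = make_regression(n_targets=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)
clf = MultiOutputRegressor(RandomForestRegressor(max_depth=2, random_state=0)).fit(x_train, y_train)

y_pred = clf.predict(x_test)

# One R^2 value per target column.
per_target = r2_score(y_test, y_pred, multioutput="raw_values")
print(per_target)

# clf.score reports the unweighted mean of the per-target values.
print(clf.score(x_test, y_test), per_target.mean())
```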

Putting everything together

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor
from sklearn.ensemble import RandomForestRegressor

# create regression data
x, y = make_regression(n_targets=3)

# split into train and test data
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

# train the model
clf = MultiOutputRegressor(RandomForestRegressor(max_depth=2, random_state=0))
clf.fit(x_train, y_train)

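# predict all three targets for every test observation
y_pred = clf.predict(x_test)
print(y_pred.shape)

# average R^2 over the three targets
print(clf.score(x_test, y_test))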

As you can see, with just a few lines of code one can build a multi-output regression model using sklearn. In my next tutorial, I will show you how to do multi-output regression using deep learning and the Keras package.
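The wrapper is not tied to random forests. The sketch below swaps in GradientBoostingRegressor, our own choice for illustration since it supports only a single target, and contrasts it with RandomForestRegressor, which accepts a 2-D target directly and fits one multi-output forest. As before, random_state=0 in make_regression is an assumption for repeatability.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.multioutput import MultiOutputRegressor

# random_state=0 is an assumption added for reproducibility
x, y = make_regression(n_targets=3, random_state=0)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.30, random_state=42)

# GradientBoostingRegressor handles a single target only,
# so the wrapper fits one boosted model per target column.
gbr = MultiOutputRegressor(GradientBoostingRegressor(random_state=0)).fit(x_train, y_train)

# RandomForestRegressor accepts a 2-D target directly and fits
# a single forest whose trees predict all three targets jointly.
rf = RandomForestRegressor(max_depth=2, random_state=0).fit(x_train, y_train)

for name, model in [("wrapped GBR", gbr), ("native RF", rf)]:
    print(name, model.predict(x_test).shape, round(model.score(x_test, y_test), 3))
```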

I hope you enjoyed this tutorial. Feel free to leave a comment about it.

The post Multi-Output Regression using Sklearn appeared first on Hi! I am Nagdev.
