R Tensorflow Multiple Linear Regression

Posted on August 27, 2019 by Ian Johnson in R bloggers | 0 Comments

[This article was first published on Data Science, Data Mining and Predictive Analytics, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In the previous three posts I used multiple linear regression, decision trees, gradient boosting, and support vector machine to predict miles per gallon for 2019 vehicles. It was determined that svm produced the best model. In this post, I am going to run TensorFlow through R and fit a multiple linear regression model using the same data to predict MPG.

Part 1: Multiple Linear Regression using R

There are 1253 vehicles in the cars_19 dataset. I am simply running mlr using Tensorflow for demonstrative purposes as using lm() in R is more efficient and more precise for such a small dataset.

TensorFlow uses an algorithm that is dependent upon convergence whereas R computes the closed form estimates of beta. I will be using 11 features and an intercept so R will be inverting a 12 x 12 matrix which is not computationally expensive with today’s technology.

The dataset below of 11 features contains 7 factor variables and 4 numeric variables.

str(cars_19)
'data.frame':    1253 obs. of  12 variables:
 $ fuel_economy_combined: int  21 28 21 26 28 11 15 18 17 15 ...
 $ eng_disp             : num  3.5 1.8 4 2 2 8 6.2 6.2 6.2 6.2 ...
 $ num_cyl              : int  6 4 8 4 4 16 8 8 8 8 ...
 $ transmission         : Factor w/ 7 levels "A","AM","AMS",..: 3 2 6 3 6 3 6 6 6 5 ...
 $ num_gears            : int  9 6 8 7 8 7 8 8 8 7 ...
 $ air_aspired_method   : Factor w/ 5 levels "Naturally Aspirated",..: 4 4 4 4 4 4 3 1 3 3 ...
 $ regen_brake          : Factor w/ 3 levels "","Electrical Regen Brake",..: 2 1 1 1 1 1 1 1 1 1 ...
 $ batt_capacity_ah     : num  4.25 0 0 0 0 0 0 0 0 0 ...
 $ drive                : Factor w/ 5 levels "2-Wheel Drive, Front",..: 4 2 2 4 2 4 2 2 2 2 ...
 $ fuel_type            : Factor w/ 5 levels "Diesel, ultra low sulfur (15 ppm, maximum)",..: 4 3 3 5 3 4 4 4 4 4 ...
 $ cyl_deactivate       : Factor w/ 2 levels "N","Y": 1 1 1 1 1 2 1 2 2 1 ...
 $ variable_valve       : Factor w/ 2 levels "N","Y": 2 2 2 2 2 2 2 2 2 2 ...

The factors need to be transformed into a format TensorFlow can understand.

cols <- feature_columns(
  column_numeric(colnames(cars_19[c(2, 3, 5, 8)])),
  column_categorical_with_identity("transmission", num_buckets = 7),
  column_categorical_with_identity("air_aspired_method", num_buckets = 5),
  column_categorical_with_identity("regen_brake", num_buckets = 3),
  column_categorical_with_identity("drive", num_buckets = 5),
  column_categorical_with_identity("fuel_type", num_buckets = 5),
  column_categorical_with_identity("cyl_deactivate", num_buckets = 2),
  column_categorical_with_identity("variable_valve", num_buckets = 2)
  )

Create an input function:

#input_fn for a given subset of data
cars_19_input_fn <- function(data, num_epochs = 1) {
  input_fn(
    data,
    features = colnames(cars_19[c(2:12)]),
    response = "fuel_economy_combined",
    batch_size = 64,
    num_epochs = num_epochs
  )
}

Train, evaluate, predict:

model <- linear_regressor(feature_columns = cols)

set.seed(123)
indices <- sample(1:nrow(cars_19), size = 0.75 * nrow(cars_19))
train <- cars_19[indices, ]
test <- cars_19[-indices, ]

#train model
model %>% train(cars_19_input_fn(train, num_epochs = 1000))

#evaluate model
model %>% evaluate(cars_19_input_fn(test))

#predict
yhat <- model %>% predict(cars_19_input_fn(test))

Results are very close to the R closed form estimates:

postResample(yhat, y)
     RMSE  Rsquared       MAE 
2.5583185 0.7891934 1.9381757

To leave a comment for the author, please follow the link and comment on their blog: Data Science, Data Mining and Predictive Analytics.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

R Tensorflow Multiple Linear Regression

Related

Related

Never miss an update! Subscribe to R-bloggers to receive e-mails with the latest R posts. (You will not see this message again.)

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)