(This article was first published on **sweissblaug**, and kindly contributed to R-bloggers)

My previous post discussed methods to uncover the effect of a particular explanatory variable on a response variable in machine learning models. These methods work by changing the variable of interest and measuring the change in the output while keeping all other variables constant. However, the assumption that we can hold the other variables constant is almost surely incorrect. In most applications, changing one variable changes the distributions of other covariates as well.

In this post I will show the effect of a change in values by simulating a dataset in which only the variables of interest are held fixed, while all other variables are allowed to change. I will do this by simulating data points with a method described here, and I’ll compare this method with existing techniques.

The topic I’m going to explore is whether turbocharged engines actually decrease fuel consumption. Over the past few years, turbocharging has increasingly been used to reduce engine size while increasing power and fuel economy and reducing emissions. It has been hailed as the solution for increasing power while decreasing fuel consumption. A common theme among proponents is that turbocharging an I-4 engine delivers V-6 power with I-4 fuel consumption. However, real-world driving has suggested that a turbocharged option doesn’t necessarily reduce consumption; it merely gives you the option to reduce consumption if you don’t use all the power.

Data:

The car data was scraped from the Car and Driver website and includes 416 car reviews with “observed mpg” along with 15 other variables, including weight, horsepower, torque, zero-to-60 time, etc.

I’ll compare observed fuel economy for two configurations with the same 0-60 time of 6.5 seconds: a turbocharged I-4 and a naturally aspirated V-6.

I picked these examples because switching from a V-6 to an I-4 is a very common choice, and a 0-60 time of 6.5 seconds is fairly standard for a mid-range family sedan.

My previous post discussed the methodology of holding everything else constant. Here I will predict each observation twice, once as a turbo I-4 and once as a naturally aspirated V-6 (both with a 6.5-second 0-60 time), and compare the predictions.

Here the prediction function f() is approximated by a machine learning method, in this case a random forest. Below is a histogram of the results.

Using this method, the mean change in observed mpg is -0.028, with considerable variation across cars (first quartile -0.077, third quartile 0.025). These results suggest that turbos don’t have much of an effect on fuel consumption.
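The hold-constant approach can be sketched in a few lines. This is a minimal Python sketch, not the author’s R code. The column names (`turbo`, `cylinders`, `zero_to_60`, `weight`), the helper names, and `toy_model` (a stand-in for the fitted random forest) are all hypothetical. Any callable that maps a row to predicted mpg could be passed as `f`.

```python
import statistics
from typing import Callable, Dict, List

Row = Dict[str, float]

# The two regimes compared in this post, written as column overrides.
# Column names are hypothetical; the scraped data may use different ones.
TURBO_I4: Row = {"turbo": 1.0, "cylinders": 4.0, "zero_to_60": 6.5}
NA_V6: Row = {"turbo": 0.0, "cylinders": 6.0, "zero_to_60": 6.5}


def hold_constant_effects(rows: List[Row], f: Callable[[Row], float],
                          a: Row, b: Row) -> List[float]:
    """Predict each observation twice, once with the overrides in `a` and once
    with those in `b`, leaving every other column at its observed value.
    Returns f(row under a) - f(row under b) for each row."""
    return [f({**row, **a}) - f({**row, **b}) for row in rows]


def summarize(diffs: List[float]) -> Dict[str, float]:
    """Mean and first/third quartiles of the per-observation changes."""
    q1, _, q3 = statistics.quantiles(diffs, n=4)  # needs at least 2 values
    return {"mean": statistics.mean(diffs), "q1": q1, "q3": q3}


def toy_model(row: Row) -> float:
    # Hypothetical stand-in for the fitted random forest: it ignores `turbo`
    # and subtracts 0.5 mpg per cylinder above four.
    return 45.0 - 0.005 * row["weight"] - 0.5 * (row["cylinders"] - 4.0)


# Toy rows for illustration only, not taken from the scraped dataset.
toy_cars: List[Row] = [
    {"weight": w, "turbo": 0.0, "cylinders": 4.0, "zero_to_60": 7.0}
    for w in (3000.0, 3400.0, 3800.0)
]
demo = summarize(hold_constant_effects(toy_cars, toy_model, TURBO_I4, NA_V6))
```

With a real random forest the per-car differences vary, and the histogram and quartiles summarize that spread. The additive toy model gives the same difference for every car.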

However, it’s clear that changing the engine from a turbo I-4 to a naturally aspirated V-6 can change other variables as well. For instance, I-4 engines are lighter than V-6 engines (they have two fewer cylinders, after all), so it’s possible that switching from the naturally aspirated regime to the turbo regime decreases fuel consumption through other channels.

Using the method described in this post, we can simulate the distributions of the other variables under each regime.
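The linked simulation method isn’t reproduced here. As a stand-in, the sketch below uses a simpler technique: kernel-weighted resampling of observed rows. Cars that match the regime’s engine configuration are drawn with replacement, with a preference for those whose 0-60 time is close to 6.5 seconds. Their weight, horsepower and other covariates come along with them instead of being held fixed. The column names, the `bandwidth` value and the toy rows are assumptions for illustration.

```python
import math
import random
from typing import Dict, List

Row = Dict[str, float]

# Hypothetical column names for one regime: turbo I-4 with a 6.5 s 0-60 time.
TURBO_I4: Row = {"turbo": 1.0, "cylinders": 4.0, "zero_to_60": 6.5}


def simulate_regime(rows: List[Row], fixed: Row, continuous: str = "zero_to_60",
                    bandwidth: float = 0.5, n_draws: int = 1000,
                    seed: int = 0) -> List[Row]:
    """Resample observed cars to approximate the distribution of the other
    covariates given the regime in `fixed` (a stand-in technique, not
    necessarily the method from the linked post).

    Discrete entries of `fixed` (every key except `continuous`) must match
    exactly. Matching rows are weighted by a Gaussian kernel on how far their
    `continuous` value is from the target, then drawn with replacement. Each
    draw keeps its own other covariates and has the fixed values written over it."""
    target = fixed[continuous]
    candidates: List[Row] = []
    weights: List[float] = []
    for row in rows:
        if all(row[k] == v for k, v in fixed.items() if k != continuous):
            z = (row[continuous] - target) / bandwidth
            candidates.append(row)
            weights.append(math.exp(-0.5 * z * z))
    if not candidates:
        raise ValueError("no observations match the requested regime")
    rng = random.Random(seed)
    draws = rng.choices(candidates, weights=weights, k=n_draws)
    return [{**row, **fixed} for row in draws]


# Toy data for illustration only; these are not rows from the scraped dataset.
toy_cars: List[Row] = [
    {"turbo": 1.0, "cylinders": 4.0, "zero_to_60": 6.4, "weight": 3200.0},
    {"turbo": 1.0, "cylinders": 4.0, "zero_to_60": 7.9, "weight": 3300.0},
    {"turbo": 0.0, "cylinders": 6.0, "zero_to_60": 6.5, "weight": 3500.0},
]
turbo_draws = simulate_regime(toy_cars, TURBO_I4)
```

A regime with few matching cars, or a bandwidth that is too small, gives a noisy or degenerate sample. In practice the bandwidth would need tuning.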

Below is a pairs plot of several variables under each of the different regimes.

The mean observed mpg under the I-4 turbo regime is 25.57, compared with 24.29 for the V-6. This indicates that I-4 turbos are, on average, 1.28 mpg more fuel efficient than similarly performing V-6s. This appears to be due, at least partially, to weight: turbo engines are associated with lighter cars (by about 150 pounds on average), and lower weight reduces fuel consumption.
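Once each regime has a simulated sample, the comparison in the paragraph above reduces to two steps: average the model’s predictions over each sample, then difference the means. For the post’s figures, 25.57 − 24.29 = 1.28 mpg. The following is a minimal sketch with a hypothetical `toy_model` in which mpg depends only on weight. The toy samples are not output from the post’s simulation.

```python
import statistics
from typing import Callable, Dict, List

Row = Dict[str, float]


def compare_regimes(sim_a: List[Row], sim_b: List[Row],
                    f: Callable[[Row], float]) -> Dict[str, float]:
    """Average the model's predictions over each simulated sample and return
    the gaps (a minus b) in mean predicted mpg and in mean weight."""
    mpg_a = statistics.mean(f(r) for r in sim_a)
    mpg_b = statistics.mean(f(r) for r in sim_b)
    weight_a = statistics.mean(r["weight"] for r in sim_a)
    weight_b = statistics.mean(r["weight"] for r in sim_b)
    return {"mpg_a": mpg_a, "mpg_b": mpg_b,
            "mpg_gap": mpg_a - mpg_b, "weight_gap": weight_a - weight_b}


def toy_model(row: Row) -> float:
    # Hypothetical stand-in for the random forest. mpg depends only on weight,
    # so turbo has no direct effect: any gap must come through weight.
    return 45.0 - 0.005 * row["weight"]


# Toy simulated samples, not output from the post's simulation.
turbo_sim = [{"turbo": 1.0, "weight": w} for w in (3200.0, 3300.0)]
v6_sim = [{"turbo": 0.0, "weight": w} for w in (3450.0, 3550.0)]
result = compare_regimes(turbo_sim, v6_sim, toy_model)
# result["mpg_gap"] is approximately 1.25 and result["weight_gap"] is -250.0
```

The toy model ignores `turbo`. So changing only the engine columns while holding weight fixed would predict no difference at all here. Averaging over the simulated samples recovers the gap carried by weight. This mirrors the contrast the post draws between the two methods, but the numbers are illustrative only.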

Conclusion:

This post compared existing methods of predicting changes with one that simulates the distribution under different conditions. The resulting distribution lets the other covariates change by the amount expected given a change in the variables of interest. The existing methods found essentially no change in average fuel consumption. That is because the gain in fuel efficiency isn’t caused directly by turbocharging an engine; it comes through other channels, such as a decrease in weight.
