
Since I drive quite a lot, I have some interest in getting the most km out of every euro spent on fuel. One thing I can change is the fuel itself. Oil companies sell a premium fuel that is supposed to be better for both the engine and fuel consumption. On the other hand, it is easy to find counterclaims that it does nothing for fuel consumption, in which case the extra cost would be wasted. In this post I set up a way to check those claims.

Current data

The information I currently have is last year's fuel consumption. I have taken a subset of those data, from March and April 2014.
library(MASS)
# Fuel log: litres tanked (l) and distance driven (km) per fill-up
r1 <- read.table(text='l km
28.25 710.6
22.93 690.4
28.51 760.5
23.22 697.9
31.52 871.2
24.68 689.6
30.85 826.9
23.04 699
29.96 845.3
30.16 894.7
25.71 696
23.6 669.8
28.57 739
27.23 727.4
18.31 499.9', header=TRUE)
# Usage in l/100 km
r1$usage <- 100*r1$l/r1$km
plot(density(r1$usage),
main='Observed normal diesel usage',
xlab='l/100 km')

The data are from a distribution with a mean around 3.6 l/100 km.
fitdistr(r1$usage,'normal')
mean          sd
3.59517971   0.19314598
(0.04828649) (0.03414371)
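The numbers in parentheses are the standard errors of the two estimates. For a normal fit they follow directly from the estimated sd and the sample size. The lines below are a minimal sketch of that relation, using the sd printed above. The sample size n = 16 is an assumption, chosen because it reproduces the printed standard errors.

```r
# Maximum-likelihood sd as printed in the fitdistr() output
sd_hat <- 0.19314598
n <- 16  # assumption: the value that reproduces the printed standard errors

se_mean <- sd_hat / sqrt(n)      # standard error of the estimated mean
se_sd   <- sd_hat / sqrt(2 * n)  # standard error of the estimated sd

round(c(se_mean = se_mean, se_sd = se_sd), 8)
```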

Approach

The analysis will consist of a hypothesis test and an estimate of premium diesel usage.
I will assume that driving patterns and weather are similar to last year's. I think that should be achievable, given my driving style. A cross-check may be made, especially on whether speeds are similar. Data from trips with serious traffic jams may be excluded from the analysis.
A check for outliers is not planned. However, obviously faulty data will be corrected or removed. No interim analysis is planned, unless the data seem to point to a marked increase in fuel usage.
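The planned hypothesis test could look roughly like the sketch below. It runs on simulated data only: the two vectors, the seed, the group size of 17 (the sample size derived in the next section) and the pooled-variance choice are all assumptions for illustration, not results.

```r
set.seed(1)  # simulated data, for illustration only

# Hypothetical samples: 17 fill-ups per fuel, usage in l/100 km
standard <- rnorm(17, mean = 3.6, sd = 0.2)
premium  <- rnorm(17, mean = 3.6 / 1.05, sd = 0.2)

# One-sided two-sample t test: is usage on premium diesel lower?
# var.equal = TRUE is an assumption, matching the equal-sd setup
# used in the power calculation
tt <- t.test(premium, standard, alternative = 'less', var.equal = TRUE)
tt$p.value   # compare against the chosen significance level
tt$estimate  # the two group means
```

With the real data, premium and standard would be replaced by the observed usage values of each fuel.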

Power for hypothesis test

The recommended retail prices of premium and standard diesel are 1.433 and 1.363 Euro/liter according to the internet. That is a price increase of about 5%. It should be noted that prices at the pump vary widely from these values; unbranded, unmanned fuel stations in particular may be significantly cheaper. Last year's data came from such unbranded fuel. Competition can force the price of both standard and premium fuel down a bit. I will take the 5% price increase as the target: premium diesel has to reduce usage by about 5% to be worth its price. Given a significance level of 10% and a power of 90%, I arrive at 17 samples for each group. This means I will have to take a bit more data from last year, which is not a problem. The choice of alpha and beta reflects that I find both kinds of error equally bad.
power.t.test(delta=3.6*.05,
sd=0.2,
sig.level=.1,
power=.9,
alternative='one.sided')
Two-sample t test power calculation

n = 16.66118
delta = 0.18
sd = 0.2
sig.level = 0.1
power = 0.9
alternative = one.sided

NOTE: n is number in *each* group
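As a cross-check on the numbers above, the sketch below recomputes the relative price difference and asks what power is actually reached once n is rounded up to 17 per group. The variable names are mine; the inputs are the values quoted in the text.

```r
# Recommended prices quoted in the text, in euro per litre
price_premium  <- 1.433
price_standard <- 1.363
rel_increase <- price_premium / price_standard - 1  # relative price increase

# Power reached with 17 samples per group, other settings unchanged
pw <- power.t.test(n = 17,
                   delta = 3.6 * 0.05,
                   sd = 0.2,
                   sig.level = 0.1,
                   alternative = 'one.sided')$power

round(c(rel_increase = rel_increase, power = pw), 3)
```

Because 17 is slightly more than the 16.66 requested, the power comes out a little above the 90% target.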

Besides a significance test, I want an estimate of usage, so that I can extrapolate to other scenarios. I will use a Bayesian analysis to obtain this estimate. The prior is a mixture of three beliefs: premium diesel makes no difference, there is indeed a 5% gain to be made, or something else entirely. The last is a vague component spread over roughly 3 to 4 l/100 km. The combined density is plotted below.
usage <- seq(3.2,3.8,.01)
dens <- (dnorm(usage,3.6,.05)+
dnorm(usage,3.6/1.05,.05)+
dnorm(usage,(3.6+3.6/1.05)/2,.15))/3
plot(x=usage,y=dens,type='l',
ylim=c(0,4),
ylab='density',
xlab='l/100 km',
main='prior')
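How this prior would be combined with data can be sketched with a simple grid approximation. This is one possible approach under stated assumptions, not necessarily the analysis that will be used: the observations are simulated, the measurement sd is fixed at 0.2, and the grid limits and step are arbitrary choices.

```r
# Grid of candidate mean usage values (l/100 km); limits and step are assumptions
step  <- 0.005
usage <- seq(3.0, 4.0, step)

# Mixture prior with the same three components as above
prior <- (dnorm(usage, 3.6, 0.05) +
          dnorm(usage, 3.6 / 1.05, 0.05) +
          dnorm(usage, (3.6 + 3.6 / 1.05) / 2, 0.15)) / 3

set.seed(2)  # simulated premium-diesel observations, illustration only
obs <- rnorm(17, mean = 3.5, sd = 0.2)

# Log-likelihood of the observations for each candidate mean, sd fixed at 0.2
loglik <- sapply(usage, function(m) sum(dnorm(obs, m, 0.2, log = TRUE)))

# Prior times likelihood, rescaled so the density integrates to 1 on the grid
post <- prior * exp(loglik - max(loglik))
post <- post / sum(post * step)

post_mean <- sum(usage * post * step)  # posterior mean usage

plot(usage, post, type = 'l',
     xlab = 'l/100 km', ylab = 'density',
     main = 'posterior (simulated data)')
lines(usage, prior, lty = 2)  # dashed line: prior, for comparison
```

Subtracting the maximum log-likelihood before exponentiating only guards against numerical underflow; it cancels out in the normalisation.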
