[This article was first published on Wiekvoet
, and kindly contributed to R-bloggers
]. (You can report issue about the content on this page here
Want to share your content on R-bloggers? click here
if you have a blog, or here
if you don't.
Since I drive quite a lot, I have some interest in getting the most km out every Euro spent on fuel. One thing to change is the fuel. The oil companies have a premium fuel, which is supposed to be better for both engine and fuel consumption. On the other hand, it is easy to find counter claims which say it is not beneficial for fuel consumption. In that case the extra costs would be a waste. In this post I am creating a setup to check the claims.
The current information which I have are fuel consumption of last year. I have taken a set of those data from March and April of 2014.
r1 <- read.table(text='l km
main=’Observed normal diesel usage’,
The data are from a distribution with a mean around 3.6 l/100 km.
Analysis will be a hypothesis test and an estimate of premium diesel usage.
The assumptions which I will make are similar driving patterns and weather as last year. I think that should be possible, given my driving style. A cross check may be made, especially regarding obtaining similar speed. Data with serious traffic jams may be discarded in the analysis.
A check for outliers is not planned. However, obviously faulty data will be corrected or removed from the data. No intermediate analysis is planned, unless data seems to be pointing a marked increase of fuel usage.
Power for hypothesis test
The advice price levels of premium and standard diesel are 1.433 and 1.363 Euro/liter according to the internet. This is about 5% price increase. It should be noted that prices at the pump vary wildly from these values, especially non-brand non-manned fuel stations may be significantly cheaper. Last year’s data was from such non brand fuel. Competition can force the price of both standard and premium fuel down a bit. I will take the 5% price increase as target for finding value for premium diesel. Given significance level of 10% and power of 90%, I come at 17 samples for each group. This means I will have to take a bit more data from last year, which is not a problem. The choice of alpha and beta reflect that I find both kind of errors equally bad.
Two-sample t test power calculation
n = 16.66118
delta = 0.18
sd = 0.2
sig.level = 0.1
power = 0.9
alternative = one.sided
NOTE: n is number in *each* group
Estimating usage of premium diesel
Besides a significance test, I desire an estimate of usage. This manner I can extrapolate the data to other scenarios. I will use a Bayesian analysis to obtain these estimates. The prior to be used is a mixture of three believes. Either it does not make a difference, or there is indeed a 5% gain to be made or something else entirely. This latter is an uninformed prior between 3 and 4 l/km. The combined density is plotted below.
usage <- seq(3.2,3.8,.01)
dens <- (dnorm(usage,3.6,.05)+