**everyday analytics**, and kindly contributed to R-bloggers)

#### Introduction

Your average economists. |

#### Background

# Download data

for i in $(seq 1990 2014)

do wget http://www.energy.gov.on.ca/fuelupload/ONTREG$i.csv

done

# Retain the header

head -n 2 ONTREG1990.csv | sed 1d > ONTREG_merged.csv

# Loop over the files and use sed to extract the relevant lines

for i in $(seq 1990 2014)

do

tail -n 15 ONTREG$i.csv | sed 13,15d | sed 's/./-01-'$i',/4' >> ONTREG_merged.csv

done

The WTI data I got from The Federal Reserve Bank of St. Louis, and the forecasts from the US Energy Information Administration.

#### Analysis

Wow, that really is a cliff, isn’t it? The average Ontario Gas Price hasn’t been as low as it was in Dec 2014 since the fall of 2010 (Sep 2010, $1.01) and the WTI not since about May of that year.

Now to the fun stuff. Let’s read the data into R and do some hacky time series analysis.

library(ts)

library(forecast)

# Read in the Ontario Gas Price Data

data <- read.csv(file="ONTREG_merged.csv", header=T, sep=",")

# Read in the WTI oil price data

WTI_data <- read.csv(file='DCOILWTICO.csv',header=F, col.names=c("Date", "Value"))

# Create a time series object for the WTI and Ontario Avg

WTI <- ts(data=WTI_data$Value, frequency=12, start=c(1990,1), end=c(2014,12))

ON <- ts(data=data$ON.Avg, frequency=12, start=c(1990,1), end=c(2014,12))

# Plot and compare

combined <- cbind(WTI, ON)

plot(combined)

We get a comparison plot of the data:

And we can look at the Ontario Gas Price as a function of the WTI. Linear is on the left, log-log on the right.

Next we build lm model objects and look at the diagnostics. I’ll spare the details, but I feel better about the log-log, so we’ll go with that.

# Create linear models (normal and log-log)

l1 <- lm(ON ~ WTI, data=combined)

l2 <- lm(log(ON) ~ log(WTI), data=combined)

# Compare relative performance

summary(l1)

summary(l2)

plot(l1)

plot(l2)

# Plot

plot(ON ~ WTI, data=combined, pch=16, cex=0.3)

abline(l1)

plot(log(ON) ~ log(WTI), data=combined, pch=16, cex=0.3)

abline(l2)

# Read in WTI forecast data

WTI_forecast <- read.csv(file="WTI_forecast.csv", header=F, sep=",", col.names=c("Date", "Value"))

# Forecast Ontario Gas Price

fit <- forecast(l2, newdata=data.frame(WTI=WTI_forecast$Value))

# Unlog

fit$mean <- exp(fit$mean)

fit$lower <- exp(fit$lower)

fit$upper <- exp(fit$upper)

fit$x <- exp(fit$x)

# Plot

plot(fit, ylab='Ontario Average Gas Price (cents/L)')

And there you have it! Putting the forecast data (blue line) and the WTI forecast back into our original graph, we can compare the two:

It’s that easy, right? That’s all there is to it?

#### Conclusion

*i.e.*have constant mean and variance) and not auto-correlated, properties which are almost never true in the real world. Using methods such as above for non-stationary times series can result in what is known as “spurious regression” – finding relationships between variables which don’t really exist, even though the results have high R-squared and p-values.

In these cases, testing the stationary assumption and then massaging of the data (differencing & deseasonalization) is usually required beforehand to handle the problem, or other models which do not have as strict assumptions as linear regression are more appropriate. For more on this, see the excellent book “Forecasting: Principles and Practice” by Rob J. Hyndman and George Athanasopoulos.

#### References & Resources

Fuel Price Data: Regular Unleaded Gasoline 2014 (Ontario Ministry of Energy):

*http://www.energy.gov.on.ca/en/fuel-prices/fuel-price-data/?fuel=REG&yr=2015*

Crude Oil Prices: West Texas Intermediate (WTI) (Federal Reserve Bank of St. Louis)

Short-Term Energy Outlook (U.S. Energy Information Administration)

**leave a comment**for the author, please follow the link and comment on their blog:

**everyday analytics**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...