Like the data hunter he is, my STATWORX colleague Jakob came across a rich data source on gas station prices. While his focus has been on checking very common myths about gas prices (check out his blogpost!), he also did a fantastic job of cleaning and preparing the raw data to get it into a usable format in the first place. Naturally, I wanted to check whether one can use this data, together with other freely accessible data sources, to test gas companies’ narratives. Let me guide you through my work process!

## Opening the treasure chest of statistics

Since the raw database is remarkably huge (almost 1.5 billion rows), I first decided to aggregate the data into groups of gas stations. As a result, I’ve got one average gas price per day for each combination of

• fuel type: e5, e10, diesel
• brand: the nine most important brands and all others
• motorway: whether the stations are on a motorway (yes/no)
• 2-digit postal code: regional information on where the gas stations are.

This aggregation makes our data frame much handier, with approximately 6 million rows left. This is still a lot of information, yet it allows me to actually run models on my local machine.
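The grouping described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code used for the analysis: the record keys (`date`, `station`, `fuel`, `brand`, `motorway`, `postal_code`, `price`) and the contents of `TOP_BRANDS` are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical stand-in for the nine most important brands.
TOP_BRANDS = {"BRAND_A", "BRAND_B"}


def aggregate_daily(records):
    """Average price and number of stations per day and group.

    Each record is a dict with the (assumed) keys
    date, station, fuel, brand, motorway, postal_code, price.
    """
    price_sum = defaultdict(float)
    price_count = defaultdict(int)
    stations = defaultdict(set)

    for r in records:
        # Brands outside the top list are pooled into "other".
        brand = r["brand"] if r["brand"] in TOP_BRANDS else "other"
        # Only the first two digits of the postal code are kept.
        key = (r["date"], r["fuel"], brand, r["motorway"], r["postal_code"][:2])
        price_sum[key] += r["price"]
        price_count[key] += 1
        stations[key].add(r["station"])

    return {
        key: {
            "avg_price": price_sum[key] / price_count[key],
            "n_stations": len(stations[key]),  # group size, usable as a weight
        }
        for key in price_sum
    }
```

Keeping the number of stations per group alongside the average is what later makes it possible to weight each observation by its group size.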

Treating each group as an individual for which I have multiple observations over time turns this into a panel data problem. To find the correct model specifications, I performed multiple tests (Chow, Lagrange multiplier, and a Hausman test). Eventually, the choice fell on two models:

1. a general linear panel model with two-way fixed effects that allows detecting overall patterns
2. a variable coefficients panel model with individual fixed effects.

The latter model estimates separate coefficients for each group. This allows detecting varying impacts of factors on different regions or gas station types. In both models, each observation is weighted by its group size.
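To make the second model more tangible, here is a minimal Python sketch of the underlying idea with a single regressor: every group gets its own intercept (the individual fixed effect) and its own slope. It is a simplified illustration under that assumption, not the estimation code behind the results, and it leaves out the group-size weights; the function name `per_group_slopes` is made up for the example.

```python
from collections import defaultdict


def per_group_slopes(rows):
    """Estimate one slope per group after removing the group mean.

    rows: iterable of (group, x, y) tuples.
    Returns {group: slope}; groups without variation in x are skipped.
    """
    by_group = defaultdict(list)
    for group, x, y in rows:
        by_group[group].append((x, y))

    slopes = {}
    for group, obs in by_group.items():
        n = len(obs)
        mean_x = sum(x for x, _ in obs) / n
        mean_y = sum(y for _, y in obs) / n
        # Demeaning within the group absorbs the group-specific intercept.
        sxx = sum((x - mean_x) ** 2 for x, _ in obs)
        sxy = sum((x - mean_x) * (y - mean_y) for x, y in obs)
        if sxx > 0:
            slopes[group] = sxy / sxx
    return slopes
```

Comparing the resulting slopes across groups is what reveals whether a factor hits some regions or station types harder than others.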

## Let’s bring some covariates into play

According to the Federal Ministry of Finance, the gas prices we have to pay at the gas station are composed of three parts: (I) the product purchase price, (II) the costs and profit of the oil companies, and (III) taxes. The taxes, namely energy tax and value-added tax (Mehrwertsteuer), remained unchanged during our investigation period. Thus, they should merely amplify, but not alter, the effects of the other price factors.

The oil price, however, is often blamed for high gas prices. To control for its impact, I’ve included daily crude oil prices in my model. Another potential player here is the EUR/USD exchange rate. Like many internationally traded goods, oil is traded in USD. Changes in the exchange rate might therefore lead to changing gas prices even when the crude oil price in USD is stable.
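A small worked example shows why the exchange rate matters. The sketch below assumes the usual quoting convention, where EUR/USD is the number of US dollars per euro; the function name and the numbers are purely illustrative.

```python
def oil_price_in_eur(price_usd, eur_usd):
    """Convert an oil price from USD to EUR.

    eur_usd is assumed to be quoted as US dollars per one euro.
    """
    if eur_usd <= 0:
        raise ValueError("exchange rate must be positive")
    return price_usd / eur_usd


# Same oil price in USD, but a weaker euro makes the barrel more expensive in EUR.
strong_euro = oil_price_in_eur(70.0, 1.25)
weak_euro = oil_price_in_eur(70.0, 1.15)
```

With a barrel at 70 USD, a drop of the euro from 1.25 to 1.15 USD raises the price in euros even though the dollar price has not moved.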

So far, all factors are beyond the control of gas companies. Let’s look at a topic that came into public focus in 2018: transportation costs. Due to the hot and dry summer in Germany, some rivers dropped to all-time low levels. That threatened not only freshwater fish but also the use of the rivers as transportation infrastructure. Regions close to the Rhine and Elbe in particular receive large shares of their gas supply via these inland waterways. Both very low and very high river levels hamper inland waterway transport. Low levels force barges to reduce their cargo until, at some point, transportation has to stop completely. Very high levels are a problem as well: there is a legal limit above which shipping has to cease. The resulting supply shortages and higher relative transportation costs eventually push up our gas prices… or at least this has been the narrative of the gas companies. I will check whether this explanation holds up against the data, or whether it is instead an excuse to raise prices or to keep them at a higher level afterward.

Let’s turn to subjects which might actually be opportunities for gas companies to rip off drivers. One common nuisance is price jumps right at the beginning of school or public holidays. You want to fill up the tank before you drive to the sea, and bam, gas prices reach new highs. The same logic applies to weekdays. As Jakob has shown, it’s cheapest in the middle of the week, namely on Wednesdays and Thursdays, but more expensive on Mondays and Fridays when most commuters refuel. Finally, gas prices might be higher where people are wealthier and therefore less price sensitive. A convenient measure for this is the purchasing power index (PPI).

## Statistics meets gut feeling – are they on the same page?

To obtain comparable coefficient estimates, I scaled all continuous independent variables beforehand. The results of the panel model are presented in the next figure. The scaling allows comparing the impact of each variable in strength as well as in direction. Confidence intervals are not shown since all estimates are highly statistically significant, owing to the large number of observations.
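The post does not say which scaling was used; a common choice is standardization (subtract the mean, divide by the standard deviation), which the following Python sketch assumes. The function name `standardize` is made up for the example.

```python
from statistics import mean, stdev


def standardize(values):
    """Center values at zero and scale them to unit sample standard deviation."""
    values = list(values)
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation (n - 1 in the denominator)
    if sigma == 0:
        raise ValueError("cannot scale a constant variable")
    return [(v - mu) / sigma for v in values]
```

After such a transformation, each coefficient describes the change in gas price associated with a one-standard-deviation change in the respective variable, which is what makes the coefficients comparable across very different units.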

Unsurprisingly, the crude oil price is the most crucial factor for gas prices. Together with the Euro-Dollar exchange rate, it accounts for most of the variation in gas prices. This relationship is visible even at a quick glance at the two normalized time series.

Now let us proceed to the more interesting findings. River levels actually do have a “positive” impact, meaning that when inland waterway transportation is hampered, gas prices increase. While this is generally true on the national level, I had a deeper look into more regional details. An interaction between river impacts and the one-digit zip code regions (check out the map) reveals that this factor is a price driver mainly in the southern regions of Germany. This seems logical for two reasons. First, most gas and oil is transported by sea and reaches Germany mainly in the north, so higher transportation costs add up more the longer the transportation route is. Second, even though I only included the levels of the most important inland waterways, the Rhine and Elbe, they can be seen as a sort of proxy for general river level conditions in Germany. This explains why regions far from these rivers are also affected by their levels.

Turning back to the overall coefficients, the PPI surprisingly has a negative impact. This means that where people are wealthier on average, gas prices are lower. The effects of public and school holidays are likewise unexpected. While gas prices are (negligibly) higher during the holidays, they are slightly lower at the beginning and end of the holidays. This is entirely contrary to the gut feeling of most vacationers, who regularly complain about higher prices right at these times. Finally, the weekday effects confirm the previous findings from Jakob’s blog. While Thursday is the cheapest day of the week, Sunday is the most expensive (it is the reference category and thus not shown in the first coefficient plot above; all weekday impacts are relative to Sunday).
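For readers unfamiliar with reference categories, the following Python sketch shows how weekdays can be dummy-coded with Sunday as the baseline. It illustrates the general technique, not the actual model code; the function and column names are assumptions.

```python
from datetime import date

# Sunday is deliberately left out: it is the reference category.
NON_REFERENCE_DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"]


def weekday_dummies(day):
    """Return 0/1 indicators for Monday to Saturday; Sunday yields all zeros."""
    index = day.weekday()  # Monday = 0, ..., Sunday = 6
    return {
        name: int(index == position)
        for position, name in enumerate(NON_REFERENCE_DAYS)
    }
```

Because Sunday has no indicator of its own, it is absorbed into the baseline, and the coefficient of every other weekday is read as the price difference compared with Sunday.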

## TL;DR

1. Gas prices mainly depend on crude oil prices in Euro (so far, so obvious).
2. Extreme river levels do impact gas prices, yet mostly in regions that are far from the sea.
3. Gas prices are not higher where potential customers are relatively wealthy (PPI); if anything, they are lower.
4. Holiday traffic is not used to rip off holidaymakers.
5. It’s cheapest in the middle of the week and more expensive close to or on the weekend.

#### Matthias Nistler

I am a data scientist at STATWORX and passionate about wrangling data and getting the most out of it. Outside of the office, I use every second for cycling until the sun goes down.

The post Which Factors Influence Gas Prices? Do Gas Companies Narratives Hold True? first appeared on STATWORX.