(This article was first published on **Florian Teschner**, and kindly contributed to R-bloggers)

In the last post, I mapped gas stations and gas prices in Germany.

After posting it, I started to look at the dataset from a different angle.

The starting question was: “How can I model gas prices? What are the influencing factors?”

One well-known fact is that certain gas station brands charge higher prices.

That is also the case in the present data. All but one of the major brands have higher prices than the non-brand (“other”) gas stations.

Interestingly, these brands appear to have their home turfs, as illustrated by the following map.

From the data, we can also calculate the distance between every pair of stations. Peter Rosenmai provides a very nice function to do that for all station combinations (2792 * 2792). On my pretty old laptop it takes roughly 30 minutes to calculate the roughly 8 million distances. From that we can count how many stations lie within a certain distance of a given station. For example, there are on average ~8 other gas stations within a 5 kilometer radius.
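To make the counting step concrete, here is a minimal sketch in Python (not the R function mentioned above). It assumes a haversine great-circle distance and a mean Earth radius of 6371 km; the function names and the coordinates are hypothetical and only serve as an illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, spherical model


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def count_within(stations, radius_km):
    """For each (lat, lon) station, count the OTHER stations within radius_km."""
    counts = []
    for i, (lat_i, lon_i) in enumerate(stations):
        n = 0
        for j, (lat_j, lon_j) in enumerate(stations):
            if i != j and haversine_km(lat_i, lon_i, lat_j, lon_j) <= radius_km:
                n += 1
        counts.append(n)
    return counts


# Hypothetical coordinates (lat, lon), not taken from the dataset
stations = [(52.52, 13.405), (52.53, 13.41), (48.137, 11.575)]
dist5 = count_within(stations, 5)
```

The double loop visits every ordered pair, which matches the all-combinations approach described above and explains why the full computation is slow for a few thousand stations.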

I created three variables: dist5 (5 km radius), dist10 (10 km radius) and dist20 (20 km radius).

In order to see which of these variables has predictive power on the gas price, we can run a simple “horse-race” regression on the price.
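As a rough sketch of what such a horse-race regression estimates, the following Python snippet fits an ordinary least squares model of price on all three competitor counts at once. It is not the original R code, it does not compute significance levels, and the helper `ols` as well as the toy data are hypothetical.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X is a list of rows without an intercept column; an intercept is added.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, k):
            f = A[r][col] / A[col][col]
            for c in range(col, k + 1):
                A[r][c] -= f * A[col][c]
    # Back substitution
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (A[i][k] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b


# Hypothetical toy data: columns are dist5, dist10, dist20
X = [[0, 1, 5], [2, 3, 20], [1, 6, 40], [4, 5, 60], [3, 9, 80], [6, 12, 100]]
# Synthetic prices constructed to depend only on dist20 (illustration only)
y = [1.33 - 0.0008 * row[2] for row in X]
coef = ols(X, y)  # [intercept, b_dist5, b_dist10, b_dist20]
```

Because all three variables enter the same model, each coefficient reflects the contribution of one radius while holding the others fixed, which is the point of letting them "race" against each other.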

We see that both the 10km and 20km variables carry weight and significance.

That result is also clearly visible when plotting the gas price against the number of competitors within 20 km.

The posted gas price decreases from 1.33 with no competition to 1.25 with 100 competitors in close proximity.

One way to cross-validate the result is to split the gas stations into four similarly sized groups and check whether the result holds. I do that by splitting the stations at the median latitude and the median longitude. The map illustrates that.

I use the blue numbers to label/group the stations according to their location.
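The median split can be sketched as follows in Python. The numbering of the quadrants (1 = north-west, 2 = north-east, 3 = south-west, 4 = south-east) is an assumption of this sketch and need not match the blue numbers on the map; the coordinates are hypothetical.

```python
from statistics import median

# Assumed numbering: keys are (is_north, is_east)
QUADRANT = {(True, False): 1, (True, True): 2,
            (False, False): 3, (False, True): 4}


def quadrant_labels(stations):
    """Label each (lat, lon) station 1..4 by splitting at the median lat and lon."""
    med_lat = median(lat for lat, _ in stations)
    med_lon = median(lon for _, lon in stations)
    return [QUADRANT[(lat >= med_lat, lon >= med_lon)] for lat, lon in stations]


# Hypothetical coordinates (lat, lon), not taken from the dataset
stations = [(53.5, 10.0), (52.5, 13.4), (48.1, 11.6), (50.9, 6.9)]
labels = quadrant_labels(stations)
```

Splitting latitude and longitude separately at their medians gives groups of similar, but not necessarily identical, size.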

Then I do the same plot for each quadrant separately: gas price by the number of competitors within 20 km.

Overall the results hold. Prices decrease with increasing local competition.
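One simple way to check this numerically, rather than visually, is to fit a separate least-squares slope of price on dist20 within each quadrant and confirm that every slope is negative. The Python sketch below assumes records of the form (quadrant, dist20, price); the helper names and the records are hypothetical.

```python
from collections import defaultdict


def slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def slope_by_group(records):
    """records: iterable of (quadrant, dist20, price). Returns {quadrant: slope}."""
    groups = defaultdict(lambda: ([], []))
    for quadrant, d20, price in records:
        groups[quadrant][0].append(d20)
        groups[quadrant][1].append(price)
    return {q: slope(xs, ys) for q, (xs, ys) in groups.items()}


# Hypothetical records, not taken from the dataset
records = [(1, 0, 1.33), (1, 50, 1.29), (1, 100, 1.25),
           (2, 10, 1.32), (2, 60, 1.30), (2, 90, 1.26)]
slopes = slope_by_group(records)
holds = all(s < 0 for s in slopes.values())  # negative slope in every group
```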

One factor biasing the analysis is that inner-city gas stations are known to have lower prices compared to stations along the “Autobahn”. I somewhat controlled for that by including only gas stations located along the Autobahn. It might be interesting to re-run the analysis on a wider dataset.

To come back to the starting question: What are the key factors influencing gas prices?

We can do a simple model comparison using the awesome BayesFactor package.

The plot shows all models and ranks them according to their Bayes factor (higher is better).

We see that the single most important factor is the number of competitors within 20 km (dist20).

Also, the best model includes dist10 and brand, but excludes dist5.

Hence, the next time you see low gas prices, you know it is because of the high regional competition ;).
