Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

## Introduction

Many people know that nasty feeling when buying a brand new car. The minute that you have left the dealer, your car has lost a substantial amount of value. Unfortunately this depreciation is inevitable, however, the amount depends heavily on the car make and model. A small analysis of data from (used) cars shows these differences.

## Car Data

I have used Rvest to scrape data from www.gaspedaal.nl, a Dutch website that combines car for sales data from several other sites. The script to get the data is not that difficult, it can be found on my GitHub, together with my analysis script. There are around 435,000 cars. The data for each car consists of: make, model, price, fuel type, transmission and age. There are many different car makes and models, the most occurring cars in my data set are:

### Car age vs. Kilometers

Obviously, there is a clear relation between the age of a car and the amount of kilometers driven. An interesting pattern to see is that this relation depends on on the car make (and model). The following figure shows a few car brands.

Large differences in amount of driving between car types start after 18 months. On average, Jaguars are not made for driving, after 60 months only around 83.000 KM are driven by its owners. While on the other hand, Mercedes-Benz owners have driven around 120.000 KM after 60 months.

A more extreme difference is between the Volvo V50 and the Hyundai i10. Between six and ten years, a Volvo V50 has driven on average 178K kilometers while a Hyundai i10 has driven only 75K kilometers.

## Depreciation

A simple depreciation model is just linear depreciation. Per car brand, model, and transmission type, I can fit a straight line through price and kilometers driven. The slope of the line is the depreciation for every kilometer driven. An elegant way to obtain the depreciation per car type is by using the purrr and broom packages.

First, some outlying values are removed then only car types with enough data points are considered. Then I have grouped the data by brand, model and transmission type, so that for each group a simple linear regression model can be fitted:

Price = Intercept + depreciation * KM

The following table shows the results:

So, on average a new Porsche 911 costs 117,278.60 Euro, and every kilometer you drive will cost you around 49.75 cents in loss of value. The complete table with all car types can be found on RPubs. Although, simple and easy to interpret parameters, a straight line model is not a realistic model as can be seen in the following figure:

A better model to fit would be a non linear depreciation model. For example, exponential depreciation or if you don’t want to specify a specific function, some kind of smoothing spline. The R code only needs to be modified slightly, to code below fits a natural cubic splines per car type.

It is a better model (in terms of R-squared), it follows the non linear depreciation that we can see in the data. However, we do not have a single deprecation value. How much value a certain car will lose when driving 1 kilometer now depends on the amount of kilometers driven. It is the derivative of the fitted spline curve. For example, the spline curves fitted for a Renault Clio are given in the figure below. A Clio with automatic transmission hardly looses any value after 100,000 KM.

I have created a small shiny app so that you can see the curves of all the car types.

## Conclusion

Despite my data science exercise and beautiful natural cubic smoothing splines models, buying a brand new car involves a lot of emotion. My wife wants a blue Citroen C4 Picasso, no matter what cubic spline model and R-squared I show to her!

So just ignore my analysis and buy the car that feels good to you!! Cheers, Longhow.