**R – Longhow Lam's Blog**, and kindly contributed to R-bloggers)

## Introduction

Many people know that **nasty feeling** when buying a brand new car. The minute that you have left the dealer, your car has lost a substantial amount of value. Unfortunately this depreciation is inevitable, however, the amount depends heavily on the car make and model. A small analysis of data from (used) cars shows these differences.

## Car Data

I have used Rvest to scrape data from www.gaspedaal.nl, a Dutch website that combines car for sales data from several other sites. The script to get the data is not that difficult, it can be found on my GitHub, together with my analysis script. There are around **435,000 **cars. The data for each car consists of: make, model, price, fuel type, transmission and age. There are many different car makes and models, the most occurring cars in my data set are:

**Car age vs. Kilometers**

Obviously, there is a clear relation between the age of a car and the amount of kilometers driven. An interesting pattern to see is that this relation depends on on the car make (and model). The following figure shows a few car brands.

Large differences in amount of driving between car types start after 18 months. On average, **Jaguars **are not made for driving, after 60 months only around **83.000 KM** are driven by its owners. While on the other hand, **Mercedes-Benz **owners have driven around **120.000 KM** after 60 months.

A more extreme difference is between the Volvo V50 and the Hyundai i10. Between six and ten years, a Volvo V50 has driven on average 178K kilometers while a Hyundai i10 has driven only 75K kilometers.

## Depreciation

A simple depreciation model is just linear depreciation. Per car brand, model, and transmission type, I can fit a straight line through price and kilometers driven. The slope of the line is the depreciation for every kilometer driven. An elegant way to obtain the depreciation per car type is by using the **purrr **and **broom **packages.

First, some outlying values are removed then only car types with *enough *data points are considered. Then I have grouped the data by brand, model and transmission type, so that for each group a simple linear regression model can be fitted:

Price = *Intercept* + *depreciation* * KM

The following table shows the results:

So, on average a new Porsche 911 costs **117,278.60 Euro**, and every kilometer you drive will cost you around **49.75 cents** in loss of value. The complete table with all car types can be found on RPubs. Although, simple and easy to interpret parameters, a straight line model is not a realistic model as can be seen in the following figure:

A better model to fit would be a non linear depreciation model. For example, exponential depreciation or if you don’t want to specify a specific function, some kind of **smoothing spline. **The R code only needs to be modified slightly, to code below fits a natural cubic splines per car type.

It is a better model (in terms of R-squared), it follows the non linear depreciation that we can see in the data. However, we do not have a single deprecation value. How much value a certain car will lose when driving 1 kilometer now depends on the amount of kilometers driven. It is the derivative of the fitted spline curve. For example, the spline curves fitted for a Renault Clio are given in the figure below. A Clio with automatic transmission hardly looses any value after 100,000 KM.

I have created a small shiny app so that you can see the curves of all the car types.

## Conclusion

Despite my data science exercise and beautiful natural cubic smoothing splines models, buying a brand new car involves a lot of **emotion**. My wife wants a blue Citroen C4 Picasso, no matter what cubic spline model and R-squared I show to her!

So just ignore my analysis and **buy the car that feels good to you!!** Cheers, Longhow.

**leave a comment**for the author, please follow the link and comment on their blog:

**R – Longhow Lam's Blog**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...