(This article was first published on **DiffusePrioR » R**, and kindly contributed to R-bloggers)

It looks increasingly likely that Gareth Bale will transfer from Tottenham to Real Madrid for a world record transfer fee. Negotiations are ongoing, with both parties keen to get the best possible deal on the fee. Reports speculate that it will fall somewhere in the very wide range of £80m to £120m.

Given the topical nature of this transfer saga, I decided to explore the data on world record transfer fees, and see if these data can help predict what the Gareth Bale transfer fee should be. According to this Wikipedia article, there have been 41 record-breaking transfers, from Willie Groves going from West Brom to Aston Villa in 1893 for £100, to Cristiano Ronaldo’s £80m 2009 transfer to Real Madrid from Manchester United.

When comparing any historical price data it is very important that we are comparing like with like. Clearly, a fee of £100 in 1893 is not the same as £100 in 2009. Therefore, the world record transfer fees need to be adjusted for inflation. To do this, I used the excellent measuringworth website, and converted all of the transfer fees into 2011 pounds sterling.
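As a rough illustration of what such an adjustment involves, the sketch below rescales nominal amounts by a ratio of price index values. It is an assumption about the mechanics, not the procedure used here: the fees in this post were converted on the measuringworth website, and the post does not say which of its measures was chosen. The `price_index` values and the `to_real` helper are hypothetical placeholders, not real statistics.

```r
# Hypothetical price index by year (placeholder values, NOT real statistics)
price_index <- c("1950" = 10, "1980" = 40, "2011" = 100)

# Convert nominal amounts into base-year money by scaling with the index ratio
to_real <- function(nominal, year, base_year = "2011") {
  unname(nominal * price_index[base_year] / price_index[as.character(year)])
}

# Two made-up nominal fees, expressed in "2011" money under the placeholder index
real_fees <- to_real(c(1000, 500), c(1950, 1980))
print(real_fees)
```

With a real index in place of the placeholders, the same ratio would put every fee on a common 2011 footing before modelling.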

The plot above demonstrates a very strong linear relationship between the logged real world record transfer fees and time. The R-squared indicates that the year of the transfer explains roughly 97% of the variation in the logged real fee.
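For readers who want to see how such an R-squared is read off a log-linear fit, here is a minimal sketch on synthetic data. The `toy` data frame is simulated purely for illustration and is not the transfer-fee data; the 10% growth rate and the noise level are arbitrary assumptions.

```r
# Synthetic illustration only: these are NOT the transfer-fee data
set.seed(1)
year <- seq(1893, 2009, by = 4)
# Hypothetical series growing about 10% per year, with multiplicative noise
fee <- 100 * exp(0.10 * (year - 1893) + rnorm(length(year), sd = 0.5))
toy <- data.frame(year = year, fee = fee)

# Log-linear model: log(fee) is linear in year
fit <- lm(log(fee) ~ year, data = toy)

slope <- unname(coef(fit)["year"])    # average change in log(fee) per year
r2 <- summary(fit)$r.squared          # share of variance in log(fee) explained by year
growth_pct <- 100 * (exp(slope) - 1)  # implied average annual growth, in percent

print(c(slope = slope, r2 = r2, growth_pct = growth_pct))
```

Note that the R-squared here describes the fit on the log scale, which is why a steadily compounding series can look almost perfectly linear.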

So, if Real Madrid are to pay a world record transfer fee for Bale, how much does this model predict the fee will be? The above plot demonstrates what happens when the simple log-linear model is extrapolated to predict the world record transfer fee in 2013. The predicted log fee is 18.37, which back-transforms to around £96m in 2011 prices. We can update this value to 2013 prices. Assuming a modest inflation rate of 2% per year over two years, multiplying the model's prediction by exp(0.02*2) gives £99.4m. No small potatoes.
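The arithmetic can be retraced with a short sketch. It starts from the rounded log prediction of 18.37 quoted in the text rather than from the fitted model, so its results will sit slightly below the figures reported in the post; the variable names are illustrative only.

```r
# Rounded log prediction quoted in the text (the model's own value has more digits)
log_fee_2011 <- 18.37

# Back-transform from the log scale to pounds, in 2011 prices
fee_2011 <- exp(log_fee_2011)

# Assumed inflation: 2% per year, continuously compounded, from 2011 to 2013
inflation_rate <- 0.02
years_elapsed <- 2013 - 2011
inflation_factor <- exp(inflation_rate * years_elapsed)

# Predicted fee in 2013 prices
fee_2013 <- fee_2011 * inflation_factor

# Report in millions of pounds
print(round(c(fee_2011 = fee_2011, fee_2013 = fee_2013) / 1e6, 1))
```

Because the model is fitted on the log scale, small rounding differences in the log prediction translate into differences of the order of a million pounds after exponentiation.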

```r
rm(list=ls())
bale = read.csv("bale.csv")
# data from:
# http://en.wikipedia.org/wiki/World_football_transfer_record
# http://www.measuringworth.com/ukcompare/

ols1 = lm(log(real2011)~year, bale)

# price
exp(predict(ols1,data.frame(year=2013)))
# inflate lets say 2% inflation
exp(predict(ols1,data.frame(year=2013)))*exp(0.02*2)

# nice ggplot
library(ggplot2)
bale$lnprice2011 = log(bale$real2011)
addon = data.frame(year=2013,nominal=0,real2011=0,name="Bale?",
                   lnprice2011=predict(ols1,data.frame(year=2013)))

ggplot(bale, aes(x=year, y=lnprice2011, label=name)) +
  geom_text(hjust=0.4, vjust=0.4) +
  stat_smooth(method = "lm",fullrange = TRUE, level = 0.975) +
  theme_bw(base_size = 12, base_family = "") +
  xlim(1885, 2020) + ylim(8, 20) +
  xlab("Year") + ylab("ln(Price)") +
  ggtitle("World Transfer Records, Real 2011 Prices (£)")+
  geom_point(aes(col="red"),size=4,data=addon) +
  geom_text(aes(col="red", fontface=3),hjust=-0.1, vjust=0,size=7,data=addon) +
  theme(legend.position="none")
```
