
Another preparatory step before I start learning about stats in the context of Formula One… There are a couple of things I’m hoping to achieve when I actually start the journey: 1) finding ways of using stats to help to pull out patterns and events that are interesting from a storytelling or news perspective; 2) seeing if I can come up with any models that help forecast or predict race winners or performances over a race weekend.

There are a couple of problems I can foresee (?!) when it comes to the predictions. Firstly, unlike horse racing, there aren’t that many F1 races each year to test the predictions against. Secondly, how do I even get a baseline estimate of the probabilities that driver X or team Y might end up on the podium?

It seems to me as if betting odds provide one publicly available “best guess” at the likelihood of any driver winning a race (a range of other bets are possible, of course, that give best-guess predictions for other situations). Having led a sheltered life, I find the world of betting completely alien, so here’s what I think I’ve learned so far…

Odds are related to the anticipated likelihood of a particular event occurring, and they describe the winnings you get back (on top of your stake) if that event happens. So 2/1 (2 to 1) fractional odds say: if the event happens, you’ll get 2 back for every 1 you placed, plus your stake. If I bet 1 unit at 2/1 and win, I get 3 back: my original 1 plus 2 more. If I bet 3, I get 9 back: my original 3 plus 2 for every 1 I placed. Since I placed three stakes of 1, I get 3 x 2 = 6 in winnings, plus my original 3, which gives me 9 back on 3 staked, a profit of 6.
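As a quick check on that arithmetic, here’s a minimal R sketch. `fractional_return()` is a hypothetical helper of my own; it assumes fractional odds written as N/M, with the stake and the return in the same units.

```r
# Total amount returned on a winning bet at fractional odds N/M:
# the winnings (stake * N/M) plus the original stake.
# fractional_return() is a hypothetical helper, not from any package.
fractional_return = function(stake, N, M = 1) {
  winnings = stake * N / M
  winnings + stake
}

fractional_return(1, 2)  # 1 unit at 2/1: 2 winnings + 1 stake = 3
fractional_return(3, 2)  # 3 units at 2/1: 6 winnings + 3 stake = 9
```

Subtracting the stake from the return gives the profit, so `fractional_return(3, 2) - 3` is the 6 mentioned above.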

Odds are related (loosely) to the likelihood of an event happening. 2/1 odds represent an event that is expected to happen with probability (1/3) = 0.333…, that is, a third of the time. (To cut down on confusion between fractional odds and fractional probabilities, I’ll try to remember to put the fractional probabilities in brackets; so 1/2 is fractional odds of 2 to 1 on, and (1/2) is a probability of one half.)

To see how this works, consider an evens bet, fractional odds of 1/1, such as someone might make on the toss of a coin. The probability of getting heads on a single toss is (1/2); the probability of getting tails is also (1/2). If I’m giving an absolutely fair book based on these likelihoods, I’d offer you even odds that you get a head, for example, on a single toss. After all, it’s (fifty/fifty), a fifty per cent chance either way, whether heads or tails lands face up. If there are three equally likely outcomes, (1/3) each, then I’d offer 2/1: it’s twice as likely that something other than the single outcome you called would come up. If there are four equally likely outcomes, I’d offer 3/1, because it’s likely (if we played repeatedly) that three times out of four, you’d be wrong. So three times out of four you’d lose and I’d take your stake. And on the fourth go, when you get it right, I’d give you your stake back for that round plus three for winning, so overall we’d be back where we started.
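The fair book argument can be checked the same way. This is a sketch that assumes n equally likely outcomes; `fair_fractional()` and `expected_profit()` are hypothetical helper names. It works out the expected net profit on a 1 unit stake at fair odds of (n-1)/1.

```r
# With n equally likely outcomes, each has probability (1/n),
# and the fair fractional odds are (n-1)/1
fair_fractional = function(n) n - 1

# Expected net profit per unit stake: with probability (1/n) you win
# (n-1) units; with probability (1 - 1/n) you lose your 1 unit stake
expected_profit = function(n) {
  p = 1 / n
  p * fair_fractional(n) - (1 - p) * 1
}

sapply(2:4, fair_fractional)  # 1 2 3: evens (1/1), 2/1 and 3/1
sapply(2:4, expected_profit)  # zero in each case, up to floating-point rounding
```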

Decimal odds describe the total return you get on a unit stake, including the stake itself. So for a 2/1 bet, the decimal odds are 3. For a 4/1 bet they’d be 5. For an N/1 bet they’d be 1+N. For a 1/2 (two to one on) bet they’d be 1.5; for a 1/10 bet they’d be 1.1. So for a 1/M bet, they’d be 1+1/M. Generally, for an N/M bet, the decimal odds are 1+N/M.
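In R, that rule is a one-liner. A minimal sketch; `decimal_odds()` is a hypothetical helper name that takes the numerator N and denominator M of N/M fractional odds.

```r
# Decimal odds for fractional odds N/M: the return on a unit stake, 1 + N/M
decimal_odds = function(N, M = 1) 1 + N / M

decimal_odds(2)      # 2/1  -> 3
decimal_odds(4)      # 4/1  -> 5
decimal_odds(1, 2)   # 1/2  -> 1.5
decimal_odds(1, 10)  # 1/10 -> 1.1
```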

Decimal odds give an easy way into calculating the likelihood of an event. Decimal odds of 3 (that is, fractional odds of 2/1) describe an event that will happen (1/3) of the time in a fair game; in general, an event happens (1/(decimal odds)) of the time. For fractional odds of N/M, you expect the event to happen with probability (1/(1+N/M)).
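Going the other way, from odds to an implied probability, might look something like this. Again this is just a sketch, and both helper names are my own hypothetical ones.

```r
# Implied probability in a fair game from decimal odds: 1/(decimal odds)
prob_from_decimal = function(decimal) 1 / decimal

# ...and directly from fractional odds N/M: 1/(1 + N/M)
prob_from_fractional = function(N, M = 1) 1 / (1 + N / M)

prob_from_decimal(3)        # decimal odds 3 -> (1/3), about 0.333
prob_from_fractional(2)     # 2/1 -> the same (1/3)
prob_from_fractional(1, 1)  # evens -> (1/2)
prob_from_fractional(3)     # 3/1 -> (1/4)
```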

In a completely fair book (?my phrase), the probabilities implied by the odds on all the possible outcomes should sum to exactly 1. Bookmakers rig the odds in their favour, though, so the summed probabilities on a book will add up to more than 1; the excess represents the bookmaker’s margin. Suppose you’re betting on the toss of a coin with a bookie who offers you 99/100 for heads and evens for tails. If you play 400 games, betting 1 unit on heads 200 times and on tails 200 times, and win 100 of each, you’ll stake 400 overall. You’ll get back 100 in winnings plus 100 in stakes on tails, along with 99 in winnings plus 100 in stakes on heads. That is, you’ll have staked 400 and got back 399, so the bookie will be 1 up overall. The summed probabilities add up to more than 1, since (1/2) + (1/(1+99/100)) = (0.5 + ~0.5025) > 1.
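Here’s a small R sketch of that coin-toss book, assuming the 99/100 and evens prices from the example above. It sums the implied probabilities and replays the 400-game arithmetic; `implied_prob()` is a hypothetical helper name.

```r
# Implied probability of an outcome from fractional odds N/M
implied_prob = function(N, M = 1) 1 / (1 + N / M)

# The coin-toss book: 99/100 for heads, evens (1/1) for tails
book = c(heads = implied_prob(99, 100), tails = implied_prob(1, 1))
sum(book)      # about 1.0025, i.e. greater than 1
sum(book) - 1  # the bookmaker's margin, about 0.0025

# Replaying the 400-game example: 200 unit bets on each side, 100 wins each
staked   = 400
returned = 100 * (1 + 99/100) + 100 * (1 + 1/1)  # 199 + 200
returned - staked  # -1 (up to rounding): the bookie is 1 up
```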

One-off bets are no basis for a strategy; you need to bet regularly. One way of winning is to follow a value betting strategy, where you place bets on outcomes that you predict are more likely than the offered odds imply. This is the mirror image of how the bookie works. If a bookie offers you fractional odds of 3/1 (an expectation that the event will happen (1/4) of the time), and you have evidence that suggests it will happen (1/3) of the time (decimal odds of 3, fractional odds of 2/1), then it’s worth your while repeatedly accepting the bet. After all, if you play 12 rounds, you’ll wager 12 and win on 12/3 = 4 occasions, getting 4 back (3 plus your stake) each time, to give you a net return of 4 x 4 - 12 = 16 - 12 = +4. If the event had happened at the bookie’s predicted likelihood of (1/4) of the time, you would have got back (12/4) x 4 - 12 = 0 overall.
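The same comparison can be written as an expected value. Backing an outcome with true probability p at decimal odds `price` returns p x price - 1 per unit staked, on average. A minimal sketch, with `expected_profit()` as a hypothetical helper name:

```r
# Expected net profit per unit stake when backing an outcome at decimal
# odds `price`, if the true probability of it happening is `p`
expected_profit = function(p, price) p * price - 1

price = 4  # offered fractional odds of 3/1
expected_profit(1/3, price) * 12  # my estimate (1/3): about +4 over 12 unit bets
expected_profit(1/4, price) * 12  # bookie's estimate (1/4): 0 over 12 unit bets
```

A positive expected profit per unit stake is exactly the value betting condition: p x price > 1, or equivalently p > 1/price.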

I’ve tried to write an R script to explore this:

```r
# My vocabulary may be a bit confused herein
# Corrections welcome in the comments from both statisticians and gamblers ;-)

# The offered odds
price = 4       # decimal odds for fractional odds of 3/1: 3 + 1
odds = 1/price  # probability implied by the offered price

# The odds I've predicted
myodds = 1/3    # probability implied by 2/1: 1/(2 + 1)

# The number of repeated trials in the game
trials = 10000

# The amount staked
bet = 1

# The experiment that we'll run trials number of times
expt = function(trials, odds, myodds, bet){
  # trial holds a uniform random number in the range 0..1
  df = data.frame(trial = runif(trials))
  # The win condition happens at my predicted odds, i.e. if the trial value is less than myodds
  # So if my odds are (1/4) = 0.25, a trial value in the range 0..0.25 counts as a win
  # (df$trial < myodds) is TRUE if trial < myodds, which as.integer() casts to 1
  # If (df$trial < myodds) is FALSE, as.integer() returns 0
  df$win = as.integer(df$trial < myodds)
  df$bet = bet
  # The winnings are calculated at the offered odds and are net of the stake
  # df$win/odds = 1/odds = price (the decimal odds) on a win, else 0
  # The actual return is the product of the stake (bet) and the decimal odds
  # The winnings are the return net of the initial amount staked
  # Where there is no win, the winnings are a loss of the value of the bet
  df$winnings = df$bet * df$win/odds - df$bet
  df
}

df.e = expt(trials, odds, myodds, bet)

# The overall net winnings
sum(df.e$winnings)
# If myodds > odds, then I'm likely to end up winning on a value betting strategy

# A way of running the experiment several times
# There are probably better R protocols for doing this?
runs = 10
df.r = data.frame(s = numeric(), v = numeric())
for (i in 1:runs){
  e = expt(trials, odds, myodds, bet)
  # One row per run: total net winnings (s) and their standard deviation (v)
  df.r = rbind(df.r, data.frame(s = sum(e$winnings), v = sd(e$winnings)))
}
df.r

# It would be nice to do some statistical graphics demonstrations of the
# different distributions of possible outcomes for different regimes. For example:
## different combinations of odds and myodds
## different numbers of trials
## different bet sizes
```
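As a first step towards the comparisons listed at the end of the script, here’s a minimal vectorised sketch. It assumes a handful of hypothetical “true” probabilities (`probs`) against the same 3/1 offer, and compares the simulated net winnings with the long-run expectation trials x (p x price - 1). `simulate_net()` is a hypothetical helper, not part of the script above.

```r
set.seed(42)  # arbitrary seed, so the sketch is reproducible

# Net winnings from `trials` bets of size `bet` at decimal odds `price`,
# when the event actually happens with probability `p`
simulate_net = function(trials, price, p, bet = 1) {
  wins = runif(trials) < p          # TRUE on a win, FALSE on a loss
  sum(bet * wins * price - bet)     # return on wins, minus every stake
}

price  = 4                          # offered 3/1
trials = 10000
probs  = c(0.2, 0.25, 1/3, 0.4)     # hypothetical "true" probabilities

simulated = sapply(probs, function(p) simulate_net(trials, price, p))
expected  = trials * (probs * price - 1)  # long-run expectation
data.frame(probs, simulated, expected)
```

Only the rows where the true probability beats the implied (1/4) should come out reliably ahead; at exactly (1/4) the simulated total just wanders around zero.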

There are apparently also “efficient” ways of working out what stake to place (the “staking strategy”). The value strategy gives you the edge to win in the long term; the staking strategy is how you maximise profits. See, for example, Horse Racing Staking and Betting: Horse racing basics part 2, or more mathematical treatments such as The Kelly Criterion.
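For a simple win-or-lose bet, the Kelly criterion is usually stated as f = (b x p - q)/b. Here b is the net fractional odds (decimal odds minus 1), p is your estimated probability of winning, and q = 1 - p. Below is a minimal sketch of that standard formula, not a full staking strategy. `kelly_fraction()` is a hypothetical helper name, and the sketch assumes your estimate of p can be trusted.

```r
# Kelly fraction of the bankroll to stake on a win/lose bet with net
# fractional odds b (decimal odds - 1) and estimated win probability p.
# A non-positive result means the bet has no value, so stake nothing.
kelly_fraction = function(p, b) {
  f = (b * p - (1 - p)) / b
  max(f, 0)
}

kelly_fraction(1/3, 3)  # offered 3/1, my estimate (1/3): about 0.111, i.e. 1/9
kelly_fraction(1/4, 3)  # at the bookie's own estimate (1/4): 0, no edge
```

With the 3/1 example from earlier, this suggests staking about a ninth of the bankroll on each bet. At the bookie’s own estimate there is no edge, so it says to stake nothing.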

There is possibly some mileage to be had in getting to grips with R modeling using staking strategy models as an opening exercise, along with statistical graphical demonstrations of the same, but that is perhaps a little off topic for now…

To recap, then, what I think I’ve learned is that we can test predictions against the benchmark of offered odds. The offered odds in themselves give us a ballpark estimate of what the (expert) bookmakers, as influenced by the betting/prediction market, expect the outcome of an event to be. Note that the odds are rigged so that the implied probabilities over the full range of outcomes sum to more than 1, which builds in a profit margin (known as the overround) for the bookmaker. If we have a prediction model that rates an event as more likely than the offered odds imply, and we believe in our prediction, we can run a value betting strategy on that basis and hopefully come out, over the long term, with a profit. The size of the profit is in part an indicator of how much more accurate our model is as a predictive model than the expert knowledge and prediction market basis that is used to set the bookie’s odds.