(This article was first published on **Gianluca Baio's blog**, and kindly contributed to R-bloggers)

Getting closer to my personal Euro2012 derby: England v Italy.

I find it amusing that both sets of media think that their respective teams have been gifted a good tie. The English are very happy to have avoided Spain, while the Italians don’t mind not playing the French. I guess both views make sense (particularly for the Italians: it is always very tense when we play France and I suppose we do mind the thought of getting kicked out by them). **But**: maybe something is not quite adding up when both sides think they are the favourites and that it is their turn to shine and go through to the semis. I really think it’s a very close game and my subjective prior for the game is genuinely vague. Here’s how I would proceed to formalise it.

First I would look for “hard” evidence to inform my thought process: Italy have played England 23 times; we have won more games (9 to 7) but overall have a worse goal difference (26 for and 28 against). In the last 15 years, we’ve played each other only 5 times. In the two official games Italy won one (at Wembley) and drew one. Italy also won two of the friendly games, while England won the remaining one. The last of those occasions was in 2002 and Buffon is the only player to still be around (as an active footballer, that is). So, I think all in all these stats are not very helpful to inform a prior distribution.
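As a quick check on the all-time figures quoted above, here is a minimal sketch that derives the number of draws and the goal difference from Italy's point of view. The variable names are purely illustrative; the only inputs are the numbers stated in the paragraph.

```python
# All-time head-to-head figures as quoted in the text (Italy's perspective).
games = 23
italy_wins, england_wins = 9, 7
italy_goals_for, italy_goals_against = 26, 28

# Whatever was not won by either side must have been a draw.
draws = games - italy_wins - england_wins
goal_diff = italy_goals_for - italy_goals_against

print(f"draws: {draws}")
print(f"Italy win share: {italy_wins / games:.2f}")
print(f"England win share: {england_wins / games:.2f}")
print(f"Italy goal difference: {goal_diff:+d}")
```

The win shares are close to each other and the goal difference points the other way, which is consistent with the view that these records do little to separate the teams.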

Then I would look for info on more recent games, even if not head-to-head. The graph below shows the recent form of the two teams (in every game they played in 2011/2012, including the first games in the Euro2012).

Looks like England are doing a bit better of late. However, the last three (competitive) games were against:

- a very good opponent (Spain and France for Italy and England, respectively);
- a good and a so-so opponent (Croatia and Sweden); and
- a so-so and a good opponent (Ireland and Ukraine).

So, the difference in form seems to lie in the fact that, *all other things equal* (well, not really, but you know what I mean…), England managed to get a scruffy win against Ukraine, while Italy failed to hold on long enough and conceded an equaliser to Croatia. On the other hand, England have not managed to win 3 games in a row in the last year. Again, probably not much to go on to distinguish between the teams.

So, one way to form a prior is the following. Assume that I’m willing to consider a convenient parametric distribution for $\theta$, the probability that Italy win the game. For example, I can consider $\theta \sim \mbox{Beta}(\alpha,\beta)$. [As usual, this is **just one of the possible forms for the prior**; there’s nothing special about it, other than its mathematical properties!]

Now, consider these three quantities:

- the (assumed, by me) *mode* of the distribution. Given all the uncertainty, which I was not able to resolve by looking at existing data, I’ll assume this to be 0.5, meaning that I am really very uncertain about who’s going to win and think that the best bet is 50:50.
- the (assumed, by me) *upper level of probability* that I can consider as reasonable to represent the chance that Italy win the game. Of course, I don’t think that there is absolute certainty that Italy will go through, so this level will be less than 1. I think I would go as far as $u=0.8$.
- the (assumed, by me) cumulative probability that $\theta < u$. Given that $u=0.8$ and that the mode is $0.5$, this cumulative probability should be relatively large. I feel confident that this would be a reasonable upper limit, and thus I consider $p=\Pr(\theta < u)$ …
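One possible way to turn the three quantities (mode, upper level $u$ and cumulative probability $p$) into a pair $(\alpha,\beta)$ is sketched below. It is not necessarily the procedure used for this post. It fixes the mode through the identity $\mbox{mode}=(\alpha-1)/(\alpha+\beta-2)$, valid for $\alpha,\beta>1$, and then searches by bisection for the $\alpha$ that gives $\Pr(\theta < u)=p$. The function names are hypothetical, the Beta CDF is approximated with Simpson's rule to stay within the standard library, and the value $p=0.9$ is purely an illustrative assumption, not a figure taken from the text.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, computed on the log scale for stability."""
    if x <= 0.0 or x >= 1.0:
        return 0.0  # adequate for a, b > 1, where the density vanishes at the ends
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def beta_cdf(x, a, b, n=2000):
    """Approximate Pr(theta < x) with composite Simpson's rule (n must be even)."""
    h = x / n
    total = beta_pdf(0.0, a, b) + beta_pdf(x, a, b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * beta_pdf(i * h, a, b)
    return total * h / 3

def beta_from_mode(a, mode):
    """Value of b that places the mode of Beta(a, b) at `mode` (requires a > 1)."""
    return (a - 1) / mode - a + 2

def elicit_beta(mode, u, p, lo=1.0 + 1e-6, hi=500.0, tol=1e-8):
    """Bisection on a so that the mode is `mode` and Pr(theta < u) equals p.

    Assumes Pr(theta < u) increases with a when u lies above the mode.
    """
    def gap(a):
        return beta_cdf(u, a, beta_from_mode(a, mode)) - p
    if gap(lo) > 0 or gap(hi) < 0:
        raise ValueError("p is not reachable for this mode and u")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(mid) < 0:
            lo = mid
        else:
            hi = mid
    a = (lo + hi) / 2
    return a, beta_from_mode(a, mode)

# mode and u come from the text; p = 0.9 is an assumed, illustrative value.
alpha, beta = elicit_beta(mode=0.5, u=0.8, p=0.9)
print(alpha, beta, beta_cdf(0.8, alpha, beta))
```

With the mode at 0.5 the identity forces $\alpha=\beta$, so the prior is symmetric and a single number controls how concentrated it is around 50:50. Note also that a flat prior already gives $\Pr(\theta < 0.8)=0.8$, so under these constraints $p$ has to exceed 0.8.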

Estimating the predictive distribution of the result is the actual objective of the exercise; in fact, I’m not really interested in $\theta$ itself. Given this (prior) information, a large number of simulations produces a median value of 1, which means that I’m predicting Italy to win, but with **huge** uncertainty attached.
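A prior predictive simulation of this kind could be sketched as follows, assuming the result is coded as 1 if Italy win and 0 otherwise (that coding, the function name and the $\mbox{Beta}(2,2)$ parameters are all assumptions for illustration, not the values behind the figure quoted above). Each simulation draws $\theta$ from the prior and then draws the outcome given $\theta$.

```python
import random

def simulate_predictive(alpha, beta, n_sims=20000, seed=2012):
    """Draw theta ~ Beta(alpha, beta), then y ~ Bernoulli(theta); return all y."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    outcomes = []
    for _ in range(n_sims):
        theta = rng.betavariate(alpha, beta)  # one draw from the prior
        outcomes.append(1 if rng.random() < theta else 0)  # 1 = Italy win
    return outcomes

# Placeholder prior, symmetric around 0.5; not the parameters used in the post.
outcomes = simulate_predictive(alpha=2.0, beta=2.0)
win_freq = sum(outcomes) / len(outcomes)
print(f"simulated Pr(Italy win) = {win_freq:.3f}")
```

With a prior symmetric around 0.5 the simulated frequency of an Italy win hovers around one half, which is one way of seeing how large the uncertainty attached to any single prediction is.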
