# Ordinal football

From **Gianluca Baio's blog**, kindly contributed to R-bloggers.

I’ve had a quick look at this article on R-bloggers. I don’t think I’ve followed the whole exchange, but I believe the discussion is about which models should (or could) be used to estimate football scores; in this case, they are using the Dutch league.

The main point of the post is that using ordinal regression models can improve performance (I suppose in terms of prediction, or of how closely the estimated probabilities match the observed frequencies of the results).

At a very superficial level (since I’ve just read the article and have not thought about this a great deal), I think that assuming that the observed number of goals can be considered as an ordinal variable, much as you would do for a Likert scale, is not quite the best option.

This assumption *might* not have a huge impact on the actual results of this model. Just as for an ordinal variable, the distance between consecutive numbers of goals need not be constant (moving from scoring 0 to scoring 1 goal does not necessarily take the same effort as moving from scoring 3 to scoring 4 goals), and ordinal regression can accommodate this. But I think this formulation is unnecessarily complicated and a bit confusing.
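To make the "unequal distances" idea concrete, here is a minimal sketch of how a cumulative-logit (proportional odds) model turns a linear predictor into category probabilities. The function name `ordinal_probs`, the cutpoint values and the two values of `eta` are made up for illustration; they are not taken from the post under discussion.

```python
import math

def ordinal_probs(eta, cutpoints):
    """Category probabilities under a cumulative-logit model.

    P(Y <= k) = logistic(cutpoints[k] - eta); the last category takes
    whatever probability is left. Cutpoints must be increasing.
    """
    cdf = [1.0 / (1.0 + math.exp(-(c - eta))) for c in cutpoints] + [1.0]
    probs, prev = [], 0.0
    for value in cdf:
        probs.append(value - prev)
        prev = value
    return probs

# Hypothetical, deliberately unevenly spaced cutpoints for 0, 1, 2, 3 and 4+ goals.
CUTPOINTS = [-0.5, 1.0, 2.0, 3.5]

weak = ordinal_probs(eta=0.0, cutpoints=CUTPOINTS)    # weaker attack (assumed value)
strong = ordinal_probs(eta=1.0, cutpoints=CUTPOINTS)  # stronger attack (assumed value)
```

The gaps between cutpoints are what encode the "effort" of moving from one category to the next, and they are free parameters rather than being forced to be equal. A larger `eta` shifts probability mass towards the higher categories.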

Moreover (and far more importantly, I think), if I understand it correctly, both the original models and those discussed in the post I’m considering seem to assume independence between the goals scored by the two teams competing in a single game. This is not realistic, I think, as we proved in our paper (of course drawing on other good examples in the literature).

In particular, we were considering a hierarchical structure in which the goals scored by the two competing teams are conditionally independent given a set of parameters (accounting for defence and attack, and home advantage). But because these parameters were given exchangeable priors, correlation is implied in the responses.
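As a rough illustration of that last point (and not the model from the paper), the following simulation assumes Poisson goal counts with log-linear rates, team effects drawn from a common Normal distribution, and hyperparameter means that are themselves random. All numerical values and the names `simulate_match` and `goal_correlation` are assumptions chosen for the sketch.

```python
import math
import random

def rpois(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for small rates)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

def simulate_match(rng, sd_hyper, sd_team=0.3, home=0.2):
    """Goals for home and away team, independent GIVEN the parameters."""
    # Shared hyperparameters: this is where the two scores get "mixed".
    mu_att = rng.gauss(0.0, sd_hyper)
    mu_def = rng.gauss(0.0, sd_hyper)
    # Exchangeable team effects, all drawn from the same distributions.
    att_h, att_a = rng.gauss(mu_att, sd_team), rng.gauss(mu_att, sd_team)
    def_h, def_a = rng.gauss(mu_def, sd_team), rng.gauss(mu_def, sd_team)
    y_home = rpois(rng, math.exp(home + att_h + def_a))
    y_away = rpois(rng, math.exp(att_a + def_h))
    return y_home, y_away

def pearson(xs, ys):
    """Sample Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def goal_correlation(sd_hyper, n_matches=20000, seed=1):
    """Marginal correlation between home and away goals across simulated matches."""
    rng = random.Random(seed)
    pairs = [simulate_match(rng, sd_hyper) for _ in range(n_matches)]
    return pearson([h for h, _ in pairs], [a for _, a in pairs])
```

Calling `goal_correlation(0.0)` fixes the hyperparameters, so the two scores are marginally independent and the sample correlation should sit close to zero. Calling `goal_correlation(0.5)` lets them vary, and a clearly positive correlation appears even though each pair of scores is drawn independently once the parameters are known.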
