(This article was first published on **Wiekvoet**, and kindly contributed to R-bloggers)

After reading the data, making a prediction display and building a football data model, it is time to validate the model a bit more (regression diagnostic plots) and put it to use. The regression diagnostic plots from the car package turned out not to be very informative for this model, apart from flagging unexpected results. The predictions are not good at all.

### Roda JC-FC Utrecht

First of all: on Friday, when I started this post, Roda JC played FC Utrecht and lost 0-1. FC Utrecht moves up into the subtop (the group of clubs just behind the leaders). Frankly, I am not very surprised. My model favoured Roda JC (p=0.46), but it also gave FC Utrecht a 0.33 chance of winning, so the result is not that strange. It is disappointing for my model, though.

```r
fbpredict(model3, "Roda JC", "FC Utrecht")
```

```
$details
Roda JC in rows against FC Utrecht in columns
       0      1      2      3      4      5      6      7      8      9
0 0.0241 0.0489 0.0494 0.0334 0.0169 0.0068 0.0023 0.0007 0.0002      0
1 0.0410 0.0830 0.0841 0.0567 0.0287 0.0116 0.0039 0.0011 0.0003 0.0001
2 0.0349 0.0706 0.0714 0.0482 0.0244 0.0099 0.0033 0.0010 0.0002 0.0001
3 0.0198 0.0400 0.0405 0.0273 0.0138 0.0056 0.0019 0.0005 0.0001      0
4 0.0084 0.0170 0.0172 0.0116 0.0059 0.0024 0.0008 0.0002 0.0001      0
5 0.0029 0.0058 0.0058 0.0039 0.0020 0.0008 0.0003 0.0001      0      0
6 0.0008 0.0016 0.0017 0.0011 0.0006 0.0002 0.0001      0      0      0
7 0.0002 0.0004 0.0004 0.0003 0.0001 0.0001      0      0      0      0
8      0 0.0001 0.0001 0.0001      0      0      0      0      0      0
9      0      0      0      0      0      0      0      0      0      0

$`summary chances`
   Roda JC      equal FC Utrecht
 0.4580659  0.2126926  0.3291782
```
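fbpredict() comes from an earlier post and its code is not reproduced here. As a minimal sketch, assuming the two teams' goal counts are independent Poisson variables (the expected-goal values below are hypothetical, not taken from model3), a score matrix like the one above and its three summary chances can be computed by summing the cells below the diagonal, on it, and above it:

```r
# Hypothetical expected goals; in the post these would come from model3
lambda_home <- 1.6
lambda_away <- 1.1
goals <- 0:9

# Independent Poisson goals: cell [i, j] = P(home scores i - 1 and away scores j - 1)
score <- outer(dpois(goals, lambda_home), dpois(goals, lambda_away))
dimnames(score) <- list(home = goals, away = goals)

summary_chances <- c(
  home  = sum(score[lower.tri(score)]),  # row count > column count
  equal = sum(diag(score)),              # same number of goals
  away  = sum(score[upper.tri(score)])   # column count > row count
)
round(summary_chances, 4)
```

Applied to the printed matrix above, the same three sums give about 0.329 below the diagonal, 0.213 on it and 0.458 above it, so the 0.458 labelled Roda JC in the summary corresponds to the cells above the diagonal.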

### Diagnostic plots

But I should have looked at some diagnostic plots first. Luckily, between the stats and car packages there is a nice collection of tools.

```r
library(car)
infIndexPlot(model3)
```

A number of things are interesting. The most influential observation sits just below index 100, and it traces back to row 97: FC Groningen scoring six (!) goals against Feyenoord. That is the best FC Groningen ever did against Feyenoord, and it was considered a remarkable result at the time, so if this observation were not influential or outlying, I would be worried.
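The index read off the plot can also be pulled out numerically. Cook's distance is one of the statistics infIndexPlot() displays, and which.max() on it gives the row to inspect. As a minimal sketch on simulated data (six hypothetical clubs, random Poisson goals and one planted high score standing in for Groningen-Feyenoord; StartData itself is not reproduced here):

```r
set.seed(1)
clubs <- paste0("Club", 1:6)                       # hypothetical clubs
games <- expand.grid(home = clubs, away = clubs, stringsAsFactors = FALSE)
games <- games[games$home != games$away, ]         # every club hosts every other club

# Two rows per game, in the same layout as StartData
sim <- data.frame(
  OffenseClub = factor(c(games$home, games$away), levels = clubs),
  DefenseClub = factor(c(games$away, games$home), levels = clubs),
  OffThuis    = rep(c(1, 0), each = nrow(games))
)
sim$Goals <- rpois(nrow(sim), lambda = 1.4)        # simulated goals, not real results
sim$Goals[5] <- 12                                 # planted outlying score

fit <- glm(Goals ~ OffenseClub + DefenseClub + OffThuis,
           family = poisson, data = sim)

cd  <- cooks.distance(fit)
top <- which.max(cd)                               # most influential row
sim[top, ]
```

For the real model the equivalent would be `StartData[which.max(cooks.distance(model3)), ]`.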

```r
outlierTest(model3)
StartData[97, ]
```

```
    OffenseClub DefenseClub Goals OffThuis
97 FC Groningen   Feyenoord     6        1
```
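outlierTest() reports the observation with the largest studentized residual together with a Bonferroni-adjusted p-value. A rough manual analogue is sketched below on R's built-in warpbreaks data rather than on model3. It assumes that for a glm the studentized residuals are compared with a standard normal distribution, so the car output may differ in detail:

```r
# Poisson glm on a built-in dataset, used only as an illustration
fit <- glm(breaks ~ wool + tension, family = poisson, data = warpbreaks)

rs    <- rstudent(fit)                 # studentized residuals
n     <- length(rs)
worst <- which.max(abs(rs))            # most extreme observation

# Two-sided normal p-value, then Bonferroni: multiply by the number of tests
p_unadj <- unname(2 * pnorm(abs(rs[worst]), lower.tail = FALSE))
p_bonf  <- min(1, n * p_unadj)

c(row = unname(worst), rstudent = unname(rs[worst]),
  p = p_unadj, bonferroni_p = p_bonf)
warpbreaks[worst, ]
```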

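```r
mm <- model.matrix(model3)
table(diag(mm %*% solve(t(mm) %*% mm) %*% t(mm)))
```

```
0.0588235294117647 0.0588235294117648 0.0588235294117649 
               179                427                  6 
```

The three values differ only in the last digit, which is floating-point noise: in the unweighted design all 179 + 427 + 6 = 612 observations have the same leverage, 1/17 ≈ 0.0588. The design itself does not single out any match.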
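The common value can be explained. The trace of the hat matrix equals the number of design columns p, so the average leverage is p/n. Assuming model3 is a Poisson glm of Goals on OffenseClub, DefenseClub and OffThuis fitted to a full season of 18 clubs (an assumption consistent with the 612 rows, not something stated above), p = 1 + 17 + 17 + 1 = 36 and n = 2 × 306 = 612, so p/n = 36/612 = 1/17. In a balanced double round robin every row is equivalent up to relabelling clubs and swapping home and away, so by symmetry every row gets that average. The sketch below rebuilds such a design with hypothetical club names and computes the leverages as row sums, which avoids forming the full 612 × 612 matrix:

```r
# Hypothetical stand-in for the real fixture list: 18 clubs, full double round robin
clubs <- sprintf("Club%02d", 1:18)
games <- expand.grid(home = clubs, away = clubs, stringsAsFactors = FALSE)
games <- games[games$home != games$away, ]   # 18 * 17 = 306 games

# Two rows per game (home attack, away attack), laid out like StartData
design <- data.frame(
  OffenseClub = factor(c(games$home, games$away), levels = clubs),
  DefenseClub = factor(c(games$away, games$home), levels = clubs),
  OffThuis    = rep(c(1, 0), each = nrow(games))
)

# Assumed model structure: intercept + 17 + 17 club effects + home effect = 36 columns
mm <- model.matrix(~ OffenseClub + DefenseClub + OffThuis, data = design)

# Leverage h_i = x_i' (X'X)^{-1} x_i, computed per row instead of via the n x n hat matrix
lev <- rowSums((mm %*% solve(crossprod(mm))) * mm)

c(n = nrow(mm), p = ncol(mm), p_over_n = ncol(mm) / nrow(mm))
range(lev)
```

These are leverages of the unweighted design matrix, as in the calculation above; hatvalues(model3) would instead return the glm's weighted leverages, which depend on the fitted means.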

[Figure: residual plot]

### Predicting

Last week, flo2speak commented that (s)he had tried the same approach on German football and got 30% of the predictions correct. I am in the same ballpark. Maybe I should try the ordinal regression approach too. On top of that, I never predict ties: this weekend four games ended in a tie, and in the 2011-2012 season 20% of the games were tied. Clearly, improvement is needed.
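A small sketch shows why ties hardly ever come out on top. It assumes the prediction is simply the most likely of home win, draw and away win (for Roda JC-FC Utrecht that is 0.458, 0.213 and 0.329, so a Roda JC win), and, as before, independent Poisson goals with hypothetical expected-goal values. Under those assumptions a draw only becomes the most likely outcome when both sides are expected to score very little:

```r
# Outcome probabilities under independent Poisson goals (an assumption)
outcome_probs <- function(lambda_home, lambda_away, max_goals = 15) {
  g <- 0:max_goals
  score <- outer(dpois(g, lambda_home), dpois(g, lambda_away))
  c(home = sum(score[lower.tri(score)]),
    draw = sum(diag(score)),
    away = sum(score[upper.tri(score)]))
}

# Hypothetical decision rule: predict the single most likely outcome
predict_outcome <- function(p) names(p)[which.max(p)]

# Hypothetical expected goals (home, away) for three kinds of match
scenarios <- list(low = c(0.5, 0.5), even = c(1.0, 0.9), open = c(1.5, 1.1))
for (nm in names(scenarios)) {
  lam <- scenarios[[nm]]
  p <- outcome_probs(lam[1], lam[2])
  cat(sprintf("%-4s home %.3f  draw %.3f  away %.3f  -> %s\n",
              nm, p[["home"]], p[["draw"]], p[["away"]], predict_outcome(p)))
}
```

With roughly one goal per side the draw is already outweighed by a win for either team, so an argmax rule of this kind will rarely, if ever, pick a tie.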

