**Wiekvoet**, and kindly contributed to R-bloggers)

Having looked at the football data earlier, I wanted to look at predictions for new games. This consists of two parts: getting a predictive model, and making and displaying the predictions. I decided to do this backwards and build the displays first, which will make things easier when the time comes to compare models. To get the predictions I use a very simple model, which basically states that a club scores about x goals per game, irrespective of all other conditions. I don't believe this model, but it can give predictions.

model1 <- glm(Goals ~ OffenseClub, data = StartData, family = 'poisson')

The consequence of this setup is that each game needs two predictions, one for the first club and one for the second. Vitesse and FC Groningen are used as the example clubs.
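To see what this model amounts to, here is a minimal sketch on made-up data. The data frame `toy`, its clubs A and B and their goal counts are hypothetical; they are not the real StartData. With a single factor as the only predictor, the fitted Poisson means should simply be each club's average number of goals per game.

```r
# Hypothetical toy data: four games for each of two invented clubs
toy <- data.frame(
  OffenseClub = rep(c("A", "B"), each = 4),
  Goals       = c(0, 1, 2, 1, 3, 1, 2, 2)
)

# Same model structure as model1, fitted on the toy data
fit <- glm(Goals ~ OffenseClub, data = toy, family = "poisson")

# Predicted mean goals per club on the response scale
pred <- predict(fit, data.frame(OffenseClub = c("A", "B")), type = "response")

# Plain per-club averages for comparison
means <- tapply(toy$Goals, toy$OffenseClub, mean)

print(pred)
print(means)
```

This is why the model cannot be taken too seriously: it ignores the opponent and the venue entirely.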

The prediction of the glm is a mean number of goals, which is still quite far from an actual number of goals in a game. To bridge that gap I use the Poisson distribution and treat the predicted mean as true. I include neither overdispersion nor the standard error of the parameters. The result shows FC Groningen has a 30% chance of scoring no goals, a 36% chance of one goal and a 22% chance of two goals, after which the chances quickly become very low.

top <- data.frame(OffenseClub = c('FC Groningen', 'Vitesse'))

prepred <- predict(model1, top, type = 'response')

(dp1 <- dpois(0:6, prepred[1]))[1:4]  # 0:6 so that dp1 matches the seven row labels 0:6 used later

[1] 0.29942768 0.36107456 0.21770672 0.08750956
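These probabilities follow directly from the Poisson formula P(k) = exp(-lambda) * lambda^k / k!. As a quick check, the sketch below recomputes them by hand. The value of `lambda` is an assumption: it is backed out from the printed output (the ratio of the second to the first probability), not taken from the model object.

```r
# For a Poisson distribution P(1) / P(0) equals lambda,
# so the mean can be recovered from the printed probabilities (assumption)
lambda <- 0.36107456 / 0.29942768

k <- 0:3
by_hand <- exp(-lambda) * lambda^k / factorial(k)

round(by_hand, 3)
all.equal(by_hand, dpois(k, lambda))  # compares the manual formula with dpois
```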

Finally, the predictions need to be combined to get a pair of goal counts. If the two counts are treated as independent, the chance of a pair is the product of the individual chances: if each club had a 30% chance of scoring exactly one goal, the chance of 1-1 would be 30%*30%=9%. In this case it turns out to be slightly higher, 12%, because both clubs have a chance of roughly 35% of scoring one goal. This is the most probable outcome too.

dp2 <- dpois(0:6,prepred[2])

oo <- outer(dp1,dp2)

rownames(oo) <- 0:6

colnames(oo) <- 0:6

round(oo,digits=3)

0 1 2 3 4 5 6

0 0.073 0.103 0.073 0.034 0.012 0.003 0.001

1 0.088 0.124 0.088 0.041 0.015 0.004 0.001

2 0.053 0.075 0.053 0.025 0.009 0.002 0.001

3 0.021 0.030 0.021 0.010 0.004 0.001 0.000

4 0.006 0.009 0.006 0.003 0.001 0.000 0.000

5 0.002 0.002 0.002 0.001 0.000 0.000 0.000

6 0.000 0.000 0.000 0.000 0.000 0.000 0.000

c(sum(oo[upper.tri(oo)]),sum(diag(oo)),sum(oo[lower.tri(oo)]))

[1] 0.4167411 0.2612236 0.3211237
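The three numbers are the chances that the column club scores more, that the game is a draw, and that the row club scores more. They add up to slightly less than 1 because goal counts above 6 are cut off. A minimal sketch of this, assuming mean goal counts of about 1.21 and 1.41 (values backed out from the tables in this post, not read from the model) and a hypothetical helper `outcome_chances`:

```r
# Hypothetical helper: win/draw/win chances from two Poisson means,
# truncating both goal counts at max_goals
outcome_chances <- function(lambda_row, lambda_col, max_goals = 6) {
  goals <- 0:max_goals
  # joint probabilities under independence: rows = first club, columns = second club
  oo <- outer(dpois(goals, lambda_row), dpois(goals, lambda_col))
  c(col_wins = sum(oo[upper.tri(oo)]),  # column club scores more
    draw     = sum(diag(oo)),           # equal scores
    row_wins = sum(oo[lower.tri(oo)]))  # row club scores more
}

res <- outcome_chances(1.21, 1.41)  # assumed means, see the text above
round(res, 3)
1 - sum(res)  # probability mass lost by cutting off at 6 goals
```

Raising `max_goals` shrinks the lost mass, which is one reason to use a wider range of goal counts.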

It is practical to wrap all this in a little function which creates these data in one go. The only new things are a new class, fboo, which directs the prediction to its accompanying print function, and some attributes that keep track of the clubs predicted.
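For readers unfamiliar with S3 classes, the dispatch mechanism can be seen in isolation in the minimal sketch below. The class name `demo` and its print method are made up for illustration; they are not part of the code in this post.

```r
# Hypothetical print method for an invented class "demo"
print.demo <- function(x, ...) {
  cat("demo object with", length(x), "values\n")
  invisible(x)
}

# Prepend the new class to the existing class vector
x <- structure(1:3, class = c("demo", class(1:3)))

print(x)           # dispatches to print.demo because "demo" comes first in class(x)
print(unclass(x))  # default printing once the class attribute is removed
```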

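fbpredict <- function(object,club1,club2) {

  # one row per prediction: club1 attacking club2, then club2 attacking club1

  # OffThuis ('thuis' is Dutch for 'home') presumably flags the attacking club playing at home; model1 only uses OffenseClub

  top <- data.frame(OffenseClub=c(club1,club2),DefenseClub=c(club2,club1),OffThuis=c(1,0))

  prepred <- predict(object,top,type='response')

  dp1 <- dpois(0:9,prepred[1])

  dp2 <- dpois(0:9,prepred[2])

  # rows follow club2's goals, columns follow club1's goals

  oo <- outer(dp2,dp1)

  rownames(oo) <- 0:9

  colnames(oo) <- 0:9

  class(oo) <- c('fboo',class(oo))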

  attr(oo,'row') <- club2 # rows of outer(dp2,dp1) hold club2's goals

  attr(oo,'col') <- club1 # columns hold club1's goals

  # upper triangle: club1 scores more; diagonal: draw; lower triangle: club2 scores more

  wel <- c(sum(oo[upper.tri(oo)]),sum(diag(oo)),sum(oo[lower.tri(oo)]))

  names(wel) <- c(club1,'equal',club2)

  return(list(details=oo,'summary chances'=wel))

}

print.fboo <- function(x,...) {

  cat(attr(x,'row'),'in rows against',attr(x,'col'),'in columns \n')

  # drop the class and the club attributes so the default print method is used below

  class(x) <- class(x)[-1]

  attr(x,'row') <- NULL

  attr(x,'col') <- NULL

  oo <- formatC(x,format='f',width=4) # fixed format

  oo <- gsub('\\.0+$',' ',oo) # replace trailing zeros by ' '

  oo <- substr(oo,1,6) # and fix the width

  print(oo,quote=FALSE,justify='left')

}

fbpredict(model1,'FC Groningen','Vitesse')

Vitesse in rows against FC Groningen in columns

0 1 2 3 4 5 6 7 8 9

0 0.0730 0.0880 0.0531 0.0213 0.0064 0.0016 0.0003 0.0001 0 0

1 0.1030 0.1242 0.0749 0.0301 0.0091 0.0022 0.0004 0.0001 0 0

2 0.0727 0.0877 0.0529 0.0213 0.0064 0.0015 0.0003 0.0001 0 0

3 0.0342 0.0413 0.0249 0.0100 0.0030 0.0007 0.0001 0 0 0

4 0.0121 0.0146 0.0088 0.0035 0.0011 0.0003 0.0001 0 0 0

5 0.0034 0.0041 0.0025 0.0010 0.0003 0.0001 0 0 0 0

6 0.0008 0.0010 0.0006 0.0002 0.0001 0 0 0 0 0

7 0.0002 0.0002 0.0001 0 0 0 0 0 0 0

8 0 0 0 0 0 0 0 0 0 0

9 0 0 0 0 0 0 0 0 0 0

$`summary chances`

FC Groningen equal Vitesse

0.3213815 0.2612237 0.4173918
