**Memo's Island**, and kindly contributed to R-bloggers)

Draws for the knockout phase of the 2012–13 UEFA Champions League will be held in Nyon on the 20th December 2012. The rules of the draw are simple and are as follows:

- 8 Group winner teams will be seeded.
- 8 Group runner-up teams will be unseeded.
- Teams coming from the same group and from same association can not be drawn against.

One can compute the probabilities of the likely outcomes by using ensemble average, a standard way of finding the frequencies where we identify whole sample space. Here we take a different approach and use time averages. What we do is to run large number of draws (*20 million times) *using randomly generated draws. I have called this exercise a Monte Carlo simulation, rather a simple sort while there is no fancy sampling of any sort, only rejection of pair up or the given draw if the last remaining pair does not satisfy the rules. A statistical language GNU R is used to generate the likely outcome. We use Mersenne Twister RNG, a default random number generator which is known to have very large cycle. In result we basically count the likely pair-ups in a matrix (matrix of pairs) and find the frequency of each likely pair up. Simulation takes 3-4 hours on a single CPU Intel Xeon with 16 GB RAM running Ubuntu GNU/Linux. Well, to be honest I haven’t optimized the code.

Qualified teams, their groups and associations definitions, the simulation and results reporting R code are as follows (note that we use xtable for html outputs, for other functions we use, see below codes after tables):

winnerTeams <- c('Paris-Saint-Germain', 'Schalke-04', 'Malaga', 'Borussia-Dortmund', 'Juventus', 'Bayern-Munich', 'Barcelona', 'Manchester-United');

winnerAssociation <- c('FR', 'DE', 'ES', 'DE', 'IT', 'DE', 'ES', 'ENG');

runnersUpTeams <- c('Porto', 'Arsenal', 'Milan', 'Real-Madrid', 'Shakhtar-Donetsk', 'Valencia', 'Celtic', 'Galatasaray');

runnersUpTeamsAssociation <- c('PT', 'ENG', 'IT', 'ES', 'UA', 'ES', 'SCO', 'TR');

countMatrix <- matrix(0, 8, 8)

row.names(countMatrix) <- winnerTeams;

colnames(countMatrix) <- runnersUpTeams;

many <- 20e6;

system.time(drawMany(winnerTeams, winnerAssociation, runnersUpTeams, runnersUpTeamsAssociation, countMatrix, many))

countMatrix <- countMatrix/many;

print(countMatrix);

Simulations results can be interpreted based on frequencies (probabilities) of pairings, rather intuitively while probabilities are not that far off . For example if we consider Barcelona, Milan and Arsenal score the largest 0.23 and 0.21. So my guess based on these frequencies, which I select maximums first ,then the second maximum and so on. If all ties I select the highest count using the second table. Here are the predicted pairs, (ordered with highest probability):

Barcelona – Milan

Malaga – Arsenal

Bayern Munich – Real Madrid

Borissia Dordmund – Valencia

Manchester United – Celtic

Juventus – Galatasaray

Schalke 04 – Porto

PSG – Donetsk

Note that predicted pairs are quite depending on your selection strategy.

Table for the frequencies:

Porto | Arsenal | Milan | Real-Madrid | Shakhtar-Donetsk | Valencia | Celtic | Galatasaray | |
---|---|---|---|---|---|---|---|---|

Paris-Saint-Germain | 0.00 | 0.13 | 0.14 | 0.18 |
0.12 | 0.18 |
0.12 | 0.12 |

Schalke-04 | 0.12 | 0.00 | 0.15 | 0.19 |
0.12 | 0.19 |
0.13 | 0.12 |

Malaga | 0.19 | 0.23 |
0.00 | 0.00 | 0.19 | 0.00 | 0.20 |
0.19 |

Borussia-Dortmund | 0.13 | 0.14 |
0.15 | 0.00 | 0.13 | 0.19 |
0.13 | 0.13 |

Juventus | 0.13 | 0.15 | 0.00 | 0.22 |
0.00 | 0.22 |
0.14 | 0.13 |

Bayern-Munich | 0.13 | 0.14 | 0.15 | 0.19 |
0.13 | 0.00 | 0.13 | 0.13 |

Barcelona | 0.18 | 0.21 |
0.23 |
0.00 | 0.19 | 0.00 | 0.00 | 0.18 |

Manchester-United | 0.13 | 0.00 | 0.16 | 0.22 |
0.13 | 0.22 |
0.14 | 0.00 |

Table for the counts (actually all are integers):

Porto | Arsenal | Milan | Real-Madrid | Shakhtar-Donetsk | Valencia | Celtic | Galatasaray | |
---|---|---|---|---|---|---|---|---|

Paris-Saint-Germain | 0.00 | 2589164.00 | 2897024.00 | 3658337.00 | 2356581.00 | 3658892.00 | 2494247.00 | 2345755.00 |

Schalke-04 | 2348458.00 | 0.00 | 2924099.00 | 3735314.00 | 2371245.00 | 3743246.00 | 2517610.00 | 2360028.00 |

Malaga | 3781019.00 | 4506559.00 | 0.00 | 0.00 | 3845872.00 | 0.00 | 3997679.00 | 3868871.00 |

Borussia-Dortmund | 2500221.00 | 2819591.00 | 3098797.00 | 0.00 | 2542013.00 | 3863888.00 | 2646626.00 | 2528864.00 |

Juventus | 2659773.00 | 2983255.00 | 0.00 | 4393444.00 | 0.00 | 4392158.00 | 2892103.00 | 2679267.00 |

Bayern-Munich | 2500835.00 | 2818331.00 | 3098181.00 | 3866890.00 | 2542742.00 | 0.00 | 2647340.00 | 2525681.00 |

Barcelona | 3620075.00 | 4283100.00 | 4684023.00 | 0.00 | 3721268.00 | 0.00 | 0.00 | 3691534.00 |

Manchester-United | 2589619.00 | 0.00 | 3297876.00 | 4346015.00 | 2620279.00 | 4341816.00 | 2804395.00 | 0.00 |

This is the main simulation function:

drawMany <- function(winnerTeams, winnerAssociation, runnersUpTeams, runnersUpTeamsAssociation, countMatrix, many) {

for(i in 1:many) {

repeat {

dr <- drawOneKnock(winnerTeams, winnerAssociation, runnersUpTeams, runnersUpTeamsAssociation,0);

if(sum(dr) > 0) break;

}

updateCount <- mapply(incMatrix, dr[,1], dr[,2])

}

}

A single draw can be generated as follows:

drawOneKnock <- function(winnerTeams, winnerAssociation, runnersUpTeams, runnersUpTeamsAssociation, names=1) {

k=1

repeat {

k=k+1;

if(k > 1000) return(-1);

blockWin = 1:8 ; # tracking for draw

blockRun = blockWin;

winners = c(); # Draw results

runners = c();

for(i in 1:7) {

kk =1;

repeat {

kk=kk+1;

if(kk > 1000) return(-1);

winner <- sample(blockWin, 1);

runner <- sample(blockRun, 1);

if(!(runner == winner) && !(winnerAssociation[winner] == runnersUpTeamsAssociation[runner])) {

break;

}

}

blockWin <- blockWin[-which(blockWin == winner)];

blockRun <- blockRun[-which(blockRun == runner)];

winners <- c(winners, winner);

runners <- c(runners, runner);

}

winner <- blockWin;

runner <- blockRun;

# check if last remaining is ok, otherwise invalidate draw

if(!(runner == winner) && !(winnerAssociation[winner] == runnersUpTeamsAssociation[runner])) {

winners <- c(winners, blockWin);

runners <- c(runners, blockRun);

if(names) dr <- cbind(winnerTeams[winners], runnersUpTeams[runners]);

if(!names) dr <- cbind(winners, runners);

break;

}

}

dr

}

And counting the pair-ups is performed by a simple function:

incMatrix <- function(i, j) {

countMatrix[i,j] <<- countMatrix[i,j]+1;

return(0);

}

**leave a comment**for the author, please follow the link and comment on their blog:

**Memo's Island**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...