(This article was first published on **Software for Exploratory Data Analysis and Statistical Modelling**, and kindly contributed to R-bloggers)

Following on from the previous post about creating a football result processing function for data from the football-data.co.uk website, we will now add code to the function to generate a league table based on the results to date.

To create the league table we need to count various things such as the number of games played, number of wins/draws/losses, goals scored etc. This information is available in the results object that is loaded from a **csv** file in the function as it stands.

To facilitate these calculations we create a data frame with a row for each team in the division and then calculate the statistics required – this was a reason for ordering the factors in the **HomeTeam** and **AwayTeam** columns of the results table. The data frame is created with the code below:

    tmpTable = data.frame(Team = teams,
        Games = 0, Win = 0, Draw = 0, Loss = 0,
        HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
        AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
        Points = 0,
        HomeFor = 0, HomeAgainst = 0, AwayFor = 0, AwayAgainst = 0,
        For = 0, Against = 0, GoalDifference = 0)

A number of these columns may be redundant in the final league table but are used for intermediate calculations, such as **HomeWin** and **AwayWin**, which are combined to find the total number of victories for a team.

The number of games each team has played at home and away is counted by applying the **table** function to the **HomeTeam** and **AwayTeam** columns respectively.

    tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
    tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))

The labels created by the table command are discarded using the as.numeric function to retain only the number of games. The table command is also used to count the number of wins, draws and losses at home and away for each team. The commands are shown here:

    tmpTable$HomeWin = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "H"]))
    tmpTable$HomeDraw = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "D"]))
    tmpTable$HomeLoss = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "A"]))
    tmpTable$AwayWin = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "A"]))
    tmpTable$AwayDraw = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "D"]))
    tmpTable$AwayLoss = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "H"]))

Note that we subset on the values in the **FTR** column, which is full-time result, and then count. The subsetting is reversed when looking at the away fixtures because a victory for the team is now an away win rather than a home win.
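As a minimal sketch of why the factor levels matter, the toy results below (three hypothetical teams and made-up outcomes, not taken from football-data.co.uk) include a team that has not yet played at home. Because the team columns are factors with every team as a level, **table** still reports a zero for that team, so the counts line up with the rows of the league table:

```r
## Toy data: hypothetical teams and results, for illustration only
teams = c("Alpha", "Beta", "Gamma")
toyResults = data.frame(
    HomeTeam = factor(c("Alpha", "Beta", "Alpha"), levels = teams),
    AwayTeam = factor(c("Beta", "Gamma", "Gamma"), levels = teams),
    FTR = c("H", "D", "A")
)

## Gamma has no home games yet but still appears, with a count of zero
homeGames = as.numeric(table(toyResults$HomeTeam))

## A home win is FTR == "H"; an away win is FTR == "A"
homeWin = as.numeric(table(toyResults$HomeTeam[toyResults$FTR == "H"]))
awayWin = as.numeric(table(toyResults$AwayTeam[toyResults$FTR == "A"]))

print(data.frame(Team = teams, HomeGames = homeGames,
    HomeWin = homeWin, AwayWin = awayWin))
```

Subsetting a factor keeps all of its levels, which is why the win counts also have one entry per team even when only one match satisfies the condition.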

This information is then combined to get total games played, won etc.

    tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
    tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
    tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
    tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss

The total points are calculated by multiplying the number of wins, draws and losses by the number of points awarded for each match outcome.

    tmpTable$Points = winPoints * tmpTable$Win + drawPoints * tmpTable$Draw + lossPoints * tmpTable$Loss
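Because the point values are arguments rather than fixed numbers, other scoring systems can be tried. The sketch below uses made-up records for two hypothetical teams and a helper, `calcPoints`, written only for this illustration, to compare three points for a win with two points for a win:

```r
## Made-up records for two hypothetical teams
toyTable = data.frame(Team = c("Alpha", "Beta"),
    Win = c(5, 3), Draw = c(1, 6), Loss = c(4, 1))

## Illustrative helper (not part of the original function)
calcPoints = function(tab, winPoints = 3, drawPoints = 1, lossPoints = 0) {
    winPoints * tab$Win + drawPoints * tab$Draw + lossPoints * tab$Loss
}

threeForWin = calcPoints(toyTable)
twoForWin = calcPoints(toyTable, winPoints = 2)

print(data.frame(Team = toyTable$Team, ThreeForWin = threeForWin, TwoForWin = twoForWin))
```

With these invented numbers Alpha leads on 16 points to 15 under three points for a win, but trails 11 to 12 under two points for a win, which is the kind of comparison the arguments make possible.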

The next set of calculations counts the number of goals scored, goals conceded and the goal difference. The **tapply** function is used for these calculations.

    tmpTable$HomeFor = as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
    tmpTable$HomeAgainst = as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
    tmpTable$AwayFor = as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
    tmpTable$AwayAgainst = as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))

The **tapply** function applies **sum** to the goals scored and the goals conceded, at home or away, by each team in the division. The home and away figures are then combined to give overall totals:

    tmpTable$For = ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
        ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
    tmpTable$Against = ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
        ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)

The **ifelse** statement is used to handle situations where a team hasn’t played a home and/or away fixture yet. The goal difference is easy to calculate:

    tmpTable$GoalDifference = tmpTable$For - tmpTable$Against
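To see why the **ifelse** step is needed, here is a small sketch with invented data in which a hypothetical team, Gamma, has not played at all. For a factor level with no observations **tapply** returns `NA` rather than zero, and `na.rm = TRUE` does not help because it only affects missing values inside a group:

```r
## Toy data: Gamma is a level of both factors but has no fixtures yet
teams = c("Alpha", "Beta", "Gamma")
toyResults = data.frame(
    HomeTeam = factor(c("Alpha", "Beta"), levels = teams),
    AwayTeam = factor(c("Beta", "Alpha"), levels = teams),
    FTHG = c(2, 1),
    FTAG = c(0, 3)
)

## Empty groups come back as NA
homeFor = as.numeric(tapply(toyResults$FTHG, toyResults$HomeTeam, sum, na.rm = TRUE))
awayFor = as.numeric(tapply(toyResults$FTAG, toyResults$AwayTeam, sum, na.rm = TRUE))

## Replace NA with zero before adding, as in the function
goalsFor = ifelse(is.na(homeFor), 0, homeFor) + ifelse(is.na(awayFor), 0, awayFor)

print(data.frame(Team = teams, HomeFor = homeFor, AwayFor = awayFor, For = goalsFor))
```

Adding the two vectors directly would give `NA` for Gamma, and that `NA` would then carry through to the goal difference.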

Now that all of the statistics have been calculated we sort the table based on the number of points, goal difference and finally alphabetically. There might be different ways that we can order the teams but this is what we will use for the time being:

    tmpTable = tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]

The ordering might look odd, but we want to rank from highest to lowest on points and goal difference, hence the minus signs, and then in ascending alphabetical order on team name.
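A short sketch of the tie-breaking, using four hypothetical teams with invented points and goal differences. Three teams are level on points and two of those are also level on goal difference:

```r
## Invented standings for illustration only
toyTable = data.frame(
    Team = c("Gamma", "Alpha", "Beta", "Delta"),
    Points = c(10, 10, 12, 10),
    GoalDifference = c(3, 3, -1, 5),
    stringsAsFactors = FALSE
)

## Negating a numeric column sorts it in descending order
sortedTable = toyTable[order(- toyTable$Points, - toyTable$GoalDifference, toyTable$Team),]

print(sortedTable)
```

Beta comes first on points, Delta is next on goal difference, and Alpha is placed ahead of Gamma only because of the alphabetical tie-break.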

The whole function is now:

    football.process.v2 = function(datafile, country, divname, season, teams,
        winPoints = 3, drawPoints = 1, lossPoints = 0)
    {
        ## Validate Function Arguments
        if (missing(datafile)) {
            stop("Results csv file not specified.")
        }
        if (missing(country)) {
            warning("Country of league not specified.")
            country = ""
        }
        if (missing(divname)) {
            warning("Name of league division not specified.")
            divname = ""
        }
        ## season is used in the returned list, so give it a fallback too
        if (missing(season)) {
            warning("Season not specified.")
            season = ""
        }

        ## Import Results
        tmpResults = read.csv(datafile)[,c("Date","HomeTeam","AwayTeam","FTR","FTHG","FTAG")]
        if (missing(teams)) {
            warning("Team names not specified - extracted from results data.")
            teams = sort(unique(c(as.character(tmpResults$HomeTeam),
                as.character(tmpResults$AwayTeam))))
        }
        tmpResults$HomeTeam = factor(tmpResults$HomeTeam, levels = teams)
        tmpResults$AwayTeam = factor(tmpResults$AwayTeam, levels = teams)

        ## Create Empty League Table
        tmpTable = data.frame(Team = teams,
            Games = 0, Win = 0, Draw = 0, Loss = 0,
            HomeGames = 0, HomeWin = 0, HomeDraw = 0, HomeLoss = 0,
            AwayGames = 0, AwayWin = 0, AwayDraw = 0, AwayLoss = 0,
            Points = 0,
            HomeFor = 0, HomeAgainst = 0, AwayFor = 0, AwayAgainst = 0,
            For = 0, Against = 0, GoalDifference = 0)

        ## Count Number of Games Played
        tmpTable$HomeGames = as.numeric(table(tmpResults$HomeTeam))
        tmpTable$AwayGames = as.numeric(table(tmpResults$AwayTeam))

        ## Count Number of Wins/Draws/Losses
        tmpTable$HomeWin = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "H"]))
        tmpTable$HomeDraw = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "D"]))
        tmpTable$HomeLoss = as.numeric(table(tmpResults$HomeTeam[tmpResults$FTR == "A"]))
        tmpTable$AwayWin = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "A"]))
        tmpTable$AwayDraw = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "D"]))
        tmpTable$AwayLoss = as.numeric(table(tmpResults$AwayTeam[tmpResults$FTR == "H"]))
        tmpTable$Games = tmpTable$HomeGames + tmpTable$AwayGames
        tmpTable$Win = tmpTable$HomeWin + tmpTable$AwayWin
        tmpTable$Draw = tmpTable$HomeDraw + tmpTable$AwayDraw
        tmpTable$Loss = tmpTable$HomeLoss + tmpTable$AwayLoss
        tmpTable$Points = winPoints * tmpTable$Win + drawPoints * tmpTable$Draw +
            lossPoints * tmpTable$Loss

        ## Count Goals Scored and Conceded
        tmpTable$HomeFor = as.numeric(tapply(tmpResults$FTHG, tmpResults$HomeTeam, sum, na.rm = TRUE))
        tmpTable$HomeAgainst = as.numeric(tapply(tmpResults$FTAG, tmpResults$HomeTeam, sum, na.rm = TRUE))
        tmpTable$AwayFor = as.numeric(tapply(tmpResults$FTAG, tmpResults$AwayTeam, sum, na.rm = TRUE))
        tmpTable$AwayAgainst = as.numeric(tapply(tmpResults$FTHG, tmpResults$AwayTeam, sum, na.rm = TRUE))
        tmpTable$For = ifelse(is.na(tmpTable$HomeFor), 0, tmpTable$HomeFor) +
            ifelse(is.na(tmpTable$AwayFor), 0, tmpTable$AwayFor)
        tmpTable$Against = ifelse(is.na(tmpTable$HomeAgainst), 0, tmpTable$HomeAgainst) +
            ifelse(is.na(tmpTable$AwayAgainst), 0, tmpTable$AwayAgainst)
        tmpTable$GoalDifference = tmpTable$For - tmpTable$Against

        ## Sort Table
        ## By Points
        ## By Goal Difference
        ## By Team Name (Alphabetical)
        tmpTable = tmpTable[order(- tmpTable$Points, - tmpTable$GoalDifference, tmpTable$Team),]
        tmpTable = tmpTable[,c("Team", "Games", "Win", "Draw", "Loss", "Points",
            "For", "Against", "GoalDifference")]

        ## Return Division Information
        tmpSummary = list(Country = country, Division = divname, Season = season,
            Teams = teams, Results = tmpResults, Table = tmpTable)
        invisible(tmpSummary)
    }

There is other functionality that we might want to add to the function.
