(This article was first published on **PremierSoccerStats » R**, and kindly contributed to R-bloggers)

The erstwhile big 4 all blanked their opponents last Saturday, and a poster on the Guardian wondered when this had last happened. It’s a pretty simple procedure in SQL using a subquery but, in the spirit of learning R, I thought I would tackle the problem in that language, with the benefit of a couple of graphs thrown in. Code with comments below.

So on days when all four teams played, this is the ninth time it has happened and the first since 9th May 2010, when they bullied their way to sixteen goals without response. After a few occasions early on in the history of the EPL, there was a gap of more than 100 games and almost nine years in which at least one of their opponents always scored. The graph shows the increasing dominance of these teams in the past few years, as evidenced by the growing number of days on which 3 of the 4 post shutouts.
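The counting step behind that paragraph can be sketched in base R as follows. This is only an illustration, not the author's query: the `toy` data frame and its scores are invented, and only the column names (`myDate`, `TEAM`, `GA`, `GF`) are taken from the code in this post.

```r
# Toy data: one row per team per match day. All values are invented.
toy <- data.frame(
  myDate = as.Date(rep(c("2001-01-01", "2001-01-08", "2001-01-15", "2001-01-22"),
                       times = c(4, 3, 4, 4))),
  TEAM = c("ars", "chl", "liv", "mnu",
           "ars", "chl", "liv",
           "ars", "chl", "liv", "mnu",
           "ars", "chl", "liv", "mnu"),
  GF = c(2, 1, 3, 1,  1, 0, 2,  0, 2, 1, 1,  4, 2, 1, 3),
  GA = c(0, 0, 0, 0,  0, 0, 0,  0, 1, 0, 0,  0, 0, 0, 0)
)

# Per-day counts: how many of the teams played, and how many kept a clean sheet
day    <- as.character(toy$myDate)
games  <- tapply(toy$TEAM, day, length)
blanks <- tapply(toy$GA == 0, day, sum)

# Days on which all four played and all four blanked their opponents
all_blank <- as.Date(names(games)[games == 4 & blanks == 4])

length(all_blank)   # number of such days in the toy data
diff(all_blank)     # gap between consecutive occurrences
```

Requiring `games == 4` matters: the second toy date has three shutouts from three games, which would otherwise look like a perfect day.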

On 22nd November 2008, there was a pretty extraordinary happening: none of these teams scored, for the first and only time in the 232 days on which they have all played. Even then, three of them managed 0-0 ties, with Arsenal the sole loser.

**Chart type:** Scatterplot

**Inspiration:** Comment in Guardian

**Data:** Own data

**Tools:** MSSQL database, R

**Packages:** RODBC, plyr, ggplot2

**Ignorance fix:** qplot colour stipulation, error in seq.int

**Develop:** Create function for vector of teams, differing goal levels

Make interactive with web input of parameters (help required)

```r
# load necessary libraries - typically these are all in my startup file
library(RODBC)
library(plyr)
library(ggplot2)

# Make a connection to the MSSQL database and obtain the data
# The subsetting of teams is done here but could equally well be performed in R
channel <- odbcConnect("myConnection")
results <- sqlQuery(channel, paste("
  SELECT myDate, TEAM, GA, GF
  FROM myTable
  WHERE myTable.TEAM IN (N'mnu', N'chl', N'ars', N'liv')
"))
odbcClose(channel)

# summarise the data. I use .s as a suffix
# I am still coming to terms with the plyr package but Stack Overflow came to the rescue
results.s <- ddply(results, .(myDate), summarise,
                   games   = length(TEAM),
                   blanks  = length(which(GA == 0)),
                   blanked = length(which(GF == 0)))

# check that it looks correct
head(results.s, 2)
#       myDate games blanks blanked
# 1 1992-08-15     3      0       0
# 2 1992-08-16     1      0       1

# restrict to dates on which all the teams were playing
results.s <- subset(results.s, games == 4)

# plot graph
qplot(myDate, blanks, data = results.s,
      xlab = "Game Order", ylab = "Number of Shutouts",
      main = "Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U")
# !? Error in seq.int(r1$year, to$year, by) : 'from' must be finite

# Try using a numbered sequence instead
results.s$num <- seq_len(nrow(results.s))

# This works
# !? why does specifying any colour produce a red dot
# (qplot treats colour = "steelblue" as a value to map to the colour scale,
#  not as a literal colour; colour = I("steelblue") sets it directly)
qplot(num, blanks, data = results.s,
      xlab = "Game Order", ylab = "Number of Shutouts",
      main = "Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U",
      colour = "steelblue")

# try ggplot version and reduce point size
ggplot(results.s, aes(x = num, y = blanks)) +
  geom_point(size = 1.5, colour = "steelblue") +
  labs(title = "Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U",
       x = "Game Order", y = "Number of Shutouts")

# Plot the times teams do not score
ggplot(results.s, aes(x = num, y = blanked)) +
  geom_point(size = 1.5, colour = "red") +
  labs(title = "Shutouts incurred on same day by Arsenal, Chelsea, Liverpool and Man. U",
       x = "Game Order", y = "Number of Shutouts")

# print as pdfs and amend as required in Illustrator
```
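As a rough idea of how the "Develop" item (a function for a vector of teams and differing goal levels) might look, here is a base-R sketch. It is not the author's implementation: `shutout_days()`, its arguments, and the `sample_results` rows are all hypothetical and invented for illustration. It assumes a data frame with the same `myDate`, `TEAM` and `GA` columns used in this post.

```r
# Hypothetical helper: days on which every team in `teams` played and
# conceded no more than `max_conceded` goals.
shutout_days <- function(results, teams, max_conceded = 0) {
  res <- results[results$TEAM %in% teams, ]
  day <- as.character(res$myDate)
  # number of the requested teams in action on each day
  played <- tapply(res$TEAM, day, function(x) length(unique(x)))
  # number of those teams meeting the goals-conceded threshold
  met <- tapply(res$GA <= max_conceded, day, sum)
  as.Date(names(played)[played == length(teams) & met == length(teams)])
}

# Invented sample rows, one per team per match day
sample_results <- data.frame(
  myDate = as.Date(rep(c("2001-01-01", "2001-01-08"), each = 4)),
  TEAM   = rep(c("ars", "chl", "liv", "mnu"), times = 2),
  GF     = c(1, 2, 0, 3,  2, 2, 1, 1),
  GA     = c(0, 0, 0, 0,  1, 0, 0, 1)
)

big4 <- c("ars", "chl", "liv", "mnu")
all_four  <- shutout_days(sample_results, big4)                    # clean sheets for all four
two_teams <- shutout_days(sample_results, c("chl", "liv"))         # a different vector of teams
at_most_1 <- shutout_days(sample_results, big4, max_conceded = 1)  # a looser goal level
```

Swapping `GA` for `GF` inside the function would give the mirror-image question of days on which none of the chosen teams scored more than a given number of goals.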
