
The erstwhile big 4 all blanked their opponents last Saturday, and a poster on the Guardian wondered when this had last happened. It’s a pretty simple procedure in SQL using a subquery but, in the spirit of learning R, I thought I would tackle the problem in that language, with the benefit of a couple of graphs thrown in. Code with comments is below.

So on days when all four teams played, this is the ninth time it has happened and the first since 9th May 2010, when they bullied their way to sixteen goals without response. After a few occasions early on in the history of the EPL, there was a gap of more than 100 games and almost nine years in which at least one of their opponents always scored. The graph shows the increasing dominance of these teams in the past few years, as evidenced by the growing number of days on which 3 of the 4 post shutouts.

On the other hand, 22nd November 2008 saw a pretty extraordinary happening: none of these teams scored, for the first and only time in the 232 days on which they have all played. Even then, three of them managed 0-0 ties, with Arsenal the sole loser.
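The counts quoted above come from filtering the per-day summary built in the code further down. A minimal sketch of that filtering step, assuming a summary data frame with the same columns (`myDate`, `games`, `blanks`, `blanked`); the `daily` data frame and its values are invented for illustration and are not the real results:

```r
# Invented stand-in for the per-day summary (NOT the real data)
daily <- data.frame(
  myDate  = as.Date(c("2001-01-06", "2001-01-13", "2001-01-20", "2001-01-27")),
  games   = c(4, 3, 4, 4),   # how many of the four teams played that day
  blanks  = c(4, 3, 1, 4),   # teams that conceded no goals (GA == 0)
  blanked = c(0, 0, 4, 1)    # teams that failed to score (GF == 0)
)

# keep only days on which all four teams played
full <- subset(daily, games == 4)

# days on which all four kept their opponents scoreless
all_blank <- subset(full, blanks == 4)
nrow(all_blank)         # number of occasions
max(all_blank$myDate)   # most recent occasion

# days on which none of the four scored
none_scored <- subset(full, blanked == 4)
none_scored$myDate
```

Run against the real summary, the same three `subset` calls should give the occasions discussed in the text.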

Chart type: Scatterplot
Inspiration: Comment in Guardian
Data: Own data
Tools: MSSQL database, R
Packages: RODBC, plyr, ggplot2
Ignorance fix: qplot colour stipulation, error in seq.int
Develop: Create function for vector of teams, differing goal levels; make interactive with web input of parameters (help required)

# load necessary libraries - typically these are all in my startup file
library(RODBC)
library(plyr)
library(ggplot2)

# Make a connection to MSSQL database and obtain data
# The subsetting of teams is done here but could equally well be performed in R
channel <- odbcConnect("myConnection")

results <- sqlQuery(channel, "
SELECT myDate, TEAM, GA, GF
FROM   myTable
WHERE  myTable.TEAM IN (N'mnu', N'chl', N'ars', N'liv')
")

odbcClose(channel)

# summarise the data. I use .s as a suffix
# I am still coming to terms with the plyr package but Stack Overflow came to the rescue
results.s <- ddply(results,.(myDate),summarise,games = length(TEAM),
blanks = length(which(GA == 0)),blanked = length(which(GF == 0)))

# check that it looks correct
head(results.s, 2)
#       myDate games blanks blanked
# 1 1992-08-15     3      0       0
# 2 1992-08-16     1      0       1

# restrict to dates on which all the teams were playing
results.s <- subset(results.s,games==4)

# plot graph
qplot(myDate,blanks, data=results.s,xlab="Game Order", ylab="Number of Shutouts"
,main="Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U")
# !? Error in seq.int(r1$year, to$year, by) : 'from' must be finite

# Try using a numbered sequence instead.
results.s$num <- seq_len(nrow(results.s))

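# This works !? why does specifying any color produce a red dot
# (qplot treats colour="steelblue" as a value to map onto a colour scale, not as
# a literal colour; writing colour=I("steelblue") sets the colour directly)
qplot(num,blanks, data=results.s,xlab="Game Order", ylab="Number of Shutouts"
,main="Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U"
,colour="steelblue")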

# try ggplot version and reduce point size
ggplot(results.s, aes(x=num,y=blanks))+geom_point(size=1.5,colour="steelblue")+
labs(title="Shutouts recorded on same day by Arsenal, Chelsea, Liverpool and Man. U",
x="Game Order",y="Number of Shutouts")

# Plot the times teams do not score
ggplot(results.s, aes(x=num,y=blanked))+geom_point(size=1.5,colour="red")+
labs(title="Shutouts incurred on same day by Arsenal, Chelsea, Liverpool and Man. U",
x="Game Order",y="Number of Shutouts")

# print as pdfs and amend as required in Illustrator
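
The "Develop" idea in the notes above (a function taking a vector of teams and a goal level) might be sketched roughly as follows in base R. The function name `count_shutouts`, its `max_ga` argument and the `toy` data are all hypothetical, and the sketch assumes each team appears at most once per date:

```r
# Hypothetical helper: for each date, count how many of `teams` conceded at
# most `max_ga` goals, keeping only dates on which every listed team played.
count_shutouts <- function(results, teams, max_ga = 0) {
  picked <- results[results$TEAM %in% teams, ]
  day    <- as.character(picked$myDate)
  games  <- tapply(picked$GA, day, length)          # listed teams playing that day
  blanks <- tapply(picked$GA <= max_ga, day, sum)   # teams at or under the limit
  out <- data.frame(myDate = names(games),
                    games  = as.vector(games),
                    blanks = as.vector(blanks),
                    stringsAsFactors = FALSE)
  out[out$games == length(teams), ]
}

# Invented data in the same shape as the query result (NOT real scores)
toy <- data.frame(
  myDate = rep(c("2001-01-06", "2001-01-13"), each = 3),
  TEAM   = rep(c("ars", "chl", "liv"), times = 2),
  GA     = c(0, 0, 0, 1, 0, 2),
  GF     = c(2, 1, 0, 1, 3, 0)
)

res <- count_shutouts(toy, teams = c("ars", "chl"))
res
```

Passing `max_ga = 1` would instead count days on which each side conceded no more than one goal; an equivalent argument on `GF` could cover the "blanked" case.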