World Cup 2010 Statistics Plotted with R

July 11, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

Opta agreed to let the UK Guardian Data Blog publish 2010 World Cup team and player statistics. The data is available in a Google Docs spreadsheet with two tabs: one holds PLAYERS statistics, the other TEAM statistics. I chose File -> Download As -> CSV to download each tab through a web browser, then moved the files to my R working directory. I named the first "World Cup 2010 data.csv" (player data) and the second "World Cup 2010 TEAM data.csv" (team data).

By the way, if anyone knows how individual Google Docs spreadsheets can be downloaded as CSVs via URLs, please let me know by commenting on this post.  I could not figure out how to do this straight from R by reading a URL (which is my preference).
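One possible approach, offered as a sketch rather than a confirmed answer: Google Sheets documents that are shared by link can usually be exported one tab at a time through a CSV export URL. The URL pattern, the helper name sheet_csv_url, and the placeholder id below are assumptions and do not come from the original post or its data source.

```r
# Build a CSV export URL for one tab of a Google Sheets document.
# ASSUMPTION: the "/export?format=csv&gid=" pattern is not from the original
# post, and the sheet must be shared so that anyone with the link can view it.
sheet_csv_url <- function(sheet_id, gid = 0) {
  sprintf("https://docs.google.com/spreadsheets/d/%s/export?format=csv&gid=%s",
          sheet_id, gid)
}

# Hypothetical placeholder id; replace it with the id from the spreadsheet URL.
# The gid identifies the tab (it appears after "gid=" in the browser URL).
url <- sheet_csv_url("YOUR_SHEET_ID", gid = 0)

# Reading needs network access and a real id, so it is left commented out.
# DF <- read.csv(url)
```

If this works for a given sheet, read.csv can take the URL directly, which avoids the manual download step.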

The following are a few charts that can be created with the data. You might also want to do more sophisticated predictive analysis, but I will leave that to Paul.

The sheet with player data can be read in as a CSV:

DF=read.csv('World Cup 2010 data.csv')

The following attributes are available for each player.



The base graphics package can be used to produce the following chart of the USA team’s shots attempted by player.

# Create a smaller data frame that 
# contains only USA player names 
# and shots attempted.

# Make the player Names the rownames

# Flip the X axis labels and provide enough room in the margins to print the names
par(las=2,mar=c(8, 4, 1, 2) + 0.1)

# Pivot the table, print the barplot and add a title
title('2010 World Cup USA Total Shots Attempted')
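The steps described in the comments above can be sketched end to end as follows. This is a minimal illustration on a toy data frame: the column names (Team, Player, Shots), the player labels, and the numbers are all made up for the example and are not the columns or values of the real spreadsheet.

```r
# Toy stand-in for the player sheet.
# ASSUMPTION: column names and values are hypothetical, not the real data.
players <- data.frame(
  Team   = c("USA", "USA", "USA", "England"),
  Player = c("Player A", "Player B", "Player C", "Player D"),
  Shots  = c(11, 12, 9, 13),
  stringsAsFactors = FALSE
)

# Create a smaller data frame with only USA player names and shots attempted.
usa <- players[players$Team == "USA", c("Player", "Shots")]

# Make the player names the row names, then drop the redundant column.
rownames(usa) <- usa$Player
usa$Player <- NULL

# Flip the axis labels and leave enough room in the margins for the names.
par(las = 2, mar = c(8, 4, 1, 2) + 0.1)

# Pivot with t() so each player becomes a column (one bar per player),
# draw the barplot, and add a title.
barplot(t(usa), ylab = "Shots")
title("USA Total Shots Attempted (made-up values)")
```

Because usa has a single numeric column here, t(usa) stays numeric and barplot takes the bar labels from the column names of the pivoted matrix.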

Now an example with the Team data.  In this case, the column names are actually the names of the countries.

DF2=read.csv('World Cup 2010 TEAM data.csv')

The attributes about each team are available in the first column.


Games Played
Ave Goals per game
Shots (excl blocked shots)
% Shots on Target
% Goals to Shots
Overall Pass Completion %
Cross Completion %
Goals Conceded
Ave goals conceded per game
Tackles Won %
Yellow Cards
Red Cards

I prefer these attributes as row names, so I moved them there using the following:
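A minimal sketch of that move, shown on a toy data frame. The country columns and numbers are invented for illustration, and the name of the first column (Attribute) is an assumption about how the CSV is read in.

```r
# Toy stand-in for the team sheet: attributes in the first column,
# one column per country.
# ASSUMPTION: column names and values are hypothetical.
teams <- data.frame(
  Attribute = c("Games Played", "Yellow Cards", "Red Cards"),
  CountryA  = c(3, 5, 0),
  CountryB  = c(4, 7, 1),
  stringsAsFactors = FALSE
)

# Copy the first column into the row names, then drop that column.
rownames(teams) <- teams[, 1]
teams <- teams[, -1]

# Rows can now be looked up by attribute name.
teams["Yellow Cards", ]
```

With the attributes as row names, selecting a single row such as teams["Yellow Cards", ] returns one value per country, which is the shape the plots below expect.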


This time, we will use ggplot2 to create a horizontal bar chart with a color gradient that highlights the countries with the most fouls. I think you will agree that ggplot2 produces much better results. The author of the package (Hadley Wickham) just released a new version. He has also written a book on it, which goes into greater depth about its use and design (based upon Leland Wilkinson's Grammar of Graphics). The example that follows uses the simpler qplot call, with the team names on the x axis and the number of fouls on the y axis. The "geom" (geometry) argument indicates that we are drawing a bar chart, and coord_flip switches the x and y axes.

qplot(names(FOULS), as.numeric(FOULS), geom="bar", stat='identity', fill=as.numeric(FOULS)) + xlab('Country') + ylab('Fouls') + coord_flip() + scale_fill_continuous(low="black", high="red") + labs(fill='Fouls')
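For readers on a more recent ggplot2 release, the same kind of chart can be sketched with the full ggplot() interface. This is an illustration under stated assumptions, not the author's code: the data frame, its column names, and the values are made up, and geom_col and scale_fill_gradient are swapped in for the qplot arguments used above.

```r
library(ggplot2)

# ASSUMPTION: toy data with hypothetical countries and made-up foul counts.
fouls <- data.frame(
  Country = c("Country A", "Country B", "Country C"),
  Fouls   = c(40, 65, 52),
  stringsAsFactors = FALSE
)

# Bars take their height from the Fouls column (geom_col),
# the fill color follows the same value, and coord_flip makes them horizontal.
p <- ggplot(fouls, aes(x = Country, y = Fouls, fill = Fouls)) +
  geom_col() +
  coord_flip() +
  scale_fill_gradient(low = "black", high = "red") +
  labs(x = "Country", y = "Fouls", fill = "Fouls",
       title = "Fouls by country (made-up values)")

print(p)
```

Keeping the values in a data frame column avoids the as.numeric cast, and labs can set the axis labels, legend title, and plot title in one call.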

When t was used to pivot the data.frame, it changed it to a matrix and the type of the numeric values became character. The as.numeric function was used to cast it back.
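The coercion can be seen on a small example. The sketch below assumes a data frame that still contains one text column alongside the numeric ones; the names and values are invented for illustration.

```r
# ASSUMPTION: toy data with hypothetical names and made-up values.
df <- data.frame(
  Attribute = c("Goals", "Fouls"),
  CountryA  = c(2, 10),
  CountryB  = c(3, 12),
  stringsAsFactors = FALSE
)

# t() first converts the data frame to a matrix. A matrix holds one type only,
# so the presence of a text column turns every value into a character string.
m <- t(df)
is.matrix(m)     # TRUE
is.character(m)  # TRUE

# Take the "Goals" column without the Attribute row: a named character vector.
goals <- m[-1, 1]

# names() gives the countries; as.numeric() casts the values back to numbers.
names(goals)
as.numeric(goals)
```

This is why the qplot calls pair names(...) for the labels with as.numeric(...) for the bar heights.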

The following is the same type of plot for Goals.  This chart also includes a title.  It appears at the top of this post.

qplot(names(GOALS), as.numeric(GOALS), geom="bar", stat='identity', fill=as.numeric(GOALS)) + xlab('Country') + ylab('Goals') + coord_flip() + scale_fill_continuous(low="yellow", high="blue") + labs(fill='Goals') + opts(title = "2010 World Cup Goals (as of 07/10/2010)")

Hope you enjoyed this little excursion.
