Google AI Challenge: Scores/Rank by Language

December 8, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

A quick follow-up to the previous post about the scores in the 2010 Google AI competition relative to programming language. The chart above makes each language visible and discrete, and the scales are the same across panels.
library(ggplot2)

# Read the semicolon-separated results; the file has no header row.
df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
# Drop the unused seventh column.
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')

ggplot(data=df, aes(x=rank, y=elo_score, color=language)) +
  geom_point(size=1) +
  facet_wrap(~ language) +
  ggtitle('Google AI 2010: Score by Rank for each Language')
It is based upon a simple comparison of rank and score.

df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')

ggplot(data=df, aes(x=rank, y=elo_score)) +
  geom_point(size=1) +
  ggtitle('Google AI Score by Rank')

Another approach to viewing this information is a histogram by score (which ignores rank).  With a binwidth of 100 (and ignoring low scores of people who signed up but who dropped out relatively early) a (nearly) bimodal distribution appears.

qplot(data=df, x=elo_score, geom='histogram', binwidth=100)
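The histogram call above plots every row, so "ignoring low scores" has to happen either by eye or by filtering first. Here is a minimal sketch of the filtering route. The cutoff value and the toy data frame are assumptions made for illustration; the post does not state which scores were set aside.

```r
# Hypothetical cutoff: the post does not say which scores were ignored.
score_cutoff <- 1000

# Toy stand-in for df; the values are invented for illustration only.
toy <- data.frame(
  language  = c("Java", "Python", "C++", "Java", "Python", "C++"),
  elo_score = c(450, 1800, 2600, 2550, 900, 1750)
)

# Keep only the rows at or above the cutoff.
active <- subset(toy, elo_score >= score_cutoff)

# Bin the remaining scores with the same width of 100 used in the post.
breaks <- seq(floor(min(active$elo_score) / 100) * 100,
              ceiling(max(active$elo_score) / 100) * 100 + 100,
              by = 100)
counts <- hist(active$elo_score, breaks = breaks, plot = FALSE)$counts
```

With the real data, the same `subset()` call on `df` could feed the `qplot` histogram directly, which makes the cutoff explicit rather than leaving it to visual judgment.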

Any ideas about why this is not normally distributed? Is there some aspect of Elo scoring that leads to this shape? Or are there different types of programmers represented?

This can be broken down by language.  To avoid difficulty distinguishing colors, the rainbow palette is used and a few languages are not reported (since they were not highly represented in the competition).


library(sqldf)

df2 <- sqldf("select * from df where language not in ('Groovy','Scala','Go','OCaml')")
qplot(data=df2, x=elo_score, fill=language, geom='histogram', binwidth=100) + scale_fill_manual(values=rainbow(12))
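The excluded languages are hard-coded in the query above. As an alternative sketch, the thinly represented languages could be picked by count instead. The threshold and the toy data below are assumptions for illustration, not values taken from the competition data.

```r
# Toy stand-in for df; the rows are invented for illustration only.
toy <- data.frame(
  language  = c("Java", "Java", "Java", "Python", "Python", "Go"),
  elo_score = c(2100, 1900, 2500, 2300, 1700, 2000)
)

# Hypothetical threshold: keep languages with at least this many entries.
min_entries <- 2

# Count entries per language, then keep the names that meet the threshold.
entries_per_language <- table(toy$language)
kept_languages <- names(entries_per_language)[entries_per_language >= min_entries]

# Filter the rows and drop unused factor levels so the legend stays clean.
toy2 <- droplevels(toy[toy$language %in% kept_languages, ])

# One rainbow colour per remaining language, instead of a fixed 12.
palette_values <- rainbow(length(kept_languages))
```

Sizing the palette from `kept_languages` avoids a mismatch between the number of colours and the number of languages if the threshold changes.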

As mentioned in the previous post, the data is available at GitHub – feel free to post some of your own visualizations of this data.
