Google AI Challenge: Scores/Rank by Language

December 8, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

A quick follow-up to the previous post about the scores in the 2010 Google AI competition relative to programming language.  The chart above shows each language in its own panel, and the scales are the same across panels.

library(ggplot2)
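# The file is semicolon-separated and has no header row
df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
# Drop the unused seventh column, then name the remaining six
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')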


# One panel per language; facet_wrap keeps the same scales in every panel by default
ggplot(data=df, aes(x=rank, y=elo_score, color=language)) +
  geom_point(size=1) +
  facet_wrap(~ language) +
  ggtitle('Google AI 2010: Score by Rank for each Language')
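
As a numeric companion to the faceted chart, the same rank and score columns can be summarized per language. The following is only a sketch in base R: the function name `language_summary` and the choice of statistics (entry count, median score, best rank) are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical helper: one row per language with entry count,
# median Elo score and best (lowest) rank.
# Assumes df has the columns named above: rank, language, elo_score.
language_summary <- function(df) {
  entries <- aggregate(rank ~ language, data = df, FUN = length)
  names(entries)[2] <- "entries"

  med <- aggregate(elo_score ~ language, data = df, FUN = median)
  names(med)[2] <- "median_score"

  best <- aggregate(rank ~ language, data = df, FUN = min)
  names(best)[2] <- "best_rank"

  out <- merge(merge(entries, med, by = "language"), best, by = "language")
  # Highest median score first
  out[order(-out$median_score), ]
}

# Usage, once df has been read as above:
# language_summary(df)
```

Sorting by median score is just one possible ordering; sorting by best rank would emphasize the top finisher in each language instead.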

The faceted chart is based upon a simple comparison of rank and score, plotted here without the split by language.




df<- read.csv('googleAI2010.csv',sep=';',header=FALSE)
df$V7 <- NULL
names(df)<- c('rank', 'username','country','organization','language','elo_score')

ggplot(data=df, aes(x=rank, y=elo_score)) + geom_point(size=1) + ggtitle('Google AI Score by Rank')


Another approach to viewing this information is a histogram by score (which ignores rank).  With a binwidth of 100, and ignoring the low scores of people who signed up but dropped out relatively early, a nearly bimodal distribution appears.

qplot(data=df, x=elo_score, geom='histogram', binwidth=100)


Any ideas about why this is not normally distributed?  Is there some aspect of Elo scoring that leads to this shape?  Or are there different types of programmers represented?
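
To look at the two modes numerically rather than by eye, the bin counts behind the histogram can be tabulated directly. This is a minimal sketch in base R: the helper name `score_bins` and its `min_score` cutoff are assumptions for illustration, and the cutoff in the usage line is a placeholder rather than a value taken from the data.

```r
# Hypothetical helper: count scores in fixed-width bins, optionally
# dropping scores below min_score (e.g. early drop-outs).
score_bins <- function(scores, binwidth = 100, min_score = -Inf) {
  scores <- scores[!is.na(scores) & scores >= min_score]
  stopifnot(length(scores) > 0)

  # Align the bin edges to multiples of binwidth
  lower <- floor(min(scores) / binwidth) * binwidth
  upper <- max(ceiling(max(scores) / binwidth) * binwidth, lower + binwidth)
  breaks <- seq(lower, upper, by = binwidth)

  # Bins are [start, start + binwidth); the last bin also includes its upper edge
  h <- hist(scores, breaks = breaks, plot = FALSE,
            right = FALSE, include.lowest = TRUE)
  data.frame(bin_start = breaks[-length(breaks)], count = h$counts)
}

# Usage, once df has been read as above (1000 is a placeholder cutoff):
# score_bins(df$elo_score, binwidth = 100, min_score = 1000)
```

Two separated runs of high counts in the resulting table would correspond to the two peaks seen in the histogram.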

This can be broken down by language.  To avoid difficulty distinguishing colors, the rainbow palette is used and a few languages are not reported (since they were not highly represented in the competition).

library(sqldf)

df2=sqldf("select * from df where language not in ('Groovy','Scala','Go','OCaml')")
df2$language=factor(df2$language)
qplot(data=df2, x=elo_score, fill=language, geom='histogram', binwidth=100) + scale_fill_manual(values=rainbow(12)) 
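
The excluded languages are hard-coded in the query above. As an alternative sketch, the sparsely represented languages could be derived from the entry counts themselves. The helper name `keep_common_languages` and the `min_entries` threshold are assumptions for illustration; the original post does not state what cutoff was used.

```r
# Hypothetical helper: keep only languages with at least min_entries rows.
# Assumes df has a language column, as named above.
keep_common_languages <- function(df, min_entries = 50) {
  counts <- table(df$language)
  common <- names(counts)[counts >= min_entries]

  out <- df[df$language %in% common, ]
  # Rebuild the factor so dropped languages do not linger as empty levels
  out$language <- factor(out$language)
  out
}

# Usage, once df has been read as above (50 is a placeholder threshold):
# df2 <- keep_common_languages(df, min_entries = 50)
```

With this approach the number of colors passed to the palette would need to match `nlevels(df2$language)` rather than a fixed 12.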



As mentioned in the previous post, the data is available at GitHub - feel free to post some of your own visualizations of this data.
