df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')
ggplot(data=df, aes(x=rank, y=elo_score)) + geom_point(size=1) + ggtitle('Google AI Score by Rank')
Another way to view this information is a histogram of scores, which ignores rank. With a bin width of 100, and setting aside the low scores of people who signed up but dropped out relatively early, a nearly bimodal distribution appears.
qplot(data=df, x=elo_score, geom='histogram', binwidth=100)
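To see what a bin width of 100 means in terms of counts, and to make the low-score exclusion explicit, the binning can be sketched in base R. This is only an illustration: the scores are synthetic stand-ins for `df$elo_score`, and the cutoff `min_score` is a hypothetical value, not a threshold taken from the analysis above. ggplot2 may also place its bin edges differently, so its counts need not match these exactly.

```r
# Synthetic stand-in for df$elo_score; NOT the competition data.
set.seed(1)
scores <- c(rnorm(200, mean = 1500, sd = 150),
            rnorm(200, mean = 2500, sd = 150))

# Hypothetical cutoff for early dropouts (an assumption for illustration).
min_score <- 1000
kept <- scores[scores >= min_score]

# Bin edges spaced 100 apart, covering the full range of the kept scores.
binwidth <- 100
breaks <- seq(floor(min(kept) / binwidth) * binwidth,
              ceiling(max(kept) / binwidth) * binwidth,
              by = binwidth)

# Count scores per bin without drawing anything.
h <- hist(kept, breaks = breaks, plot = FALSE)
bin_counts <- data.frame(bin_start = head(h$breaks, -1), count = h$counts)
print(bin_counts)
```

Two clusters of high counts separated by a run of low counts in `bin_counts` is the tabular version of the two humps in the plot.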
Any ideas about why this distribution is not normal? Is there some aspect of Elo scoring that leads to this shape? Or are there different types of programmers represented?
The histogram can be broken down by language. To keep the colors distinguishable, the rainbow palette is used and a few languages are left out, since they were not highly represented in the competition.
df2 <- sqldf("select * from df where language not in ('Groovy','Scala','Go','OCaml')")
qplot(data=df2, x=elo_score, fill=language, geom='histogram', binwidth=100) + scale_fill_manual(values=rainbow(12))
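For anyone without sqldf installed, the same filter can be sketched in base R, with the palette sized from the languages that remain instead of a fixed 12. The data frame `df_demo` and the names `excluded` and `fill_colors` are made up for this illustration; only the list of excluded languages comes from the query above.

```r
# Tiny synthetic stand-in for df; NOT the competition data.
df_demo <- data.frame(
  language  = c("Java", "Python", "Go", "C++", "Scala", "Java"),
  elo_score = c(2100, 1900, 1500, 2300, 1700, 2000),
  stringsAsFactors = FALSE
)

# Languages to leave out, matching the list in the query above.
excluded <- c("Groovy", "Scala", "Go", "OCaml")

# Base R equivalent of the "not in" filter.
df2_demo <- df_demo[!(df_demo$language %in% excluded), ]

# Size the palette from the languages that remain, rather than hard-coding it.
n_languages <- length(unique(df2_demo$language))
fill_colors <- rainbow(n_languages)
```

`fill_colors` could then be passed to `scale_fill_manual(values=...)`. One difference worth knowing: rows with a missing language are dropped by the SQL `not in` but kept by this `%in%` filter.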
As mentioned in the previous post, the data is available at GitHub – feel free to post some of your own visualizations of this data.