Google AI Challenge: Languages Used by the Best Programmers

December 2, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)





The Google AI Challenge recently wrapped up with a Lisp developer from Hungary as the winner.  The competition challenges contestants to create bots that push the limits of AI and game theory.  These bots compete against one another, and a complete ranking of competitors is available.  The big story today is that the winner (Gábor Melis) used Lisp to beat out over 4000 other contestants around the world using a host of different programming languages.   




Paul Graham has stated that Java was designed for "average" programmers while other languages (like Lisp) are for good programmers.  The fact that the winner of the competition wrote in Lisp seems to support this assertion.  Or should we see Mr. Melis as an anomaly who happened to use Lisp for this task?



Programming Languages Usage


Java, C++, Python and C# were heavily used overall.

     language count(*)
1        Java     1634
2         C++     1232
3      Python      948
4          C#      485
5         PHP       80
6        Ruby       55
7     Haskell       51
8        Perl       42
9        Lisp       33
10 Javascript       19
11          C       18
12      OCaml       12
13         Go        6
14      Scala        4
15     Groovy        1

In the Top 200
     language count(*)
1        Java       70
2         C++       64
3      Python       34
4          C#       17
5           C        4
6     Haskell        3
7         PHP        3
8        Ruby        2
9  Javascript        1
10       Lisp        1
11      OCaml        1


Top 100
  language count(*)
1     Java       33
2      C++       32
3   Python       20
4       C#        9
5        C        3
6  Haskell        1
7     Lisp        1
8    OCaml        1

Top 10
  language count(*)
1     Java        4
2      C++        3
3       C#        2
4     Lisp        1


The plot above is a bit difficult to discern due to the number of languages represented (and similarity in colors).  So here is a breakdown by language.

Lisp does appear to be skewed towards higher ranking.  But even more striking are the C hippies:

The functional crowd represented with Haskell also ranked on the higher end:


How about Java?  There is a trend towards the average - but a significantly larger number of entrants used Java.  Java is also taught in many colleges, so its numbers might reflect greater student participation (although MIT did focus on Lisp back in the day...).
How about the representatives from Microsoft (the C# users)?  Einstein and Elvis showed up - Mort was not interested.

I can post charts of other languages if anyone asks - otherwise, download the files for yourself and draw your own conclusions.  And congratulations to Gábor Melis - I am again feeling the inspiration to delve into the mysteries of Lisp and meander among mountains of parentheses...




Methodology Used
No need to proceed further unless you are interested in how the results listed above were derived.

Basically, I used Ruby to scrape the results from the Google AI Rankings site.  The results were then read into R, and the ggplot2 and sqldf libraries were used to analyze them.

Get the Data into R
So to find out more... I whipped up a Ruby script to create a delimited file from the 47-page listing online.  (Feel free to get the script and the file from their GitHub location and do some additional validation/analysis of your own.)  Read this file into R:


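# The scraped file is ';'-delimited and has no header row
df <- read.csv('googleAI2010.csv', sep=';', header=FALSE)
# Drop the seventh column, which is not used in the analysis
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')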
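A quick way to see what these three lines do is to run them on a couple of inline rows. This is only a sketch: it assumes each line of the scraped file ends with a trailing semicolon (which would explain the extra V7 column), and the two rows below are made-up placeholders, not actual contest entries.

```r
# Hypothetical rows in the assumed layout: six ';'-delimited fields plus a
# trailing ';' that makes read.csv create an empty seventh column (V7).
sample_text <- paste(
  "1;user_a;Country A;Other;Lisp;3000;",
  "2;user_b;Country B;Example University;Java;2900;",
  sep = "\n"
)

sample_df <- read.csv(text = sample_text, sep = ';', header = FALSE)
cols_before <- ncol(sample_df)   # includes the empty V7 column

sample_df$V7 <- NULL
names(sample_df) <- c('rank', 'username', 'country', 'organization',
                      'language', 'elo_score')

# Inspect the column types before doing any analysis
str(sample_df)
```

If str() on the real file shows rank or elo_score as text rather than numbers, the delimiter or column assumption is probably off.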


Sanity Check
Most of this work can be done in idiomatic R (which has some significant Lisp influences) - which might be a better way to honor the winner.  However, I find myself using SQLite more and more these days - particularly in mobile development.  So I used the sqldf library, which uses this database behind the scenes.

Country rankings are available online, and the following emulates these results.  Specifically, the number of entrants in the top 200 ranked contestants from each country can be derived as follows:




library('sqldf')


top200=df[df$rank <= 200,]


sqldf('select country, count(*) from top200 group by country order by 2 desc')
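For comparison, the same kind of grouped count can be sketched in base R without sqldf. The data frame below is a toy stand-in for top200 with placeholder country names, not the real results.

```r
# Toy stand-in for top200 (placeholder rows, not contest data)
toy_top <- data.frame(
  rank    = 1:6,
  country = c('Country A', 'Country B', 'Country B',
              'Country C', 'Country B', 'Country C'),
  stringsAsFactors = FALSE
)

# Same idea as:
#   select country, count(*) from top200 group by country order by 2 desc
country_counts <- sort(table(toy_top$country), decreasing = TRUE)
print(country_counts)
```

table() does the grouping and counting, and sort() with decreasing = TRUE plays the role of "order by 2 desc".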


Organization rankings are similar, representing the top organizations within the top 100.  There are some anomalies here: the highest ranking "Other" is not shown in the online version for obvious reasons, and most of these organizations have only one entrant in the top 100 and are listed in an arbitrary order.  However, the results are otherwise the same in R.




top100=df[df$rank <= 100,]
sqldf('select organization, count(*) from top100 group by organization order by 2 desc')




R Code
The following are additional snippets of R code used to generate the results above.


# Language Usage

sqldf('select language, count(*) from df group by language order by 2 desc')


sqldf('select language, count(*) from top200 group by language order by 2 desc')
sqldf('select language, count(*) from top100 group by language order by 2 desc')



top10=df[df$rank <= 10,]
sqldf('select language, count(*) from top10 group by language order by 2 desc')
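Raw counts favor the most popular languages, so it can help to compare each language's share of the top 100 with its share of all entrants. The sketch below hard-codes the counts from the tables in this post and assumes the overall table covers every entrant; the names overall, top100_counts and representation are just illustrative.

```r
# Counts copied from the language tables in this post
overall <- c(Java = 1634, 'C++' = 1232, Python = 948, 'C#' = 485, PHP = 80,
             Ruby = 55, Haskell = 51, Perl = 42, Lisp = 33, Javascript = 19,
             C = 18, OCaml = 12, Go = 6, Scala = 4, Groovy = 1)
top100_counts <- c(Java = 33, 'C++' = 32, Python = 20, 'C#' = 9, C = 3,
                   Haskell = 1, Lisp = 1, OCaml = 1)

# Share of all entrants vs. share of the top 100, per language
share_overall <- overall[names(top100_counts)] / sum(overall)
share_top100  <- top100_counts / sum(top100_counts)

# A ratio above 1 means the language is more common in the top 100
# than among all entrants
representation <- round(share_top100 / share_overall, 2)
print(sort(representation, decreasing = TRUE))
```

On these counts C has the highest ratio, but with only 3 of its 18 entrants involved (and a single Lisp entry in the top 100), ratios built on such small numbers should be read cautiously.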



 If you fiddle enough with the bucket size for histograms, you might be able to draw some conclusions... but the density plot seemed like a nicer option.  


library('ggplot2')

# Substitute your favorite language of those available for Lisp below
qplot(data=df[df$language=='Lisp',], x=rank, geom='histogram', binwidth=1000) + ggtitle('Lisp')
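To illustrate how much the bucket size matters, here is a small base R sketch that bins the same values with two different widths. The ranks are made up for illustration and are not contest data.

```r
# Made-up ranks for illustration only (not contest data)
toy_ranks <- c(12, 150, 480, 900, 1300, 2100, 2600, 3900)

# Count entries per bin for two different bin widths over the range 0-5000
counts_1000 <- hist(toy_ranks, breaks = seq(0, 5000, by = 1000),
                    plot = FALSE)$counts
counts_2500 <- hist(toy_ranks, breaks = seq(0, 5000, by = 2500),
                    plot = FALSE)$counts

print(counts_1000)
print(counts_2500)
```

With the narrower bins the concentration near the top of the ranking is visible; with the wider bins most of that detail is merged into a single bar.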





# The density plot at the top of this posting:
ggplot(data=df, aes(rank, fill=language)) +
  geom_density(alpha = 0.2) +
  xlim(0,5000) +
  ggtitle('2010 Google AI Challenge Rankings')


ggsave('program_language_density_plot.png')


# Breakdown by language:

ggplot(data=df[df$language=='Scala',], aes(rank, fill=language)) + geom_density(alpha = 0.2) + xlim(0,5000) + ggtitle('Scala')
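A numeric summary can complement the per-language plots. As a rough sketch, the median rank per language can be computed in base R; the data frame here is a toy example with placeholder language names and made-up ranks, so substitute df to try it on the real data.

```r
# Toy data (placeholder languages and made-up ranks, not contest results)
toy_df <- data.frame(
  rank     = c(5, 40, 300, 1200, 2500, 60, 900, 4100),
  language = c('LangA', 'LangA', 'LangA', 'LangB', 'LangB',
               'LangC', 'LangB', 'LangB'),
  stringsAsFactors = FALSE
)

# Median rank per language, best (lowest) median first
median_rank <- sort(sapply(split(toy_df$rank, toy_df$language), median))
print(median_rank)
```

A lower median means a language's entrants tended to place higher, though for languages with only a handful of entrants the median says very little.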


Update:  I have been keeping up with the comments - and sketched out some other ways of looking at the data in another post.
