The Google AI Challenge recently wrapped up with a Lisp developer from Hungary as the winner. The competition challenges contestants to create bots that push the limits of AI and game theory. These bots compete against one another, and a complete ranking of competitors is available. The big story today is that the winner (Gábor Melis) used Lisp to beat out over 4000 other contestants around the world using a host of different programming languages.
Paul Graham has stated that Java was designed for "average" programmers while other languages (like Lisp) are for good programmers. The fact that the winner of the competition wrote in Lisp seems to support this assertion. Or should we see Mr. Melis as an anomaly who happened to use Lisp for this task?
Programming Languages Usage
Java, C++, Python and C# were heavily used overall.
1 Java 1634
2 C++ 1232
3 Python 948
4 C# 485
5 PHP 80
6 Ruby 55
7 Haskell 51
8 Perl 42
9 Lisp 33
11 C 18
12 OCaml 12
13 Go 6
14 Scala 4
15 Groovy 1
In the Top 200
1 Java 70
2 C++ 64
3 Python 34
4 C# 17
5 C 4
6 Haskell 3
7 PHP 3
8 Ruby 2
10 Lisp 1
11 OCaml 1
In the Top 100
1 Java 33
2 C++ 32
3 Python 20
4 C# 9
5 C 3
6 Haskell 1
7 Lisp 1
8 OCaml 1
In the Top 10
1 Java 4
2 C++ 3
3 C# 2
4 Lisp 1
The plot above is a bit difficult to discern due to the number of languages represented (and similarity in colors). So here is a breakdown by language.
How about Java? There is a trend towards the average - but a significantly larger number of entrants used Java. It is also a language taught in many colleges, so this might reflect greater student participation in the language (although MIT did focus on Lisp back in the day...).
I can post charts of other languages if anyone asks - otherwise, download the files for yourself and draw your own conclusions. And congratulations to Gábor Melis - I am again feeling the inspiration to delve into the mysteries of Lisp and meander among mountains of parentheses...
No need to proceed further unless you are interested in how the results listed above were derived.
Basically, I used Ruby to scrape the results from the Google AI Rankings site. The results were read into R, and the ggplot2 and sqldf libraries were used to analyze them.
Get the Data into R
So to find out more... I whipped up a Ruby script to create a delimited file from the 47-page listing online. (Feel free to get these from their GitHub location and do some additional validation/analysis of your own.) Read this file into R:
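# The file name and tab separator are assumptions; use whatever the Ruby script produced
df <- read.delim('rankings.txt', header=FALSE, sep='\t')
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')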
Most of this work can be done in idiomatic R (which has some significant Lisp influences) - which might be a better way to honor the winner. However, I find myself using SQLite more and more these days - particularly in mobile development. So I used the sqldf library, which uses this database behind the scenes.
Country rankings are available online, and the following emulates these results. Specifically, the number of entrants in the top 200 ranked contestants from each country can be derived as follows:
library(sqldf)
top200=df[df$rank <= 200,]
sqldf('select country, count(*) from top200 group by country order by 2 desc')
Organization rankings are similar, representing the top organizations within the top 100. There are some anomalies here: the highest-ranking entry, "Other", is not shown in the online version for obvious reasons, and most organizations have only one entrant in the top 100 and are listed in an arbitrary order. However, the results are otherwise the same in R.
top100=df[df$rank <= 100,]
sqldf('select organization, count(*) from top100 group by organization order by 2 desc')
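For comparison, the same kind of tally can be sketched in base R without sqldf. This is only an illustration: the small data frame below is made up so that the snippet runs on its own, and the cutoff of 5 stands in for the 100 or 200 used above.

```r
# Toy data standing in for the scraped rankings (values are invented)
toy <- data.frame(
  rank = 1:6,
  country = c('Hungary', 'USA', 'USA', 'Germany', 'USA', 'Germany'),
  stringsAsFactors = FALSE
)

# Keep the top-ranked rows, then count entrants per country
top <- toy[toy$rank <= 5, ]
counts <- sort(table(top$country), decreasing = TRUE)
print(counts)
```

Replacing `toy` with `df` and `country` with `organization` or `language` should give the base R counterpart of the other queries in this post.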
The following are additional snippets of R code used to generate the results above.
# Language Usage
sqldf('select language, count(*) from df group by language order by 2 desc')
sqldf('select language, count(*) from top200 group by language order by 2 desc')
sqldf('select language, count(*) from top100 group by language order by 2 desc')
top10=df[df$rank <= 10,]
sqldf('select language, count(*) from top10 group by language order by 2 desc')
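One way to put these counts in context is to look at what share of each language's entrants reached the top 200. The sketch below is not part of the original analysis. It copies the counts from the tables earlier in this post instead of querying the scraped file, and the variable names are only illustrative.

```r
# Overall entrants per language, copied from the usage table in this post
entrants <- c(Java = 1634, 'C++' = 1232, Python = 948, 'C#' = 485, PHP = 80,
              Ruby = 55, Haskell = 51, Lisp = 33, C = 18, OCaml = 12)

# Entrants per language that finished in the top 200, from the same post
in_top200 <- c(Java = 70, 'C++' = 64, Python = 34, 'C#' = 17, C = 4,
               Haskell = 3, PHP = 3, Ruby = 2, Lisp = 1, OCaml = 1)

# Percentage of each language's entrants that made the top 200
pct <- round(100 * in_top200[names(entrants)] / entrants, 1)
print(sort(pct, decreasing = TRUE))
```

With so few entrants for the less common languages, these percentages are noisy and should be read with caution.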
If you fiddle enough with the bucket size for histograms, you might be able to draw some conclusions... but the density plot seemed like a nicer option.
library(ggplot2)
# Substitute your favorite language of those available for Lisp below
qplot(data=df[df$language=='Lisp',], x=rank, geom='histogram', binwidth=1000) + ggtitle('Lisp')
# The density plot at the top of this posting:
ggplot(data=df, aes(rank, fill=language)) +
geom_density(alpha = 0.2) +
ggtitle('2010 Google AI Challenge Rankings')
# Breakdown by language:
ggplot(data=df[df$language=='Scala',], aes(rank, fill=language)) + geom_density(alpha = 0.2) + xlim(0,5000) + ggtitle('Scala')
Update: I have been keeping up with the comments - and sketched out some other ways of looking at the data in another post.