Programming Languages Usage

Rank  Language  Entrants
1     Java      1634
2     C++       1232
3     Python    948
4     C#        485
5     PHP       80
6     Ruby      55
7     Haskell   51
8     Perl      42
9     Lisp      33
11    C         18
12    OCaml     12
13    Go        6
14    Scala     4
15    Groovy    1

Rank  Language  Entrants
1     Java      33
2     C++       32
3     Python    20
4     C#        9
5     C         3
6     Haskell   1
7     Lisp      1
8     OCaml     1
The plot above is hard to read because of the number of languages represented (and the similarity of the colors), so here is a breakdown by language.
Lisp does appear to be skewed towards higher ranking. But even more striking are the C hippies:
The functional crowd represented with Haskell also ranked on the higher end:
How about Java? There is a trend towards the average – but a significantly larger number of entrants used Java. It is also a language taught in many colleges, so the numbers might reflect greater student participation in this language (although MIT did focus on Lisp back in the day…).
How about representatives from Microsoft? Einstein and Elvis showed up – Mort was not interested.
I can post charts of other languages if anyone asks – otherwise, download the files for yourself and draw your own conclusions. And congratulations to
No need to proceed further unless you are interested in how the results listed above were derived.
Basically, I used Ruby to scrape the results from the Google AI Rankings site. The results were then read into R, and the ggplot2 and sqldf libraries were used to analyze them.
Get the Data into R
So to find out more… I whipped up a Ruby script to create a delimited file from the 47-page listing online. (Feel free to get these from their GitHub location and do some additional validation/analysis of your own.) Read this file into R:
df$V7 <- NULL
names(df) <- c('rank', 'username', 'country', 'organization', 'language', 'elo_score')
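The read step itself is not shown above, so here is a minimal sketch of what it might look like. The file contents, the "|" delimiter and the name `demo` are all assumptions made up for illustration; the only thing taken from the post is that a seventh column (V7) appears and gets dropped, which is what happens when every line ends with a trailing delimiter.

```r
# Toy stand-in for the scraped file: made-up rows, with "|" as an ASSUMED delimiter.
# Each line ends with a delimiter, so R sees an empty seventh field (V7).
lines <- c(
  "1|alice|US|Example University|Lisp|3000|",
  "2|bob|CA|Other|C#|2900|",
  "3|carol|DE|Other|Java|2800|"
)
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# comment.char = "" keeps "C#" from being cut off at the "#";
# quote = "" avoids trouble with apostrophes in usernames.
demo <- read.table(path, sep = "|", header = FALSE, quote = "",
                   comment.char = "", stringsAsFactors = FALSE)

demo$V7 <- NULL  # drop the empty column created by the trailing delimiter
names(demo) <- c("rank", "username", "country", "organization", "language", "elo_score")
str(demo)
```

The `comment.char` setting is worth noting for this particular data set: with `read.table` defaults, everything from the "#" in "C#" onward would be treated as a comment.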
Most of this work can be done in idiomatic R (which has some significant Lisp influences) – which might be a better way to honor the winner. However, I find myself using SQLite more and more these days – particularly in mobile development. So I used the sqldf library, which uses this database behind the scenes.
Country rankings are available online, and the following emulates these results. Specifically, the number of entrants in the top 200 ranked contestants from each country can be derived as follows:
sqldf('select country, count(*) from top200 group by country order by 2 desc')
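For comparison, the same kind of count can be sketched in base R without SQL. The data frame below is a made-up miniature, and the cutoff of 5 stands in for 200; the subsetting mirrors how top10 is built further down, on the assumption that top200 is created the same way.

```r
# Hypothetical miniature of the rankings data frame (made-up rows).
demo <- data.frame(
  rank    = 1:6,
  country = c("US", "DE", "US", "JP", "US", "DE"),
  stringsAsFactors = FALSE
)

# Analogue of top200 <- df[df$rank <= 200, ], with a cutoff of 5 for the toy data.
top5 <- demo[demo$rank <= 5, ]

# Base R counterpart of:
#   select country, count(*) from top5 group by country order by 2 desc
counts <- sort(table(top5$country), decreasing = TRUE)
print(counts)
```

Whether `table()` or sqldf reads better is mostly a matter of taste; the SQL version has the advantage of being portable to SQLite itself.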
Organization rankings are similar, representing the top organizations within the top 100. There are some anomalies here: the highest-ranking organization, “Other”, is not shown in the online version for obvious reasons, and most of the organizations have only one entrant in the top 100 and are listed in an arbitrary order. However, the results are otherwise the same in R.
The following are additional snippets of R code used to generate the results above.
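# Language Usage
library(sqldf)
# top200 and top100 are assumed to be built the same way as top10
top200 <- df[df$rank <= 200, ]
sqldf('select language, count(*) from top200 group by language order by 2 desc')
top100 <- df[df$rank <= 100, ]
sqldf('select language, count(*) from top100 group by language order by 2 desc')
top10 <- df[df$rank <= 10, ]
sqldf('select language, count(*) from top10 group by language order by 2 desc')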
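# Substitute your favorite language of those available for Lisp below
# (ggtitle() stands in for opts(title=...), which current ggplot2 no longer provides)
library(ggplot2)
qplot(data=df[df$language=='Lisp',], x=rank, geom='histogram', binwidth=1000) + ggtitle('Lisp')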
# The density plot at the top of this posting:
ggplot(data=df, aes(rank, fill=language)) +
  geom_density(alpha = 0.2) +
  ggtitle('2010 Google AI Challenge Rankings')
# Breakdown by language:
ggplot(data=df[df$language=='Scala',], aes(rank, fill=language)) + geom_density(alpha = 0.2) + xlim(0,5000) + ggtitle('Scala')
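As a rough numeric complement to the per-language plots, the median rank per language can be tabulated in base R. This is only a sketch on made-up rows (the real analysis would use df), and the names `demo`, `med` and `summary_df` are introduced here for illustration.

```r
# Hypothetical miniature of the rankings data (made-up rows).
demo <- data.frame(
  rank     = c(10, 40, 2500, 300, 1200, 4000),
  language = c("Lisp", "Lisp", "Lisp", "Java", "Java", "Java"),
  stringsAsFactors = FALSE
)

# Median rank and entrant count per language; a lower median means better placement.
med <- tapply(demo$rank, demo$language, median)
n   <- table(demo$language)

summary_df <- data.frame(
  language    = names(med),
  entrants    = as.vector(n[names(med)]),
  median_rank = as.vector(med)
)

# Best-placed languages first.
print(summary_df[order(summary_df$median_rank), ])
```

A median is less sensitive than a density plot to a handful of outliers, but with only a few entrants per language (as with Groovy or Scala in the table above) neither view says much.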