An updated look at the #code2013 language rankings

[This article was first published on lp0 On Fire, and kindly contributed to R-bloggers.]

A few weeks ago I compared the #code2013 rankings from Twitter to TIOBE’s rankings, although people were still chiming in when I collected the #code2013 data, albeit at a slowing pace. As I visually scanned the new tweets, it seemed like there was a huge increase in mentions of Delphi & Object Pascal compared to the data I had collected previously, and it made me curious whether this was a real effect or just coincidence. Luckily I had continued to collect the #code2013 data after I made that post, so I had an opportunity to find out: I now had 6028 tweets, 1404 more than the last time.

At the same time, I commented in my original post that I was unhappy with the mechanism I used to strip manual retweets (i.e. manually adding RT instead of using the built-in retweet), as I had removed any tweet from the data which contained an RT. Because people often add commentary to the left of the RT, I created a new function which keeps anything to the left of the RT (as well as MT), which should leave more usable data. This code now appears in the GitHub version of twitteR as the function strip_retweets(). Unfortunately, this didn’t make much of a difference: applying this new function to the original data set only gave me 23 more tweets’ worth of data. Oh well, it was the thought that counted.
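To make the "keep what is left of the RT" idea concrete, here is a minimal Python sketch. It is not the twitteR implementation of strip_retweets() (that code is R); the function name `strip_manual_retweet`, the regular expression, and the sample tweets are all assumptions made for illustration.

```python
import re

# Assumption: a manual retweet is marked by a standalone "RT" or "MT" token.
_MANUAL_RT = re.compile(r"\b(?:RT|MT)\b")


def strip_manual_retweet(text):
    """Return the commentary left of the first RT/MT marker, or None if nothing is left."""
    match = _MANUAL_RT.search(text)
    if match is None:
        return text.strip()  # not a manual retweet, keep the tweet as-is
    commentary = text[: match.start()].strip()
    return commentary or None  # a bare retweet carries no new information


# Invented sample tweets, not taken from the #code2013 data.
tweets = [
    "Python, R, C #code2013",
    "Same here! RT @someone: Scala, Java #code2013",
    "RT @someone: Delphi #code2013",
    "+Haskell MT @other: Scala, Java #code2013",
]
kept = [t for t in (strip_manual_retweet(t) for t in tweets) if t]
print(kept)
```

The older approach described above would have discarded the second and fourth tweets entirely; this version keeps their commentary and only drops the bare retweet.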

I processed the new dataset the same way as the previous batch (all code included as a single gist below), and sure enough there was a large skew toward Delphi & Pascal in this batch. Note that I had tried to morph any usage of “object pascal” into a single “delphi/object pascal” entry, but presumably most people mentioning “pascal” mean Delphi.
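The morphing step can be sketched roughly as follows. This is a Python illustration only, not the code from the gist: the `ALIASES` table, the `languages_in` helper, the choice of separators, and the sample tweets are all hypothetical. In particular, folding a bare “pascal” into the Delphi entry is the presumption mentioned above, not something the data confirms.

```python
import re
from collections import Counter

# Hypothetical alias table: each spelling on the left is counted as the entry on the right.
ALIASES = {
    "delphi": "delphi/object pascal",
    "object pascal": "delphi/object pascal",
    "pascal": "delphi/object pascal",  # assumption: "pascal" usually means Delphi
}


def languages_in(tweet):
    """Return the set of normalised language names listed in one tweet."""
    text = tweet.lower().replace("#code2013", "")
    # Assumption: languages are separated by commas, semicolons or ampersands.
    names = (part.strip() for part in re.split(r"[,;&]", text))
    # A set means each language is counted at most once per tweet.
    return {ALIASES.get(name, name) for name in names if name}


# Invented sample tweets, not taken from the #code2013 data.
tweets = [
    "Delphi, SQL #code2013",
    "Object Pascal, C# #code2013",
    "pascal; python #code2013",
    "Python, SQL #code2013",
]
counts = Counter(lang for t in tweets for lang in languages_in(t))
print(counts.most_common())
```

Because the aliases collapse before counting, a tweet that lists both “Delphi” and “Object Pascal” contributes a single mention rather than two.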

So despite the inclusion of about 30% more data, the results are very similar. So what happens if we look at the updated data against the TIOBE data as I did the first time?

Sure enough, when visually compared to the original, the Pascal entries gained quite a lot (bouncing one of my favorites, Scala, down a tier). There were some other changes, most notably ABAP & C# gained while Fortran lost, but only ABAP had a very noticeable gain.

What happens if we only look at the new tweets against the TIOBE rankings? How much of a skew would Delphi show now?

As expected, Delphi took a huge leap forward. Also as expected, some of the fringe languages fell off of this plot, which makes sense as we have about a third of the data and so fewer opportunities to make the grade. You can also see some languages like R (another favorite) and ObjC dropping while others like Haskell and Matlab gain.
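One way to put a number on this kind of shift, rather than eyeballing two plots, is to compare each language's share of mentions in the old batch against its share in the new one. The Python sketch below assumes per-batch mention counts are already available; the helper names `shares` and `share_shift` are hypothetical, and the counts are made up for illustration and are not the #code2013 figures.

```python
from collections import Counter


def shares(counts):
    """Convert raw mention counts into each language's fraction of all mentions."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def share_shift(old_counts, new_counts):
    """Return new share minus old share for every language seen in either batch."""
    old, new = shares(old_counts), shares(new_counts)
    return {
        lang: new.get(lang, 0.0) - old.get(lang, 0.0)
        for lang in set(old) | set(new)
    }


# Made-up counts purely for illustration; these are NOT the real data.
old_batch = Counter({"javascript": 50, "python": 30, "delphi": 5, "r": 15})
new_batch = Counter({"javascript": 10, "python": 6, "delphi": 12, "haskell": 2})

shift = share_shift(old_batch, new_batch)
for lang, delta in sorted(shift.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{lang:12s} {delta:+.2f}")
```

Working with shares rather than raw counts matters here because the new batch is much smaller than the original, so nearly every raw count would drop regardless of any real change in popularity.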

So what happened? It seems reasonable to me to expect a fairly steady distribution over time, although clearly the social aspect of Twitter is affecting things, causing viral gains and losses over time.
