GitHub Stats on Programming Languages

August 9, 2010

(This article was first published on R-Chart, and kindly contributed to R-bloggers)

GitHub has become a popular site for open source developers to stash code and collaborate on projects.  The following are some stats and analysis of the programming languages in use, based on the number of users and repositories.  The data were obtained from GitHub's searches.  Both the data and the R code are available on GitHub as well (a lovely recursive relationship, I must say).



> df.top_ten_reps


      Language Repositories Users
1         Ruby       104239 23123
2   JavaScript        44482 10895
3         Perl        34232  2178
4       Python        32150  8775
5          PHP        21685  8872
6         Java        17687  6618
7            C        16137  5558
8          C++        12521  5595
9  Objective-C         8027  2520
10          C#         6061  2706

Ruby has a commanding lead in the number of repositories with 32.17%, more than the next two (JavaScript and Perl) combined.  R is ranked 25th with 191 repositories, or about 0.06%, only 6 repositories behind the D programming language.  The top five are scripting languages, Java ranks sixth, and the C family rounds out the top ten.  Relatively open languages lead the pack, followed by those with a proprietary focus (Objective-C for Apple and C# for Microsoft).
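The percentage shares quoted above can be reproduced with a few lines of R. This is a minimal sketch, not the original script: the variable names are illustrative, and the total of 324060 is the sum of the Repositories column in the full data set at the end of this post (an added calculation, not a figure quoted in the post).

```r
# Repository counts for the top three languages, copied from the table above.
repos <- c(Ruby = 104239, JavaScript = 44482, Perl = 34232)

# Assumption: total repositories = sum of the Repositories column
# in the full data set listed at the end of the post.
total_repos <- 324060

# Share of all repositories, as a percentage rounded to two decimals.
pct <- round(100 * repos / total_repos, 2)
print(pct)

# Ruby's share compared with JavaScript and Perl combined.
ruby_leads <- pct[["Ruby"]] > pct[["JavaScript"]] + pct[["Perl"]]
print(ruby_leads)
```

The rounded shares match the Rep.pct column of the full table for these three languages.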

When ranked by number of users, the top two remain the same.  There is a bit of shuffling with the remainder of the top ten.




> df.top_ten_users
      Language Repositories Users
1         Ruby       104239 23123
2   JavaScript        44482 10895
3          PHP        21685  8872
4       Python        32150  8775
5         Java        17687  6618
6          C++        12521  5595
7            C        16137  5558
8           C#         6061  2706
9  Objective-C         8027  2520
10        Perl        34232  2178


The most striking change is that Perl drops to 10th place.  There are significantly fewer users associated with Perl, particularly given its number of projects.  I noticed that the Perl language source code had migrated to GitHub, so perhaps modules were migrated as well, but I couldn't find any specific announcements that clarified this.
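The re-ranking by users can be sketched as follows. The data frame name `top_ten` is an assumption for illustration (the original script's object names are only partly visible in the console output); the values are copied from the tables above.

```r
# Top ten languages by repositories, copied from the first table.
top_ten <- data.frame(
  Language     = c("Ruby", "JavaScript", "Perl", "Python", "PHP",
                   "Java", "C", "C++", "Objective-C", "C#"),
  Repositories = c(104239, 44482, 34232, 32150, 21685,
                   17687, 16137, 12521, 8027, 6061),
  Users        = c(23123, 10895, 2178, 8775, 8872,
                   6618, 5558, 5595, 2520, 2706),
  stringsAsFactors = FALSE
)

# Sort by number of users, largest first, and renumber the rows.
by_users <- top_ten[order(-top_ten$Users), ]
rownames(by_users) <- NULL
print(by_users)

# Position of Perl in the user ranking.
perl_rank <- which(by_users$Language == "Perl")
print(perl_rank)
```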

All other things being equal, you might expect a relationship between the number of users and the number of repositories. To some degree, there is:

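> # Repositories per user for each language
> df$Ratio <- df$Repositories / df$Users
> # Average the ratio over languages that have at least one user
> mean(df$Ratio[df$Ratio > 0 & df$Users > 0 & !is.na(df$Ratio)])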


5.064732

A linear model of repositories against users suggests a slightly lower value (between 4 and 5 repositories per user).
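The exact model specification is not given in the post. As a rough sketch, assuming an ordinary least-squares fit of repositories on users, it might look like the following. Only the top ten rows are used here to keep the example short, so the fitted slope need not match the value obtained from the full data set.

```r
# Top ten languages, copied from the tables above (illustrative subset).
top_ten <- data.frame(
  Language     = c("Ruby", "JavaScript", "Perl", "Python", "PHP",
                   "Java", "C", "C++", "Objective-C", "C#"),
  Repositories = c(104239, 44482, 34232, 32150, 21685,
                   17687, 16137, 12521, 8027, 6061),
  Users        = c(23123, 10895, 2178, 8775, 8872,
                   6618, 5558, 5595, 2520, 2706),
  stringsAsFactors = FALSE
)

# Assumed specification: simple linear regression with an intercept.
fit <- lm(Repositories ~ Users, data = top_ten)

# The coefficient on Users estimates additional repositories per extra user.
print(coef(fit))
```

Fitting on the full data frame instead of `top_ten` would be the closer analogue of the analysis described above.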



Here is a plot restricted to the Top 10.



Some of the less-used languages have few users but many repositories per user, such as Io (19 per user) and CoffeeScript (17 per user).  Perl has a remarkable 15.7 per user.

Full Data Set
        Language Repositories Users Rep.pct     Ratio
1           Ruby       104239 23123   32.17  4.508022
2     JavaScript        44482 10895   13.73  4.082790
3           Perl        34232  2178   10.56 15.717172
4         Python        32150  8775    9.92  3.663818
5            PHP        21685  8872    6.69  2.444206
6           Java        17687  6618    5.46  2.672560
7              C        16137  5558    4.98  2.903383
8            C++        12521  5595    3.86  2.237891
9    Objective-C         8027  2520    2.48  3.185317
10            C#         6061  2706    1.87  2.239837
11         Shell         4657  1011    1.44  4.606330
12          VimL         4248  1267    1.31  3.352802
13  ActionScript         2609  1104    0.81  2.363225
14        Erlang         2520   532    0.78  4.736842
15       Haskell         2290   641    0.71  3.572543
16         Scala         2154   539    0.66  3.996289
17       Clojure         2082   481    0.64  4.328482
18           Lua         1754   511    0.54  3.432485
19        Groovy          870   261    0.27  3.333333
20        Scheme          707   140    0.22  5.050000
21            Go          398   103    0.12  3.864078
22         OCaml          382   121    0.12  3.157025
23   Objective-J          355   109    0.11  3.256881
24             D          197    64    0.06  3.078125
25             R          191    69    0.06  2.768116
26    ColdFusion          180    56    0.06  3.214286
27           Tcl          125    39    0.04  3.205128
28           ooc          112    11    0.03 10.181818
29       FORTRAN           93    47    0.03  1.978723
30           ASP           88    35    0.03  2.514286
31     Smalltalk           80    14    0.02  5.714286
32          HaXe           75    14    0.02  5.357143
33            F#           74     5    0.02 14.800000
34       Verilog           74    26    0.02  2.846154
35          VHDL           64    14    0.02  4.571429
36            Io           57     3    0.02 19.000000
37 SuperCollider           53    11    0.02  4.818182
38           Arc           48    15    0.01  3.200000
39        Delphi           43    16    0.01  2.687500
40      Assembly           41     5    0.01  8.200000
41           Boo           41     6    0.01  6.833333
42            Nu           40     4    0.01 10.000000
43        Eiffel           39    15    0.01  2.600000
44  CoffeeScript           34     2    0.01 17.000000
45          Vala           27     3    0.01  9.000000
46        Racket           20     8    0.01  2.500000
47          Self            7     3    0.00  2.333333
48          Duby            4     0    0.00       Inf
49       Max/MSP            4     2    0.00  2.000000
50        sclang            2     0    0.00       Inf
51   Common Lisp            0     0    0.00       NaN
52    Emacs Lisp            0     0    0.00       NaN
53     Pure Data            0     0    0.00       NaN
54  Visual Basic            0     0    0.00       NaN
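The Inf and NaN entries in the Ratio column come from dividing by zero users: a non-zero count divided by 0 gives Inf in R, and 0 divided by 0 gives NaN. The sketch below uses a handful of rows copied from the table to show this, and filters with `is.finite()`, which is a substitute for the filter used earlier in the post rather than the original code. The data frame name `tail_rows` is illustrative.

```r
# A few rows from the end of the full table, including zero-user rows.
tail_rows <- data.frame(
  Language     = c("Self", "Duby", "Max/MSP", "sclang", "Common Lisp"),
  Repositories = c(7, 4, 4, 2, 0),
  Users        = c(3, 0, 2, 0, 0),
  stringsAsFactors = FALSE
)

# Repositories per user; division by zero yields Inf or NaN, not an error.
tail_rows$Ratio <- tail_rows$Repositories / tail_rows$Users
print(tail_rows)

# Keep only rows with a finite ratio before averaging.
usable <- tail_rows[is.finite(tail_rows$Ratio), ]
print(mean(usable$Ratio))
```

Only Self and Max/MSP survive the filter in this subset, so the mean is taken over those two rows.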

