Stack Overflow, the popular Q&A site for programmers, provides useful information to nearly 5 million programmers worldwide through its database of questions and answers, not to mention the additional comments that other programmers provide. (You might be interested in the architecture, based on SQL Server 2016, required to deliver the 8.5 billion pages Stack Overflow served last year.) Since its inception, Stack Overflow has had a policy of sharing all of this content under a Creative Commons license. This represents a rich trove of unstructured data for analysis, especially given that the database of 13 million questions, 21 million answers and 54 million comments (and growing) is easily accessible via the Stack Exchange Data Explorer, Kaggle and Google BigQuery.
Various data scientists have investigated this database, and learned some interesting things about programmers in the process. Here are a few examples, with links to the complete reports.
Sara Robinson analyzed the sentiment of Stack Overflow comments (based on phrases like “thank you” or “stupid”) and found that R users seem to be the happiest, while Objective-C users are the angriest.
|Tags with happiest comments||Tags with angriest comments|
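To make the phrase-based idea concrete, here is a minimal Python sketch of how comments could be scored per tag by checking for marker phrases. This is not the method used in the original analysis: the phrase lists, the `phrase_rates` helper and the sample comments are all invented for illustration, and a real analysis would run over the full comment data rather than a handful of strings.

```python
from collections import defaultdict

# Hypothetical marker phrases; the original analysis's lists are not reproduced here.
POSITIVE_PHRASES = ("thank you", "thanks", "awesome")
NEGATIVE_PHRASES = ("stupid", "terrible", "hate")


def phrase_rates(comments):
    """Return {tag: (positive_share, negative_share)} for (tag, text) pairs.

    A comment counts as positive (or negative) if it contains at least one
    phrase from the corresponding list, compared case-insensitively.
    """
    totals = defaultdict(int)
    positive = defaultdict(int)
    negative = defaultdict(int)
    for tag, text in comments:
        lowered = text.lower()
        totals[tag] += 1
        if any(phrase in lowered for phrase in POSITIVE_PHRASES):
            positive[tag] += 1
        if any(phrase in lowered for phrase in NEGATIVE_PHRASES):
            negative[tag] += 1
    return {
        tag: (positive[tag] / totals[tag], negative[tag] / totals[tag])
        for tag in totals
    }


# Invented sample comments, only to show the shape of the input.
sample = [
    ("r", "Thank you, that worked!"),
    ("r", "Have you tried dplyr?"),
    ("objective-c", "This API is stupid."),
    ("objective-c", "Thanks for the pointer."),
]
rates = phrase_rates(sample)
for tag, (pos_share, neg_share) in sorted(rates.items()):
    print(f"{tag}: positive={pos_share:.2f} negative={neg_share:.2f}")
```

Simple substring matching like this is crude (it ignores negation and sarcasm, for instance), so the resulting shares are best read as a rough signal for comparing tags rather than a precise measure of mood.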
David Robinson analyzed developer job titles over time and found terms that were on the rise (for example “full stack”) and terms on the decline (like “webmaster”).
David Robinson also analyzed regional differences between programmers and compared the most popular tags used in San Francisco, London, Bangalore and New York. (R is the third most popular language in New York by this measure.)
Max Woolf analyzed the results of the Stack Overflow Developer Survey and charted the relationship between self-described skill level and salary.
Seen any other interesting analyses of the Stack Overflow data, or done one yourself? Let us know in the comments.