What can we learn from StackOverflow data?

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

StackOverflow, the popular Q&A site for programmers, provides useful information to nearly 5 million programmers worldwide with its database of questions and answers, not to mention the additional comments that other programmers provide. (You might be interested in the architecture, based on SQL Server 2016, required to deliver the 8.5 billion pages Stack Overflow served last year.) Since its inception, StackOverflow has had a policy of sharing all of this content under a Creative Commons license. This represents a rich trove of unstructured data for analysis, especially given that the database of 13 million questions, 21 million answers and 54 million comments (and growing) is easily accessible via StackExchange Data Explorer, Kaggle and Google BigQuery.

Various data scientists have investigated this database, and learned some interesting things about programmers in the process. Here are a few examples, with links to the complete reports.

Sara Robinson analyzed the sentiment of Stack Overflow comments (based on phrases like “thank you” or “stupid”) and found that R users seem to be the happiest, while Objective-C users are the angriest.

[Figures: tags with the happiest comments; tags with the angriest comments]
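
To make the idea of phrase-based sentiment concrete, here is a minimal Python sketch of how one might score comments per tag. It is not Sara Robinson's actual method: the phrase lists, the `phrase_rates` function and the sample comments are all assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical phrase lists; the original analysis used its own sets.
HAPPY_PHRASES = ("thank you", "thanks", "awesome")
ANGRY_PHRASES = ("stupid", "hate", "wtf")


def phrase_rates(comments):
    """Return {tag: (happy_rate, angry_rate)} for (tag, text) pairs.

    A comment counts as happy (or angry) if it contains at least one
    phrase from the corresponding list, matched case-insensitively.
    """
    totals = defaultdict(int)
    happy = defaultdict(int)
    angry = defaultdict(int)
    for tag, text in comments:
        lowered = text.lower()
        totals[tag] += 1
        if any(phrase in lowered for phrase in HAPPY_PHRASES):
            happy[tag] += 1
        if any(phrase in lowered for phrase in ANGRY_PHRASES):
            angry[tag] += 1
    return {tag: (happy[tag] / totals[tag], angry[tag] / totals[tag])
            for tag in totals}


# Made-up sample comments, purely for illustration.
sample = [
    ("r", "Thank you, that worked!"),
    ("r", "Have you tried dplyr?"),
    ("objective-c", "This API is stupid."),
    ("objective-c", "Thanks, but I hate this workaround."),
]
rates = phrase_rates(sample)
```

Plain substring matching is crude (for instance, "hate" also matches inside "whatever"), so a real analysis would probably tokenize the text or match on word boundaries.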

David Robinson analyzed developer job titles over time and found terms that were on the rise (for example “full stack”) and terms on the decline (like “webmaster”).

[Figure: job titles]
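
A trend like this can be measured as the share of titles containing a term in each year. The Python sketch below shows one way to do that; the `term_share_by_year` function and the sample titles are hypothetical and are not taken from David Robinson's analysis.

```python
from collections import Counter


def term_share_by_year(titles, term):
    """Return {year: fraction of that year's titles containing `term`}.

    `titles` is an iterable of (year, title) pairs; matching is a
    case-insensitive substring test.
    """
    totals = Counter()
    hits = Counter()
    for year, title in titles:
        totals[year] += 1
        if term in title.lower():
            hits[year] += 1
    return {year: hits[year] / totals[year] for year in sorted(totals)}


# Made-up job titles, purely for illustration.
sample_titles = [
    (2010, "Webmaster"),
    (2010, "Web Developer"),
    (2016, "Full Stack Developer"),
    (2016, "Full Stack Engineer"),
    (2016, "Web Developer"),
    (2016, "Webmaster"),
]
webmaster_share = term_share_by_year(sample_titles, "webmaster")
full_stack_share = term_share_by_year(sample_titles, "full stack")
```

Comparing shares rather than raw counts keeps growth in the overall number of profiles from looking like growth in any single title.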

David Robinson also analyzed regional differences between programmers and compared the most popular tags used in San Francisco, London, Bangalore and New York. (R is the third most popular language in New York by this measure.)

[Figure: most popular tags by region]

Max Woolf analyzed the results of the StackOverflow Developer Survey and examined the relationship between self-described skill level and salary.


Seen any other interesting analyses of the StackOverflow data, or done one yourself? Let us know in the comments.

