R developers are the happiest! (based on analyzing Stack Overflow)

January 8, 2017
This post was written by Sara Robinson and originally published on Medium; it is republished here because it is particularly relevant to the R community. The post analyzes all Stack Overflow comments using Google BigQuery to determine which developers are happiest and angriest. If you have comments or ideas for future analysis, find her on Twitter @SRobTweets.

It’s officially winter, so what could be better than drinking hot chocolate while querying the new Stack Overflow dataset in BigQuery? It has every Stack Overflow question, answer, comment, and more, which means endless possibilities for data crunching. Inspired by Felipe Hoffa’s post on how response time varies by tag, I wanted to look at the comments table (53 million rows!).

The happiest Stack Overflow tags 🙂

To measure happy comments I looked at comments with “thank you”, “thanks”, “awesome” or “:)” in the body. I limited the analysis to tags with more than 500,000 comments. Here’s the query:

#standardSQL
-- Share of "happy" comments per tag, for tags with more than 500,000 comments.
SELECT
  tag,
  ROUND(
    (COUNT(
      CASE
        WHEN comment_text LIKE '%thanks%'
          OR comment_text LIKE '%:)%'
          OR comment_text LIKE '%thank you%'
          OR comment_text LIKE '%awesome%'
        THEN 1
      END
    ) / COUNT(*)) * 100, 2) AS percent_happy,
  COUNT(*) AS total_comments
FROM (
  -- Comments left directly on questions.
  SELECT
    LOWER(a.text) AS comment_text,
    SPLIT(b.tags, '|') AS tags
  FROM `bigquery-public-data.stackoverflow.comments` a
  JOIN `bigquery-public-data.stackoverflow.posts_questions` b
    ON a.post_id = b.id
  UNION ALL
  -- Comments left on answers, tagged via the parent question.
  SELECT
    LOWER(b.text) AS comment_text,
    SPLIT(c.tags, '|') AS tags
  FROM `bigquery-public-data.stackoverflow.posts_answers` a
  JOIN (
    SELECT post_id, text FROM `bigquery-public-data.stackoverflow.comments`
  ) b
    ON a.id = b.post_id
  JOIN `bigquery-public-data.stackoverflow.posts_questions` c
    ON c.id = a.parent_id
), UNNEST(tags) tag  -- one row per (comment, tag) pair
GROUP BY 1
HAVING total_comments > 500000
ORDER BY percent_happy DESC
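
If you want to sanity-check the matching logic without running anything in BigQuery, here is a rough Python sketch of what the query computes. It is not the code behind the results in this post: the function name `percent_happy_by_tag`, the `min_comments` parameter, and the sample rows are all made up for illustration. Only the four search terms come from the query above.

```python
from collections import defaultdict

# Search terms taken from the query above; everything else is illustrative.
HAPPY_TERMS = ("thanks", ":)", "thank you", "awesome")


def percent_happy_by_tag(rows, terms=HAPPY_TERMS, min_comments=0):
    """Compute the share of matching comments per tag.

    rows: iterable of (comment_text, pipe_separated_tags) pairs.
    Returns {tag: (percent_matching, total_comments)} for tags with
    more than min_comments comments (the HAVING clause in the query).
    """
    totals = defaultdict(int)
    matches = defaultdict(int)
    for text, tags in rows:
        lowered = text.lower()  # mirrors LOWER(text)
        # Plain substring test, like LIKE '%term%'.
        is_match = any(term in lowered for term in terms)
        for tag in tags.split("|"):  # mirrors SPLIT(...) plus UNNEST(tags)
            totals[tag] += 1
            matches[tag] += int(is_match)
    return {
        tag: (round(matches[tag] / totals[tag] * 100, 2), totals[tag])
        for tag in totals
        if totals[tag] > min_comments
    }


# Made-up sample rows, not real Stack Overflow data.
sample = [
    ("Thanks, that fixed it!", "r|ggplot2"),
    ("This does not compile.", "c++"),
    ("Awesome answer :)", "r"),
    ("Did you try restarting?", "r|c++"),
]

result = percent_happy_by_tag(sample)
```

Note that a comment on a question with several tags is counted once per tag, just as it is after the UNNEST in the query.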

Here’s the result in BigQuery:

And the chart:

[Chart: happiest comments on Stack Overflow]

R, Ruby, HTML / CSS, and iOS are the communities with the happiest commenters according to this list. People who ask questions about XML and regular expressions also seem particularly thankful for help. If you’re curious, here are the 15 highest-scoring happy comments that were short enough to fit in a screenshot (and their associated tags):

But because people sometimes get angry on the internet, you’re probably wondering…

The angriest Stack Overflow tags 🙁

For angry comments, I counted those with “wrong”, “horrible”, “stupid”, or “:(” in the body. The SQL is the same as above with the search terms swapped out. Here’s the result:

And the chart:

[Chart: angriest comments on Stack Overflow]

Clearly the angriest comments are those related to C derivatives. Many programming concepts also wound up here: multithreading, arrays, algorithms, and strings. And here are the highest-scoring angry comments:

This analysis is not perfect, as the comment “that one’s so stupid it underflows and becomes awesome” appears in both lists. That’s where a machine learning tool like the Natural Language API would come in handy.
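
To see why keyword matching puts that comment in both lists, here is a small Python sketch of the two term checks. The search terms are the ones used in this post; the `labels` helper and the second example comment are hypothetical, added only to illustrate the limitation.

```python
# Search terms from the post; the helper below is only an illustration.
HAPPY_TERMS = ("thanks", ":)", "thank you", "awesome")
ANGRY_TERMS = ("wrong", "horrible", "stupid", ":(")


def labels(comment):
    """Return the set of labels whose terms appear anywhere in the comment."""
    lowered = comment.lower()
    found = set()
    if any(term in lowered for term in HAPPY_TERMS):
        found.add("happy")
    if any(term in lowered for term in ANGRY_TERMS):
        found.add("angry")
    return found


# The comment quoted above contains both "stupid" and "awesome".
both = labels("that one's so stupid it underflows and becomes awesome")

# A made-up comment that is not angry at all but still contains "wrong".
false_positive = labels("Nothing wrong with that approach.")
```

Substring matching has no notion of context or negation, so the first comment gets both labels and the second is counted as angry.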

Between the two lists there were only a few tag overlaps. The most excitable tags (I’m interpreting tags that showed up in both the happy and angry lists as ‘excitable’) are ios, iphone, objective-c, and regex. And while the internet may seem like a dark place sometimes, there appear to be roughly six happy comments for every angry one.

What’s next?

Dive into the Stack Overflow dataset, or check out some of these awesome posts to get inspired:

If you have comments or ideas for future analysis, find me on Twitter @SRobTweets.


