“F-bombs” in GitHub Commits (warning: contains profanity)

[This article was first published on librestats » R, and kindly contributed to R-bloggers].

Warning: this post contains profanity…arguably excessive amounts of it.  If you are a humorless no-fun, you are recommended to proceed no further.

Seriously though, the title is quite descriptive of the content of this post.  If you are offended by the use of such language, or if your boss is likely to come peering over your shoulder soon, I don’t recommend you proceed.


“F-Bombs” in Public GitHub Commits

So over on Reddit, I found some posts by Max Woolf where he was able to get all public GitHub commits containing the word “fuck” (also some extra data with “shit”) from 3/11/2012 to 7/24/2014.  He posted the raw commit data, and was even kind enough to explain how he got the data in the first place.  He also provided a hilarious plot, showing the Languages with Most F-Bombs and S-bombs in Commit Messages.

I so fell in love with this data that I decided to put the data into an R package for even easier access.  Just poking at it, I was quickly able to answer some simple questions I had, including…


How many fucks do developers give?



What’s the most common fuck to give?



Who gives the most fucks?

If we look at the users whose commits contain the most instances of “fuck”, there is certainly a clear victor:


Let’s group users by their repos (e.g., hadley/devtools — and no, he’s not on the list).  Maybe this way we’ll see a different pattern…

Nope, same guy:


I’m tempted to look at that repo, but I’m afraid I’ll instantly lose my sanity, like some kind of rejected H. P. Lovecraft story.


Emergent fucks

Not enough variety in the dataset for you?  Try using the ngram package, which uses Markov chains, to generate some new fucks and other assorted nonsense.
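
If you're curious what a Markov chain babbler does under the hood, here is a minimal sketch of the idea in Python.  To be clear, this is not the ngram package's code or API; the function names `build_chain` and `babble` and the toy commit messages are made up for illustration.  The chain maps every run of n consecutive words to the words seen right after it, and generation is just a random walk over that map.

```python
import random
from collections import defaultdict


def build_chain(messages, n=2):
    """Map each n-word state to the list of words observed right after it."""
    chain = defaultdict(list)
    for message in messages:
        words = message.split()
        # Slide a window of n words over the message and record the follower.
        for i in range(len(words) - n):
            state = tuple(words[i:i + n])
            chain[state].append(words[i + n])
    return dict(chain)


def babble(chain, length=20, seed=None):
    """Random-walk the chain, stopping at `length` words or at a dead end."""
    rng = random.Random(seed)
    state = rng.choice(sorted(chain))  # sorted so a fixed seed is repeatable
    output = list(state)
    while len(output) < length:
        followers = chain.get(state)
        if not followers:
            break  # no word was ever observed after this state
        next_word = rng.choice(followers)
        output.append(next_word)
        state = state[1:] + (next_word,)
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical placeholder messages, not rows from the real dataset.
    toy_messages = [
        "fix the damn typo in the parser",
        "fix the damn build in the morning",
        "handle version mismatch in the parser better",
    ]
    print(babble(build_chain(toy_messages, n=2), length=12))
```

Duplicates are kept in the follower lists on purpose: a word that follows a state more often gets picked more often, which is all the "probability" a simple chain like this needs.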

Typical systems programmer:

This simplifies MT based class systems, and drastically improves performance on luajit as it will get, I should say), and I have over 300 confirmed kills.

Another typical systems programmer:

fuckfuckfuckfuck STATIC MOTHER FUCKER STATIC MOTHER FUCKER STATIC MOTHER FUCKER STATIC MOTHER FUCKER fucking shitfucks fucking shitfucks fucking shitfucks fucking typos fucking typos fucking typos fucking typos fuckckck fuckckck fuckckck fuckckck typo fuck typo fuck typo fuck typo fuck handle version fuckups better handle version fuckups better damn fuck damn fuck damn fuck damn fuckfuckfuckfuckfuck fucking coverage…
