“F-bombs” in GitHub Commits (warning: contains profanity)

July 30, 2014

(This article was first published on librestats » R, and kindly contributed to R-bloggers)

Warning: this post contains profanity...arguably excessive amounts of it.  If you are a humorless no-fun, you are recommended to proceed no further.

Seriously though, the title is quite descriptive of the content of this post.  If you are offended by the use of such language, or if your boss is likely to come peering over your shoulder soon, I don't recommend you proceed.

 

"F-Bombs" in Public GitHub Commits

So over on Reddit, I found some posts by Max Woolf where he was able to get all public GitHub commits containing the word "fuck" (also some extra data with "shit") from 3/11/2012 to 7/24/2014.  He posted the raw commit data, and was even kind enough to explain how he got the data in the first place.  He also provided a hilarious plot, showing the Languages with Most F-Bombs and S-bombs in Commit Messages.

I so fell in love with this data that I decided to put the data into an R package for even easier access.  Just poking at it, I was quickly able to answer some simple questions I had, including...

 

How many fucks do developers give?

[Image: numf]

 

What's the most common fuck to give?

[Image: commits]

 

Who gives the most fucks?

If we look at the users whose commits contain the most instances of "fuck", there is certainly a clear victor:

[Image: who_user]

Let's group users by their repos (e.g., hadley/devtools --- and no, he's not on the list).  Maybe this way we'll see a different pattern...

Nope, same guy:

[Image: who_repo]

I'm tempted to look at that repo, but I'm afraid I'll instantly lose my sanity, like some kind of rejected H. P. Lovecraft story.

 

Emergent fucks

Not enough variety in the dataset for you?  Try using the ngram package to generate some new fucks, and other assorted nonsense, with Markov chains.
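
If you're curious what "Markov chain text generation" means in practice, here is a minimal sketch of the general idea in plain Python. To be clear, this is not the ngram package's code or its API, and the names build_chain and generate, along with the toy corpus, are made up for illustration. The assumption is the usual word-level setup: record which word follows each run of n words, then take a random walk through those records.

```python
import random
from collections import defaultdict


def build_chain(text, n=2):
    """Map each run of n consecutive words to the words seen right after it."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - n):
        state = tuple(words[i:i + n])
        chain[state].append(words[i + n])
    return dict(chain)


def generate(chain, length=20, seed=None):
    """Random-walk the chain; restart from a random state at dead ends."""
    rng = random.Random(seed)
    states = sorted(chain)  # sorted so a fixed seed gives a repeatable walk
    state = rng.choice(states)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if not followers:
            # Dead end: this state was never followed by anything, so jump.
            state = rng.choice(states)
            out.extend(state)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        state = state[1:] + (nxt,)
    return " ".join(out[:length])


# Toy stand-in corpus (hypothetical, not taken from the commit data).
corpus = "fix the damn typo fix the damn build fix the other typo again"
chain = build_chain(corpus, n=2)
print(generate(chain, length=12, seed=42))
```

Feed it a pile of real commit messages instead of the toy corpus and the output starts to look plausibly (and disturbingly) like the samples that follow. A larger n gives more coherent but less surprising text.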

Typical systems programmer:

This simplifies MT based class systems, and drastically improves performance on luajit as it will get, I should say), and I have over 300 confirmed kills.

Another typical systems programmer:

fuckfuckfuckfuck STATIC MOTHER FUCKER STATIC MOTHER FUCKER STATIC MOTHER FUCKER STATIC MOTHER FUCKER fucking shitfucks fucking shitfucks fucking shitfucks fucking typos fucking typos fucking typos fucking typos fuckckck fuckckck fuckckck fuckckck typo fuck typo fuck typo fuck typo fuck handle version fuckups better handle version fuckups better damn fuck damn fuck damn fuck damn fuckfuckfuckfuckfuck fucking coverage...
