Blog Archives

Words growing or shrinking in Hacker News titles: a tidy analysis

June 8, 2017

In May, some friends and I built Tagger News, a real-time automatic classifier of Hacker News articles based on their text (see here for more about how we built it). This process started me down some interesting paths, particularly analyzing trends in ...

Slides, videos, and tweets from the 2017 New York R Conference

May 22, 2017

In April I attended the 2017 New York R conference, hosted by Lander Analytics and Work-Bench. It was both the third time the conference was held and the third time I’ve attended, and it gets more fun each year, especially because this year eight of us attended from Stack Overflow (including all five of us on the Data Team). Now...

Gender and verbs across 100,000 stories: a tidy analysis

April 27, 2017

Previously in this series: Examining the arc of 100,000 stories. I was fascinated by my colleague Julia Silge’s recent blog post on what verbs tend to occur after “he” or “she” in several novels, and what they might imply about gender roles within fictional work. This made me wonder what trends could be found across a larger dataset of stories. Mark Riedl’s...

Examining the arc of 100,000 stories: a tidy analysis

April 26, 2017

I recently came across a great natural language dataset from Mark Riedl: 112,000 plots of stories downloaded from English-language Wikipedia. This includes books, movies, TV episodes, video games: anything that has a Plot section on a Wikipedia page. ...

Announcing the release of my e-book: Introduction to Empirical Bayes

February 7, 2017

I’m excited to announce the release of my new e-book: Introduction to Empirical Bayes: Examples from Baseball Statistics, available here. This book is adapted from a series of ten posts on my blog, starting with Understanding the beta distribution a...

Simulation of empirical Bayesian methods (using baseball statistics)

January 11, 2017

Previously in this series: The beta distribution, Empirical Bayes estimation, Credible intervals, The Bayesian approach to false discovery rates, Bayesian A/B testing, Beta-binomial regression, Understanding empirical Bayesian hierarchical modeling, Mixture models and expectation-maximization, The ebbr package. We’re approaching the end of this series on empirical Bayesian methods, and have touched on many statistical...

Introducing the ebbr package for empirical Bayes estimation (using baseball statistics)

January 5, 2017

Previously in this series: The beta distribution, Empirical Bayes estimation, Credible intervals, The Bayesian approach to false discovery rates, Bayesian A/B testing, Beta-binomial regression, Understanding empirical Bayesian hierarchical modeling, Mixture models and expectation-maximization. We’ve introduced a number of statistical techniques in this series: estimating a beta prior, beta-binomial regression, hypothesis testing, mixture models, and...

Understanding mixture models and expectation-maximization (using baseball statistics)

January 2, 2017

Previously in this series: Understanding the beta distribution, Understanding empirical Bayes estimation, Understanding credible intervals, Understanding the Bayesian approach to false discovery rates, Understanding Bayesian A/B testing, Understanding beta binomial regression, Understanding empirical Bayesian hierarchical modeling. In this series on empirical Bayesian methods on baseball data, we’ve been treating our overall distribution of batting averages...

Analysis of software developers in New York, San Francisco, London and Bangalore

December 1, 2016

(Note: Cross-posted with the Stack Overflow Blog.) When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of where an American...

The ‘deadly board game’ puzzle: efficient simulation in R

October 19, 2016

Last Friday’s “The Riddler” column on FiveThirtyEight presents an interesting probabilistic puzzle: While traveling in the Kingdom of Arbitraria, you are accused of a heinous crime. Arbitraria decides who’s guilty or innocent not through a court system, but a board game. It’s played on a simple board: a track with sequential spaces numbered from 0 to 1,000. The...

