Blog Archives

Gender and verbs across 100,000 stories: a tidy analysis

April 27, 2017

Previously in this series: Examining the arc of 100,000 stories. I was fascinated by my colleague Julia Silge’s recent blog post on which verbs tend to follow “he” or “she” in several novels, and what they might imply about gender roles in fiction. This made me wonder what trends could be found across a larger dataset...

Examining the arc of 100,000 stories: a tidy analysis

April 26, 2017

I recently came across a great natural language dataset from Mark Riedel: 112,000 plots of stories downloaded from English-language Wikipedia. This includes books, movies, TV episodes, and video games: anything that has a Plot section on its Wikipedia page. ...

Announcing the release of my e-book: Introduction to Empirical Bayes

February 7, 2017

I’m excited to announce the release of my new e-book: Introduction to Empirical Bayes: Examples from Baseball Statistics, available here. This book is adapted from a series of ten posts on my blog, starting with Understanding the beta distribution a...

Simulation of empirical Bayesian methods (using baseball statistics)

January 11, 2017

Previously in this series: The beta distribution, Empirical Bayes estimation, Credible intervals, The Bayesian approach to false discovery rates, Bayesian A/B testing, Beta-binomial regression, Understanding empirical Bayesian hierarchical modeling, Mixture models and expectation-maximization, The ebbr package. We’re approaching the end of this series on...

Introducing the ebbr package for empirical Bayes estimation (using baseball statistics)

January 5, 2017

Previously in this series: The beta distribution, Empirical Bayes estimation, Credible intervals, The Bayesian approach to false discovery rates, Bayesian A/B testing, Beta-binomial regression, Understanding empirical Bayesian hierarchical modeling, Mixture models and expectation-maximization. We’ve introduced a number of statistical techniques in this series: estimating a beta...

Understanding mixture models and expectation-maximization (using baseball statistics)

January 2, 2017

Previously in this series: Understanding the beta distribution, Understanding empirical Bayes estimation, Understanding credible intervals, Understanding the Bayesian approach to false discovery rates, Understanding Bayesian A/B testing, Understanding beta binomial regression, Understanding empirical Bayesian hierarchical modeling. In this series on empirical Bayesian methods applied to baseball data, we’ve been...

Analysis of software developers in New York, San Francisco, London and Bangalore

December 1, 2016

(Note: Cross-posted with the Stack Overflow Blog.) When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of where an...

The ‘deadly board game’ puzzle: efficient simulation in R

October 19, 2016

Last Friday’s “The Riddler” column on FiveThirtyEight presents an interesting probabilistic puzzle: While traveling in the Kingdom of Arbitraria, you are accused of a heinous crime. Arbitraria decides who’s guilty or innocent not through a court system, but a board game. It’s played on a simple board: a track with sequential spaces numbered from 0 to 1,000....

Understanding empirical Bayesian hierarchical modeling (using baseball statistics)

October 11, 2016

Previously in this series: Understanding the beta distribution, Understanding empirical Bayes estimation, Understanding credible intervals, Understanding the Bayesian approach to false discovery rates, Understanding Bayesian A/B testing, Understanding beta binomial regression. Suppose you were a scout hiring a new baseball player, and were choosing between two that have had 100...

Tidying computational biology models with biobroom: a case study in tidy analysis

September 6, 2016

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis; Modeling gene expression with broom: a case study in tidy analysis. In previous posts, I’ve examined the benefits of the tidy data framework for cleaning, visualizing, and modeling data in an exploratory analysis of a molecular biology experiment. We’re using Brauer et al...
