Articles by David Robinson

One year as a Data Scientist at Stack Overflow

June 20, 2016 | David Robinson

One day in January 2013 I found myself wasting time on the internet. This wasn’t a good idea: I was as busy as anyone 2.5 years into their PhD. I had to finish a presentation on some yeast genetics research, I was months behind on a paper with an NYU collaborator ...

[Read more...]

Understanding beta binomial regression (using baseball statistics)

May 31, 2016 | David Robinson

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing.

In this series we’ve been using the empirical Bayes method to estimate batting averages of baseball players. Empirical Bayes is useful ... [Read more...]

Understanding Bayesian A/B testing (using baseball statistics)

May 23, 2016 | David Robinson

Previously in this series: Understanding the beta distribution (using baseball statistics); Understanding empirical Bayes estimation (using baseball statistics); Understanding credible intervals (using baseball statistics); Understanding the Bayesian approach to false discovery rates (using baseball statistics).

Who is a better batter: Mike Piazza or Hank Aaron? Well, Mike Piazza has a ... [Read more...]

The adblockr package: block ads from the monetizr package

April 1, 2016 | David Robinson

I was horrified to learn of the existence of the monetizr package, which adds advertisements to R functions. The package goes against the entire philosophy of open source and the spirit of the R community. Luckily, I was able to construct a fix- the a... [Read more...]

The monetizr package: make money on your open source R packages

March 31, 2016 | David Robinson

I’ve had the great privilege to be a small part of the R open source community, contributing packages like broom, gganimate, fuzzyjoin, and ggfreehand. In the process I’ve become friends and colleagues with brilliant statisticians and data scientists and learned to engage with data in powerful ways. But ... [Read more...]

How to replace a pie chart

March 14, 2016 | David Robinson

Yesterday a family member forwarded me a Wall Street Journal interview titled What Data Scientists Do All Day At Work. The title intrigued me immediately, partly because I find myself explaining that same topic somewhat regularly. I wasn’t disappointed in the interview: General Electric’s Dr. Narasimhan gave insightful ... [Read more...]

Why I use ggplot2

February 12, 2016 | David Robinson

If you’ve read my blog, taken one of my classes, or sat next to me on an airplane, you probably know I’m a big fan of Hadley Wickham’s ggplot2 package, especially compared to base R plotting. Not everyone agrees. Among the anti-ggplot2 crowd is JHU Professor Jeff ... [Read more...]

Analyzing networks of characters in ‘Love Actually’

December 25, 2015 | David Robinson

Every Christmas Eve, my family watches Love Actually. Objectively it’s not a particularly, er, good movie, but it’s well-suited for a holiday tradition. (Vox has got my back here). Even on the eighth or ninth viewing, it’s impressive what an intricate network of characters it builds. This ...

[Read more...]

The ‘lost boarding pass’ puzzle: efficient simulation in R

December 11, 2015 | David Robinson

A family member recently sent me a puzzle: One hundred people are lined up with their boarding passes showing their seats on the 100-seat Plane. The first guy in line drops his pass as he enters the plane, and unable to pick it up with others behind him sits in ... [Read more...]

Modeling gene expression with broom: a case study in tidy analysis

November 25, 2015 | David Robinson

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis.

In the last post, we examined an available genomic dataset from Brauer et al 2008 about yeast gene expression under nutrient starvation. We learned to tidy it with the dplyr and tidyr packages, and saw how ... [Read more...]

Cleaning and visualizing genomic data: a case study in tidy analysis

November 19, 2015 | David Robinson

I recently ran into a question looking for a case study in genomics, particularly for teaching ggplot2, dplyr, and the tidy data framework developed by Hadley Wickham. There exist many great resources for learning how to analyze genomic data using Bioconductor tools, including these workflows and package vignettes. But case ...

[Read more...]

What are the most polarizing programming languages?

November 3, 2015 | David Robinson

Users on Stack Overflow Careers, our site for matching developers with jobs, can create customized profiles (“CVs”) to show to prospective employers. As part of these profiles, they have the option of specifying specific technologies they like or dislike. This produces an interesting and unusual opportunity for our data team ... [Read more...]

Understanding the Bayesian approach to false discovery rates (using baseball statistics)

November 2, 2015 | David Robinson

Previously in this series: Understanding the beta distribution (using baseball statistics); Understanding empirical Bayes estimation (using baseball statistics); Understanding credible intervals (using baseball statistics).

In my last few posts, I’ve been exploring how to perform estimation of batting averages, as a way to demonstrate empirical Bayesian methods. We’ve ... [Read more...]

Understanding credible intervals (using baseball statistics)

October 20, 2015 | David Robinson

Previously in this series: Understanding the beta distribution (using baseball statistics); Understanding empirical Bayes estimation (using baseball statistics).

In my last post, I explained the method of empirical Bayes estimation, a way to calculate useful proportions out of many pairs of success/total counts (e.g. 0/1, 3/10, 235/1000). I used the example ... [Read more...]

Understanding empirical Bayes estimation (using baseball statistics)

September 30, 2015 | David Robinson

Which of these two proportions is higher: 4 out of 10, or 300 out of 1000? This sounds like a silly question. Obviously 4/10 = 0.4, which is greater than 300/1000 = 0.3. But suppose you were a baseball recruiter, trying to decide which of two potential players is a better batter based on how many hits they get. One ... [Read more...]

Is Bayesian A/B Testing Immune to Peeking? Not Exactly

August 20, 2015 | David Robinson

Since I joined Stack Exchange as a Data Scientist in June, one of my first projects has been reconsidering the A/B testing system used to evaluate new features and changes to the site. Our current approach relies on computing a p-value to measure our confidence in a new feature. ... [Read more...]

Slides from my talk on the broom package

April 13, 2015 | David Robinson

This weekend I gave a presentation on my broom package for tidying model objects (see my introduction here) at the UP-STAT 2015 conference at SUNY Geneseo. I’m sharing the slides here, along with some highlights below. I first explained how broom fits with other tidy tools such as dplyr, tidyr ... [Read more...]

broom: a package for tidying statistical models into data frames

March 19, 2015 | David Robinson

The concept of “tidy data”, as introduced by Hadley Wickham, offers a powerful framework for data manipulation, analysis, and visualization. Popular packages like dplyr, tidyr and ggplot2 take great advantage of this framework, as explored in several recent posts by others. But there’s an important step in a tidy ... [Read more...]

View package downloads over time with Shiny

March 5, 2015 | David Robinson

Almost everyone with an R package in CRAN wonders how often it’s installed and used. Two years ago RStudio kindly started offering anonymized logs of their downloads from their CRAN mirror, which allows one to graph the number of downloads over time. Much easier than downloading and processing all ... [Read more...]

Introducing stackr: An R package for querying the Stack Exchange API

February 3, 2015 | David Robinson

There’s no end of interesting data analyses that can be performed with Stack Overflow and the Stack Exchange network of Q&A sites. Earlier this week I posted a Shiny app that visualizes the personalized prediction data from their machine learning system, Providence. I’ve also looked at whether ... [Read more...]

