Blog Archives

Understanding Bayesian A/B testing (using baseball statistics)

May 23, 2016

Previously in this series:
Understanding the beta distribution (using baseball statistics)
Understanding empirical Bayes estimation (using baseball statistics)
Understanding credible intervals (using baseball statistics)
Understanding the Bayesian approach to false discovery rates (using baseball statistics)

Who is a better batter: Mike Piazza or Hank Aaron? Well, Mike Piazza has a slightly higher career batting average (2127 hits / 6911...

The adblockr package: block ads from the monetizr package

April 1, 2016

I was horrified to learn of the existence of the monetizr package, which adds advertisements to R functions. The package goes against the entire philosophy of open source and the spirit of the R community. Luckily, I was able to construct a fix- the a...

The monetizr package: make money on your open source R packages

March 31, 2016

I’ve had the great privilege to be a small part of the R open source community, contributing packages like broom, gganimate, fuzzyjoin, and ggfreehand. In the process I’ve become friends and colleagues with brilliant statisticians and data scientists and learned to engage with data in powerful ways. But there’s one thing that my colleagues and I haven’t gotten from R...

How to replace a pie chart

March 14, 2016

Yesterday a family member forwarded me a Wall Street Journal interview titled What Data Scientists Do All Day At Work. The title intrigued me immediately, partly because I find myself explaining that same topic somewhat regularly. I wasn’t disappointed in the interview: General Electric’s Dr. Narasimhan gave insightful and well-communicated answers, and I both recognized familiar opinions and learned new...

Why I use ggplot2

February 12, 2016

If you’ve read my blog, taken one of my classes, or sat next to me on an airplane, you probably know I’m a big fan of Hadley Wickham’s ggplot2 package, especially compared to base R plotting. Not everyone agrees. Among the anti-ggplot2 crowd is JHU Professor Jeff Leek, who yesterday wrote up his thoughts on the Simply Statistics blog: ...

Analyzing networks of characters in ‘Love Actually’

December 25, 2015

Every Christmas Eve, my family watches Love Actually. Objectively it’s not a particularly, er, good movie, but it’s well-suited for a holiday tradition. (Vox has got my back here). Even on the eighth or ninth viewing, it’s impressive what an intricate network of characters it builds. This got me wondering how we could visualize the connections quantitatively, based on how...

The ‘lost boarding pass’ puzzle: efficient simulation in R

December 11, 2015

A family member recently sent me a puzzle: One hundred people are lined up with their boarding passes showing their seats on the 100-seat plane. The first guy in line drops his pass as he enters the plane and, unable to pick it up with others behind him, sits in a random seat. The people behind him,...

Modeling gene expression with broom: a case study in tidy analysis

November 25, 2015

Previously in this series:
Cleaning and visualizing genomic data: a case study in tidy analysis

In the last post, we examined an available genomic dataset from Brauer et al 2008 about yeast gene expression under nutrient starvation. We learned to tidy it with the dplyr and tidyr packages, and saw how useful this tidied form is for visualizing and understanding individual...

Cleaning and visualizing genomic data: a case study in tidy analysis

November 19, 2015

I recently ran into a question looking for a case study in genomics, particularly for teaching ggplot2, dplyr, and the tidy data framework developed by Hadley Wickham. There exist many great resources for learning how to analyze genomic data using Bioconductor tools, including these workflows and package vignettes. But case studies for teaching the suite of tidy tools on...

What are the most polarizing programming languages?

November 3, 2015

Users on Stack Overflow Careers, our site for matching developers with jobs, can create customized profiles (“CVs”) to show to prospective employers. As part of these profiles, they can list specific technologies they like or dislike. This gives our data team an interesting and unusual opportunity to analyze the opinions of over 150,000 developers. There are...
