Blog Archives

Understanding empirical Bayesian hierarchical modeling (using baseball statistics)

October 11, 2016

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression.

Suppose you were a scout hiring a new baseball player, and were choosing between two that have had 100 at-bats each: A left-handed batter who has...

Tidying computational biology models with biobroom: a case study in tidy analysis

September 6, 2016

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis; Modeling gene expression with broom: a case study in tidy analysis.

In previous posts, I’ve examined the benefits of the tidy data framework in cleaning, visualizing, and modeling in exploratory data analysis on a molecular biology experiment. We’re using Brauer et al 2008 as our...

Analysis of the #7FavPackages hashtag

August 26, 2016

Twitter has seen a recent trend of “first 7” and “favorite 7” hashtags, like #7FirstJobs and #7FavFilms. Last week I added one to the mix, about my 7 favorite R packages:

devtools, dplyr, ggplot2, knitr, Rcpp, rmarkdown, shiny #7FavPackages #rstats (David Robinson, @drob, August 16, 2016)

Hadley Wickham agreed to share his own, but on one condition: @drob I'll do it if you write a script to scrape the...

useR and JSM 2016 conferences: a story in tweets

August 23, 2016

I was amused by a Guardian article last month that declared “I’m a serious academic, not a professional Instagrammer,” arguing that social media is a distraction for scientific research. This attitude was, to say the least, not popular on academic Twitter, which responded with the #seriousacademic hashtag. When someone tries to claim that a #seriousacademic should not use twitter...

Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half

August 9, 2016

I don’t normally post about politics (I’m not particularly savvy about polling, which is where data science has had the largest impact on politics). But this weekend I saw a hypothesis about Donald Trump’s twitter account that simply begged to be investigated with data: Every non-hyperbolic tweet is from iPhone (his staff). Every hyperbolic tweet is from Android (from him)....

Does sentiment analysis work? A tidy analysis of Yelp reviews

July 21, 2016

This year Julia Silge and I released the tidytext package for text mining using tidy tools such as dplyr, tidyr, ggplot2 and broom. One of the canonical examples of tidy text mining this package makes possible is sentiment analysis. Sentiment analysis is often used by companies to quantify general social media opinion (for example, using tweets about several brands to...

stacksurveyr: An R package with the 2016 Developer Survey Results

July 18, 2016

This year, more than fifty thousand programmers answered the Stack Overflow 2016 Developer Survey, in the largest survey of professional developers in history. Last week Stack Overflow released the full (anonymized) results of the survey at stackoverf...

Releasing the StackLite dataset of Stack Overflow questions and tags

July 18, 2016

At Stack Overflow we’ve always been committed to sharing data: all content contributed to the site is CC-BY-SA licensed, and we release regular “data dumps” of our entire history of questions and answers. I’m excited to announce a new resource...

One year as a Data Scientist at Stack Overflow

June 20, 2016

One day in January 2013 I found myself wasting time on the internet. This wasn’t a good idea: I was as busy as anyone 2.5 years into their PhD. I had to finish a presentation on some yeast genetics research, I was months behind on a paper with an NYU collaborator and even farther behind on some leftover undergraduate research....

Understanding beta binomial regression (using baseball statistics)

May 31, 2016

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing.

In this series we’ve been using the empirical Bayes method to estimate batting averages of baseball players. Empirical Bayes is useful here because when we don’t have a lot of...
