Blog Archives

Analysis of software developers in New York, San Francisco, London and Bangalore

December 1, 2016

(Note: Cross-posted with the Stack Overflow Blog.) When I tell someone Stack Overflow is based in New York City, they’re often surprised: many people assume it’s in San Francisco. (I’ve even seen job applications with “I’m in New York but willing to relocate to San Francisco” in the cover letter.) San Francisco is a safe guess of where an...

The ‘deadly board game’ puzzle: efficient simulation in R

October 19, 2016

Last Friday’s “The Riddler” column on FiveThirtyEight presents an interesting probabilistic puzzle: While traveling in the Kingdom of Arbitraria, you are accused of a heinous crime. Arbitraria decides who’s guilty or innocent not through a court system, but a board game. It’s played on a simple board: a track with sequential spaces numbered from 0 to 1,000....

Understanding empirical Bayesian hierarchical modeling (using baseball statistics)

October 11, 2016

Previously in this series: Understanding the beta distribution; Understanding empirical Bayes estimation; Understanding credible intervals; Understanding the Bayesian approach to false discovery rates; Understanding Bayesian A/B testing; Understanding beta binomial regression. Suppose you were a scout hiring a new baseball player, and were choosing between two that have had 100...

Tidying computational biology models with biobroom: a case study in tidy analysis

September 6, 2016

Previously in this series: Cleaning and visualizing genomic data: a case study in tidy analysis; Modeling gene expression with broom: a case study in tidy analysis. In previous posts, I’ve examined the benefits of the tidy data framework in cleaning, visualizing, and modeling in exploratory data analysis on a molecular biology experiment. We’re using Brauer et al...

Analysis of the #7FavPackages hashtag

August 26, 2016

Twitter has seen a recent trend of “first 7” and “favorite 7” hashtags, like #7FirstJobs and #7FavFilms. Last week I added one to the mix, about my 7 favorite R packages: “devtools, dplyr, ggplot2, knitr, Rcpp, rmarkdown, shiny #7FavPackages #rstats” (David Robinson (@drob), August 16, 2016). Hadley Wickham agreed to share his own, but on one condition:

useR and JSM 2016 conferences: a story in tweets

August 23, 2016

I was amused by a Guardian article last month that declared “I’m a serious academic, not a professional Instagrammer,” arguing that social media is a distraction for scientific research. This attitude was, to say the least, not popular on academic Twitter, which responded with the #seriousacademic hashtag. When someone tries to claim that a...

Text analysis of Trump’s tweets confirms he writes only the (angrier) Android half

August 9, 2016

I don’t normally post about politics (I’m not particularly savvy about polling, which is where data science has had the largest impact on politics). But this weekend I saw a hypothesis about Donald Trump’s Twitter account that simply begged to be investigated with data: Every non-hyperbolic tweet is from iPhone (his staff). Every hyperbolic tweet is...

Does sentiment analysis work? A tidy analysis of Yelp reviews

July 21, 2016

This year Julia Silge and I released the tidytext package for text mining using tidy tools such as dplyr, tidyr, ggplot2 and broom. One of the canonical examples of tidy text mining this package makes possible is sentiment analysis. Sentiment analysis is often used by companies to quantify general social media opinion (for...

stacksurveyr: An R package with the 2016 Developer Survey Results

July 18, 2016

This year, more than fifty thousand programmers answered the Stack Overflow 2016 Developer Survey, in the largest survey of professional developers in history. Last week Stack Overflow released the full (anonymized) results of the survey at stackoverf...

Releasing the StackLite dataset of Stack Overflow questions and tags

July 18, 2016

At Stack Overflow we’ve always been committed to sharing data: all content contributed to the site is CC-BY-SA licensed, and we release regular “data dumps” of our entire history of questions and answers. I’m excited to announce a new resource...
