Blog Archives

Public Data Release of Stack Overflow’s 2018 Developer Survey

May 29, 2018

Note: Cross-posted with the Stack Overflow blog. Starting today, you can access the public data release for Stack Overflow’s 2018 Developer Survey. Over 100,000 developers from around the world shared their opinions about everything from their fa...


Understanding PCA using Stack Overflow data

May 17, 2018

This year, I have given some talks about understanding principal component analysis using what I work with day in and day out: Stack Overflow data. You can see a recording of one of these talks from rstudio::conf 2018. When I have given these talks, I’ve focused a lot on understanding PCA. This blog post walks through how I implemented...


Stack Overflow questions around the world

April 10, 2018

I am so lucky to work with so many generous, knowledgeable, and amazing people at Stack Overflow, including Ian Allen and Kirti Thorat. Both Ian and Kirti are part of biweekly sessions we have at Stack Overflow where several software developers join me in practicing R, data science, and modeling skills. This morning, the two of them went to...


The game is afoot! Topic modeling of Sherlock Holmes stories

January 24, 2018

In a recent release of tidytext, we added tidiers and support for building Structural Topic Models from the stm package. This is my current favorite implementation of topic modeling in R, so let’s walk through an example of how to get started with th...


tidytext 0.1.6

January 9, 2018

I am pleased to announce that tidytext 0.1.6 is now on CRAN! Most of this release, as well as the 0.1.5 release which I did not blog about, was for maintenance, updates to align with API changes from tidytext’s dependencies, and bugs. I just spent a good chunk of effort getting tidytext to pass R CMD check on older versions...


Tidy word vectors, take 2!

November 26, 2017

A few weeks ago, I wrote a post about finding word vectors using tidy data principles, based on an approach outlined by Chris Moody on the StitchFix tech blog. I’ve been pondering how to improve this approach, and whether it would be nice to wrap up some of these functions in a package, so here is an update! Like in...


New sports from random emoji

November 24, 2017

I love emoji ❤️ and I love xkcd, so this recent comic from Randall Munroe was quite a delight for me. I sat there, enjoying the thought of these new sports like horse hole and multiplayer avocado and I thought, “I can make more of these in just the barest handful of lines of code”. This is largely thanks to...


Word Vectors with tidy data principles

October 29, 2017

Last week I saw Chris Moody’s post on the Stitch Fix blog about calculating word vectors from a corpus of text using word counts and matrix factorization, and I was so excited! This blog post illustrates how to implement that approach to find word vector representations in R using tidy data principles and sparse matrices. Word vectors, or word embeddings,...


From Power Calculations to P-Values: A/B Testing at Stack Overflow

October 16, 2017

Note: cross-posted with the Stack Overflow blog. If you hang out on Meta Stack Overflow, you may have noticed news from time to time about A/B tests of various features here at Stack Overflow. We use A/B testing to compare a new version to a baseline f...


Mapping ecosystems of software development

October 2, 2017

I have a new post on the Stack Overflow blog today about the complex, interrelated ecosystems of software development. On the data team at Stack Overflow, we spend a lot of time and energy thinking about tech ecosystems and how technologies are related to each other. One way to get at this idea of relationships between technologies is tag...

