Blog Archives

TensorFlow, Jane Austen, and Text Generation

October 3, 2018
By

I remember the first time I saw a deep learning text generation project that was truly compelling and delightful to me. It was in 2016 when Andy Herd generated new Friends scenes by training a recurrent neural network on all the show’s episodes. Herd’s work went pretty viral at the time and I thought: via GIPHY And also: via GIPHY At the time...

Read more »

Training, evaluating, and interpreting topic models

September 7, 2018
By
Training, evaluating, and interpreting topic models

At the beginning of this year, I wrote a blog post about how to get started with the stm and tidytext packages for topic modeling. I have been doing more topic modeling in various projects, so I wanted to share some workflows I have found useful for training many topic models at one time, evaluating topic models and understanding model diagnostics,...

Read more »

Amazon Alexa and Accented English

July 18, 2018
By
Amazon Alexa and Accented English

Earlier this spring, one of my data science friends here in SLC got in contact with me about some fun analysis. My friend Dylan Zwick is a founder at Pulse Labs, a voice-testing startup, and they were chatting with the Washington Post about a piece on how devices like Amazon Alexa deal with accented English. The piece is published...

Read more »

Punctuation in literature

June 29, 2018
By

This morning I was scrolling through Twitter and noticed Alberto Cairo share this lovely data visualization piece by Adam J. Calhoun about the varying prevalence of punctuation in literature. I thought, “I want to do that!” It also offers me the opportunity to chat about a few of the new options available for tokenizing in tidytext via updates to...

Read more »

Public Data Release of Stack Overflow’s 2018 Developer Survey

May 29, 2018
By

Note: Cross-posted with the Stack Overflow blog. Starting today, you can access the public data release for Stack Overflow’s 2018 Developer Survey. Over 100,000 developers from around the world shared their opinions about everything from their fa...

Read more »

Understanding PCA using Stack Overflow data

May 17, 2018
By

This year, I have given some talks about understanding principal component analysis using what I spend day in and day out with, Stack Overflow data. You can see a recording of one of these talks from rstudio::conf 2018. When I have given these talks, I’ve focused a lot on understanding PCA. This blog post walks through how I implemented...

Read more »

Stack Overflow questions around the world

April 10, 2018
By

I am so lucky to work with so many generous, knowledgeable, and amazing people at Stack Overflow, including Ian Allen and Kirti Thorat. Both Ian and Kirti are part of biweekly sessions we have at Stack Overflow where several software developers join me in practicing R, data science, and modeling skills. This morning, the two of them went to...

Read more »

The game is afoot! Topic modeling of Sherlock Holmes stories

January 24, 2018
By

In a recent release of tidytext, we added tidiers and support for building Structural Topic Models from the stm package. This is my current favorite implementation of topic modeling in R, so let’s walk through an example of how to get started with th...

Read more »

tidytext 0.1.6

January 9, 2018
By
tidytext 0.1.6

I am pleased to announce that tidytext 0.1.6 is now on CRAN! Most of this release, as well as the 0.1.5 release which I did not blog about, was for maintenance, updates to align with API changes from tidytext’s dependencies, and bugs. I just spent a good chunk of effort getting tidytext to pass R CMD check on older versions...

Read more »

Tidy word vectors, take 2!

November 26, 2017
By
Tidy word vectors, take 2!

A few weeks ago, I wrote a post about finding word vectors using tidy data principles, based on an approach outlined by Chris Moody on the StitchFix tech blog. I’ve been pondering how to improve this approach, and whether it would be nice to wrap up some of these functions in a package, so here is an update! Like in...

Read more »

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)