Articles by Edwin Chen

Surge: Data Labeling You Can Trust

November 29, 2020 | Edwin Chen

tl;dr I started Surge earlier this year to fix the problems I've always encountered with getting high-quality, human-labeled data at scale. Think MTurk 2.0—but with an obsessive focus on quality and speed, and an elite workforce you can trust. If you'v... [Read more...]

Exploring LSTMs

May 29, 2017 | Edwin Chen

The first time I learned about LSTMs, my eyes glazed over. Not in a good, jelly donut kind of way. It turns out LSTMs are a fairly simple extension to neural networks, and they're behind a lot of the amazing achievements deep learning has made in the past few years. ... [Read more...]

Product Insights for Airbnb

November 19, 2015 | Edwin Chen

I love marketplaces and marketplace data, so a couple months ago I grabbed some Airbnb data and made a slide deck. A few people have asked me about it, so here it is along with a short summary. My goal was to gather data around potential product strategy, focusing on ... [Read more...]

Soda vs. Pop with Twitter

July 6, 2012 | Edwin Chen

One of the great things about Twitter is that it’s a global conversation anyone can join anytime. Eavesdropping on the world, what what! Of course, it gets even better when you can mine all this chatter to study the way humans live and interact. For example, how do people ... [Read more...]

Quick Introduction to ggplot2

January 17, 2012 | Edwin Chen

For a much better looking version of this post (where code is actually readable!), see this GitHub repository, which also contains some of the example datasets I use and a literate programming version of this tutorial. Introduction: This is a bare-bones introduction to ggplot2, a visualization package in R. It ... [Read more...]

Introduction to Conditional Random Fields

January 2, 2012 | Edwin Chen

Imagine you have a sequence of snapshots from a day in Justin Bieber’s life, and you want to label each image with the activity it represents (eating, sleeping, driving, etc.). How can you do this? One way is to ignore the sequential nature of the snapshots, and build a ... [Read more...]

Winning the Netflix Prize: A Summary

October 23, 2011 | Edwin Chen

How was the Netflix Prize won? I went through a lot of the Netflix Prize papers a couple years ago, so I’ll try to give an overview of the techniques that went into the winning solution here. Normalization of Global Effects Suppose Alice rates Inception 4 stars. We can think ... [Read more...]

Stuff Harvard People Like

September 28, 2011 | Edwin Chen

What types of students go to which schools? There are, of course, the classic stereotypes: MIT has the hacker engineers. Stanford has the laid-back, social folks. Harvard has the prestigious leaders of the world. Berkeley has the activist hippies. Caltech has the hardcore science nerds. But how well do these ... [Read more...]

Introduction to Latent Dirichlet Allocation

August 21, 2011 | Edwin Chen

Introduction Suppose you have the following set of sentences: I like to eat broccoli and bananas. I ate a banana and spinach smoothie for breakfast. Chinchillas and kittens are cute. My sister adopted a kitten yesterday. Look at this cute hamster munching on a piece of broccoli. What is latent ... [Read more...]

Topic Modeling the Sarah Palin Emails

June 27, 2011 | Edwin Chen

tl;dr Browse through Sarah Palin’s emails, automagically organized by topic, here. LDA-based Email Browser Earlier this month, several thousand emails from Sarah Palin’s time as governor of Alaska were released. The emails weren’t organized in any fashion, though, so to make them easier to browse, I ... [Read more...]