Articles by Edwin Chen

Bayesian Confidence Intervals: Obama’s ‘That’-Addition and Informality

May 1, 2011 | Edwin Chen

No “That” Left Behind? I came across a post on Language Log last week giving some evidence that Obama tends to add “that” to the prepared versions of his speeches. For example, in a recent speech at George Washington University, …

Filtering for English Tweets: Unsupervised Language Detection on Twitter

April 30, 2011 | Edwin Chen

(See a demo here.) While working on a Twitter sentiment analysis project, I ran into the problem of needing to filter out all non-English tweets. (Asking the Twitter API for English-only tweets doesn’t seem to work, as it nonetheless returns tweets in Spanish, Portuguese, Dutch, Russian, and a couple ...

Choosing a Machine Learning Classifier

April 26, 2011 | Edwin Chen

How do you know what machine learning algorithm to choose for your classification problem? Of course, if you really care about accuracy, your best bet is to test out a couple of different ones (making sure to try different parameters within each algorithm as well), and select the best one by ...

Kickstarter Data Analysis: Success and Pricing

April 25, 2011 | Edwin Chen

Kickstarter is an online crowdfunding platform for launching creative projects. When starting a new project, project owners specify a deadline and the minimum amount of money they need to raise. They receive the money (less a transaction fee) only if …

A Mathematical Introduction to Least Angle Regression

April 20, 2011 | Edwin Chen

(For a layman’s introduction, see here.) Least Angle Regression (aka LARS) is a model selection method for linear regression (when you’re worried about overfitting or want your model to be easily interpretable). To motivate it, let’s consider some other model selection methods: Forward selection starts with no ...

Introduction to Cointegration and Pairs Trading

April 15, 2011 | Edwin Chen

Introduction Suppose you see two drunks (i.e., two random walks) wandering around. The drunks don’t know each other (they’re independent), so there’s no meaningful relationship between their paths. But suppose instead you have a drunk walking with her dog. This …

Hacker News Analysis

March 13, 2011 | Edwin Chen

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site My first question was: how has activity on the site increased over time? I …

Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011 | Edwin Chen

This is a fun image I found on Neil Kodner’s blog: But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead: Here’s a smoothed version to better compare the counts …

A Kernel Density Approach to Outlier Detection

March 13, 2011 | Edwin Chen

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to …

Eigensheep

March 13, 2011 | Edwin Chen

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep …

Counting Clusters

March 13, 2011 | Edwin Chen

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm …

Topological Combinatorics and the Evasiveness Conjecture

March 13, 2011 | Edwin Chen

The Kahn, Saks, and Sturtevant approach to the Evasiveness Conjecture (see the original paper here) is an epic application of pure mathematics to computer science. I’ll give an overview of the approach here, and probably try to add some more information on the problem in other posts. tl;dr ...

Netflix Prize Summary: Scalable Collaborative Filtering with Jointly Derived Neighborhood Interpolation Weights

March 13, 2011 | Edwin Chen

(Way back when, I went through all the Netflix prize papers. I’m now (very slowly) trying to clean up my notes and put them online. Eventually, I hope to have a more integrated tutorial, but here’s a rough draft for now.) This is a summary of Bell and ...

Layman’s Introduction to Measure Theory

March 13, 2011 | Edwin Chen

Measure theory studies ways of generalizing the notions of length/area/volume. Even in 2 dimensions, it might not be clear how to measure the area of the following fairly tame shape: much less the “area” of even weirder shapes in higher dimensions or different spaces entirely. For example, suppose you ...

Netflix Prize Summary: Factorization Meets the Neighborhood

March 13, 2011 | Edwin Chen

Layman’s Introduction to Random Forests

March 13, 2011 | Edwin Chen

Suppose you’re very indecisive, so whenever you want to watch a movie, you ask your friend Willow if she thinks you’ll like it. In order to answer, Willow first needs to figure out what movies you like, so you give her a bunch of movies and tell her ...

Prime Numbers and the Riemann Zeta Function

March 13, 2011 | Edwin Chen

Lots of people know that the Riemann Hypothesis has something to do with prime numbers, but most introductions fail to say what or why. I’ll try to give one angle of explanation. Layman’s Terms Suppose you have a bunch of friends, each with an instrument that plays at ...

Item-to-Item Collaborative Filtering with Amazon’s Recommendation System

February 14, 2011 | Edwin Chen

Introduction In making its product recommendations, Amazon makes heavy use of an item-to-item collaborative filtering approach. This essentially means that for each item X, Amazon builds a neighborhood of related items S(X); whenever you buy or look at an item, Amazon then recommends items from that item’s neighborhood. ...

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
