Articles by Edwin Chen

Choosing a Machine Learning Classifier

April 26, 2011 | Edwin Chen

How do you know what machine learning algorithm to choose for your classification problem? Of course, if you really care about accuracy, your best bet is to test out a couple different ones (making sure to try different parameters within each algorithm as well), and select the best one by ...

Kickstarter Data Analysis: Success and Pricing

April 25, 2011 | Edwin Chen

Kickstarter is an online crowdfunding platform for launching creative projects. When starting a new project, project owners specify a deadline and the minimum amount of money they need to raise. They receive the money (less a transaction fee) only if …

A Mathematical Introduction to Least Angle Regression

April 20, 2011 | Edwin Chen

(For a layman’s introduction, see here.) Least Angle Regression (aka LARS) is a model selection method for linear regression (when you’re worried about overfitting or want your model to be easily interpretable). To motivate it, let’s consider some other model selection methods: Forward selection starts with no ...

Introduction to Cointegration and Pairs Trading

April 15, 2011 | Edwin Chen

Introduction Suppose you see two drunks (i.e., two random walks) wandering around. The drunks don’t know each other (they’re independent), so there’s no meaningful relationship between their paths. But suppose instead you have a drunk walking with her dog. This …

Hacker News Analysis

March 13, 2011 | Edwin Chen

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings. Activity on the Site My first question was: how has activity on the site increased over time? I …

Piiikaaachuuuuuu vs. KHAAAAAN!

March 13, 2011 | Edwin Chen

This is a fun image I found on Neil Kodner’s blog: But I’ve never actually watched any of the Star Trek movies, so I decided to recreate the graph with Pikachu instead: Here’s a smoothed version to better compare the counts …

A Kernel Density Approach to Outlier Detection

March 13, 2011 | Edwin Chen

I describe a kernel density approach to outlier detection on small datasets. In particular, my model is the set of prices for a given item that can be found online. Introduction Suppose you’re searching online for the cheapest place to …

Eigensheep

March 13, 2011 | Edwin Chen

Aaron Koblin’s Sheep Market visualization is an awesome use of Mechanical Turk. But it’d be even more awesome if the grid were ordered, so inspired by the use of eigenfaces in facial recognition, I decided to try projecting the sheep …

Counting Clusters

March 13, 2011 | Edwin Chen

Given a set of numerical datapoints, we often want to know how many clusters the datapoints form. Two practical algorithms for determining the number of clusters are the gap statistic and the prediction strength. Gap Statistic The gap statistic algorithm …

Layman’s Introduction to Measure Theory

March 13, 2011 | Edwin Chen

Measure theory studies ways of generalizing the notions of length/area/volume. Even in 2 dimensions, it might not be clear how to measure the area of the following fairly tame shape: much less the “area” of even weirder shapes in higher dimensions or different spaces entirely. For example, suppose you ...

Layman’s Introduction to Random Forests

March 13, 2011 | Edwin Chen

Suppose you’re very indecisive, so whenever you want to watch a movie, you ask your friend Willow if she thinks you’ll like it. In order to answer, Willow first needs to figure out what movies you like, so you give her a bunch of movies and tell her ...

Prime Numbers and the Riemann Zeta Function

March 13, 2011 | Edwin Chen

Lots of people know that the Riemann Hypothesis has something to do with prime numbers, but most introductions fail to say what or why. I’ll try to give one angle of explanation. Layman’s Terms Suppose you have a bunch of friends, each with an instrument that plays at ...
