Blog Archives

Edge Prediction in a Social Graph: My Solution to Facebook’s User Recommendation Contest on Kaggle

July 31, 2012

A couple of weeks ago, Facebook launched a link prediction contest on Kaggle, with the goal of recommending missing edges in a social graph. I love investigating social networks, so I dug around a little, and since I did well enough to score one of the coveted prizes, I’ll share my approach here. (For some background, the contest provided...

Read more »
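The excerpt above stops before describing the method, so what follows is not the post's contest solution. It is a minimal sketch of what edge prediction involves, using one common baseline: score each pair of not-yet-connected users by the Jaccard similarity of their neighbor sets, then recommend the highest-scoring pairs. The toy graph and the helper names (`build_adjacency`, `jaccard`, `recommend`) are hypothetical, and the graph is treated as undirected for simplicity.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map (node -> set of neighbors)."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbors divided by all neighbors of u and v combined."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


def recommend(adj, user, k=3):
    """Return up to k non-neighbors of `user` with the highest nonzero scores."""
    candidates = set(adj) - adj[user] - {user}
    scored = [(jaccard(adj, user, c), c) for c in candidates]
    # Highest score first; ties broken by node name so results are deterministic.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [c for score, c in scored[:k] if score > 0]


# Hypothetical toy graph, not contest data.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = build_adjacency(edges)
print(recommend(adj, "a"))  # ['d']
```

User `a` shares both of its neighbors (`b` and `c`) with `d`, giving a Jaccard score of 2/3, while `e` shares none, so `d` is the only suggestion with a nonzero score.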

Soda vs. Pop with Twitter

July 6, 2012

One of the great things about Twitter is that it’s a global conversation anyone can join anytime. Eavesdropping on the world, what what! Of course, it gets even better when you can mine all this chatter to study the way humans live and interact. For example, how do people in New York City differ from those in Silicon Valley? We...

Read more »

Quick Introduction to ggplot2

January 17, 2012

For a much better-looking version of this post (where code is actually readable!), see this GitHub repository, which also contains some of the example datasets I use and a literate programming version of this tutorial.

Introduction

This is a bare-bones introduction to ggplot2, a visualization package in R. It assumes no knowledge of R

Read more »

Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

September 7, 2011

tl;dr: See this movie visualization for a case study on how a post propagates through Quora.

How does information spread through a network? Much of Quora’s appeal, after all, lies in its social graph, and when you’ve got a network of users, all broadcasting their activities to their neighbors, information can cascade in multiple

Read more »

Tweets vs. Likes: What gets shared on Twitter vs. Facebook?

July 28, 2011

It always strikes me as curious that some posts get a lot of love on Twitter, while others get many more shares on Facebook: What accounts for this difference? Some of it is surely site-dependent: maybe one blogger has a Facebook page but not a Twitter account, while another has these roles reversed. But even

Read more »

Topic Modeling the Sarah Palin Emails

June 27, 2011

tl;dr: Browse through Sarah Palin’s emails, automagically organized by topic, here.

LDA-based Email Browser

Earlier this month, several thousand emails from Sarah Palin’s time as governor of Alaska were released. The emails weren’t organized in any fashion, though, so to make them easier to browse, I did some topic modeling (in particular, using latent Dirichlet

Read more »
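The excerpt above names latent Dirichlet allocation (LDA) but is cut off before any details. As an illustration of the general technique only, here is a small collapsed Gibbs sampler for LDA in pure Python. It is not the code behind the email browser: the hyperparameters (`alpha=0.1`, `beta=0.01`), the iteration count, the helper names, and the four-"email" corpus are all assumptions. On each sweep, every token's topic is resampled in proportion to how common that topic is in its document times how common the word is in that topic.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA to tokenized docs with collapsed Gibbs sampling.

    Returns (doc_topic, topic_word): per-document topic counts and
    per-topic word counts after the final sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    doc_topic = [[0] * num_topics for _ in docs]
    topic_word = [defaultdict(int) for _ in range(num_topics)]
    topic_total = [0] * num_topics
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            z = rng.randrange(num_topics)
            z_doc.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(z_doc)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # p(z = k | rest) is proportional to
                # (n_dk + alpha) * (n_kw + beta) / (n_k + V * beta)
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1
    return doc_topic, topic_word


def top_words(topic_word, n=3):
    """Most frequent words in each topic (ties broken alphabetically)."""
    return [
        [w for w, c in sorted(tw.items(), key=lambda kv: (-kv[1], kv[0])) if c > 0][:n]
        for tw in topic_word
    ]


# Hypothetical mini "emails", already tokenized.
docs = [
    "oil gas pipeline gas".split(),
    "pipeline oil drilling gas".split(),
    "budget tax school budget".split(),
    "school tax budget teachers".split(),
]
doc_topic, topic_word = lda_gibbs(docs, num_topics=2)
print(top_words(topic_word))
```

A real corpus of thousands of emails would also need tokenization, stop-word removal, and far more topics; a library implementation would be the practical choice at that scale.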

Bayesian Confidence Intervals: Obama’s ‘That’-Addition and Informality

May 1, 2011

No “That” Left Behind?

I came across a post on Language Log last week giving some evidence that Obama tends to add “that” to the prepared versions of his speeches. For example, in a recent speech at George Washington University, …

Read more »
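The excerpt above does not show the model the post uses, but one standard Bayesian way to put an interval on a rate such as "how often “that” gets added" is the Beta-binomial model. With a Beta(a, b) prior and k additions out of n opportunities, the posterior is Beta(a + k, b + n - k). The sketch below computes an equal-tailed 95% interval (what Bayesians usually call a credible interval) by numerically integrating that posterior on a grid, so it needs only the standard library. The uniform Beta(1, 1) prior, the function name, and the counts in the usage line are assumptions, not figures from the post.

```python
import math


def beta_credible_interval(successes, trials, level=0.95, a=1.0, b=1.0, grid=20_000):
    """Posterior mean and equal-tailed interval for a binomial rate.

    With a Beta(a, b) prior, the posterior is Beta(a + successes, b + trials - successes).
    Its quantiles are approximated on an evenly spaced grid, so no SciPy is needed.
    """
    post_a = a + successes
    post_b = b + trials - successes
    # Grid midpoints avoid evaluating log(0) at p = 0 or p = 1.
    ps = [(i + 0.5) / grid for i in range(grid)]
    log_dens = [(post_a - 1) * math.log(p) + (post_b - 1) * math.log1p(-p) for p in ps]
    peak = max(log_dens)  # subtract the max before exponentiating, for stability
    dens = [math.exp(ld - peak) for ld in log_dens]
    total = sum(dens)

    tail = (1 - level) / 2
    lower = upper = None
    cumulative = 0.0
    for p, d in zip(ps, dens):
        cumulative += d / total
        if lower is None and cumulative >= tail:
            lower = p
        if cumulative >= 1 - tail:
            upper = p
            break
    mean = post_a / (post_a + post_b)
    return mean, lower, upper


# Hypothetical counts: "that" added at 12 of 40 candidate spots.
mean, lo, hi = beta_credible_interval(12, 40)
print(f"posterior mean {mean:.3f}, 95% interval ({lo:.3f}, {hi:.3f})")
```

Comparing two such intervals (say, prepared text versus delivered speech) is one way to judge whether a difference in rates is larger than the uncertainty in each.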

Kickstarter Data Analysis: Success and Pricing

April 25, 2011

Kickstarter is an online crowdfunding platform for launching creative projects. When starting a new project, project owners specify a deadline and the minimum amount of money they need to raise. They receive the money (less a transaction fee) only if …

Read more »

Introduction to Cointegration and Pairs Trading

April 15, 2011

Introduction

Suppose you see two drunks (i.e., two random walks) wandering around. The drunks don’t know each other (they’re independent), so there’s no meaningful relationship between their paths. But suppose instead you have a drunk walking with her dog. This …

Read more »
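The drunk-and-dog analogy above can be simulated directly. In this sketch (an illustration under assumptions, not code from the post), the drunk is a Gaussian random walk, the dog's position is the drunk's plus a stationary AR(1) "leash" term with an assumed coefficient `phi = 0.5`, and a second, independent drunk wanders separately. The drunk and the dog both drift without bound, but the dog-minus-drunk spread stays small and mean-reverting, which is the cointegration property that pairs trading relies on; the spread between the two independent drunks has no such anchor.

```python
import random
import statistics


def simulate(n_steps=5000, phi=0.5, seed=42):
    """Simulate a drunk, her dog, and an unrelated second drunk.

    drunk: random walk x_t = x_{t-1} + e_t
    dog:   y_t = x_t + u_t, where the "leash" u_t = phi * u_{t-1} + v_t
           is a stationary AR(1) process because |phi| < 1
    other: an independent random walk z_t = z_{t-1} + w_t
    """
    rng = random.Random(seed)
    x = z = u = 0.0
    drunk, dog, other = [], [], []
    for _ in range(n_steps):
        x += rng.gauss(0, 1)
        z += rng.gauss(0, 1)
        u = phi * u + rng.gauss(0, 1)
        drunk.append(x)
        dog.append(x + u)
        other.append(z)
    return drunk, dog, other


drunk, dog, other = simulate()
leash = [y - x for y, x in zip(dog, drunk)]  # cointegrated spread: stays bounded
gap = [z - x for z, x in zip(other, drunk)]  # independent walks: spread wanders
print(statistics.variance(leash), statistics.variance(gap))
```

The first variance should stay near the AR(1) value 1 / (1 - phi^2), about 1.33 here, while the second tends to grow with the number of steps.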

Hacker News Analysis

March 13, 2011

I was playing around with the Hacker News database Ronnie Roller made (thanks!), so I thought I’d post some of my findings.

Activity on the Site

My first question was: how has activity on the site increased over time? I …

Read more »