Blog Archives

Introduction to Conditional Random Fields

January 2, 2012

Imagine you have a sequence of snapshots from a day in Justin Bieber’s life, and you want to label each image with the activity it represents (eating, sleeping, driving, etc.). How can you do this? One way is to ignore the sequential nature of the snapshots, and build a per-image classifier. For example, given a month’s worth of...

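The CRF excerpt contrasts a per-image classifier with a model that uses the order of the snapshots. As a rough sketch, and not the post's own code, a linear-chain CRF scores a whole label sequence as the sum of per-position scores plus transition scores between neighboring labels. The best sequence can then be found with Viterbi decoding. The scores below are hypothetical, hand-picked numbers standing in for learned weights. Training those weights is not shown.

```python
from typing import Dict, List, Sequence, Tuple


def viterbi(
    unary: Sequence[Dict[str, float]],
    transition: Dict[Tuple[str, str], float],
    labels: Sequence[str],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    unary[t][y]: score for label y at position t (e.g. from a per-image classifier).
    transition[(a, b)]: score for label a followed by label b; missing pairs score 0.
    """
    best: List[Dict[str, float]] = [dict(unary[0])]  # best path score ending in y at t
    back: List[Dict[str, str]] = [{}]                 # back-pointers for traceback
    for t in range(1, len(unary)):
        best.append({})
        back.append({})
        for y in labels:
            # Pick the predecessor label that maximizes path score + transition score
            prev, score = max(
                ((p, best[t - 1][p] + transition.get((p, y), 0.0)) for p in labels),
                key=lambda pair: pair[1],
            )
            best[t][y] = score + unary[t][y]
            back[t][y] = prev
    # Trace back from the best final label
    path = [max(labels, key=lambda y: best[-1][y])]
    for t in range(len(unary) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]


labels = ["eating", "sleeping", "driving"]
# Hypothetical per-snapshot scores: snapshot 1 is ambiguous between eating and driving
unary = [
    {"eating": 0.5, "sleeping": 2.0, "driving": 0.0},
    {"eating": 1.0, "sleeping": 0.0, "driving": 1.1},
    {"eating": 2.0, "sleeping": 0.0, "driving": 0.0},
]
# Hypothetical transition scores: eating tends to persist, and sleeping is often followed by eating
transition = {("eating", "eating"): 1.0, ("sleeping", "eating"): 0.5}

print([max(u, key=u.get) for u in unary])  # per-image: ['sleeping', 'driving', 'eating']
print(viterbi(unary, transition, labels))  # sequence:  ['sleeping', 'eating', 'eating']
```

The per-image classifier labels the ambiguous middle snapshot "driving". The sequence model relabels it "eating" because its neighbors make that transition more plausible.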

Winning the Netflix Prize: A Summary

October 23, 2011

How was the Netflix Prize won? I went through a lot of the Netflix Prize papers a couple of years ago, so I’ll try to give an overview of the techniques that went into the winning solution here.

Normalization of Global Effects

Suppose Alice rates Inception 4 stars. We can think of this rating as composed of...

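The idea of splitting a rating into global effects can be sketched with a simple baseline predictor, r̂(u, i) = μ + b_u + b_i. Here μ is the global mean, and b_u and b_i are user and movie offsets, each estimated as a shrunken average residual. This is an illustrative assumption about how such normalization can work, not a reconstruction of the truncated excerpt. Apart from Alice's 4 stars for Inception, the ratings are hypothetical, and so are the shrinkage parameters `reg_item` and `reg_user`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Rating = Tuple[str, str, float]  # (user, movie, stars)
Model = Tuple[float, Dict[str, float], Dict[str, float]]


def fit_baseline(ratings: List[Rating], reg_item: float = 0.0, reg_user: float = 0.0) -> Model:
    """Fit r_hat(u, i) = mu + b_u + b_i; reg_* > 0 shrinks sparse biases toward 0."""
    mu = sum(r for _, _, r in ratings) / len(ratings)
    # Movie bias: how far a movie's ratings sit above or below the global mean
    item_sum: Dict[str, float] = defaultdict(float)
    item_cnt: Dict[str, int] = defaultdict(int)
    for _, i, r in ratings:
        item_sum[i] += r - mu
        item_cnt[i] += 1
    b_item = {i: item_sum[i] / (reg_item + item_cnt[i]) for i in item_sum}
    # User bias: what remains after removing the global mean and the movie bias
    user_sum: Dict[str, float] = defaultdict(float)
    user_cnt: Dict[str, int] = defaultdict(int)
    for u, i, r in ratings:
        user_sum[u] += r - mu - b_item[i]
        user_cnt[u] += 1
    b_user = {u: user_sum[u] / (reg_user + user_cnt[u]) for u in user_sum}
    return mu, b_user, b_item


def predict(model: Model, user: str, movie: str) -> float:
    mu, b_user, b_item = model
    # Unknown users or movies fall back to a zero offset
    return mu + b_user.get(user, 0.0) + b_item.get(movie, 0.0)


ratings = [  # hypothetical ratings, except Alice's 4 stars for Inception
    ("alice", "inception", 4.0), ("alice", "titanic", 2.0),
    ("bob", "inception", 5.0), ("bob", "titanic", 3.0),
]
model = fit_baseline(ratings)
print(predict(model, "alice", "inception"))  # 4.0
```

In this toy data, Alice's 4 stars decompose into a 3.5 global mean, +1.0 because Inception is rated highly overall, and -0.5 because Alice rates harshly. Setting `reg_item` or `reg_user` above zero pulls the biases of rarely rated movies or users toward zero.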

Stuff Harvard People Like

September 28, 2011

What types of students go to which schools? There are, of course, the classic stereotypes: MIT has the hacker engineers. Stanford has the laid-back, social folks. Harvard has the prestigious leaders of the world. Berkeley has the activist hippies. Caltech has the hardcore science nerds. But how well do these perceptions match reality?...


Information Transmission in a Social Network: Dissecting the Spread of a Quora Post

September 7, 2011

tl;dr: See this movie visualization for a case study on how a post propagates through Quora. How does information spread through a network? Much of Quora’s appeal, after all, lies in its social graph, and when you’ve got a network of users, all broadcasting their activities to their neighbors, information can cascade in multiple...


Introduction to Latent Dirichlet Allocation

August 21, 2011

Introduction

Suppose you have the following set of sentences:

I like to eat broccoli and bananas.
I ate a banana and spinach smoothie for breakfast.
Chinchillas and kittens are cute.
My sister adopted a kitten yesterday.
Look at this cute hamster munching on a piece of broccoli.

What is latent Dirichlet allocation?...

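The LDA excerpt's example sentences can be fed to a small collapsed Gibbs sampler. This is one common way to fit LDA, and the post's own approach may differ. Several choices here are assumptions made for illustration: the tokenization (content words only, plurals singularized by hand), the number of topics `k=2`, and the `alpha`/`beta` values. Because the sampler is stochastic, the exact topics it prints can vary with the seed.

```python
import random
from collections import Counter
from typing import List, Tuple


def lda_gibbs(docs: List[List[str]], k: int, alpha: float = 0.1, beta: float = 0.01,
              iters: int = 200, seed: int = 0) -> Tuple[List[List[float]], List[Counter]]:
    """Collapsed Gibbs sampling for LDA; returns (doc-topic mixtures, topic-word counts)."""
    rng = random.Random(seed)
    V = len({w for d in docs for w in d})  # vocabulary size
    doc_topic = [[0] * k for _ in docs]    # n_dj: tokens in doc d assigned to topic j
    topic_word = [Counter() for _ in range(k)]  # n_jw: times word w is assigned to topic j
    topic_total = [0] * k                  # n_j: tokens assigned to topic j
    z: List[List[int]] = []
    # Start from a random topic for every token
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            t = rng.randrange(k)
            z[d].append(t)
            doc_topic[d][t] += 1
            topic_word[t][w] += 1
            topic_total[t] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for n, w in enumerate(doc):
                # Remove this token's current assignment from the counts
                t = z[d][n]
                doc_topic[d][t] -= 1
                topic_word[t][w] -= 1
                topic_total[t] -= 1
                # p(z = j | rest) is proportional to (n_dj + alpha)(n_jw + beta) / (n_j + V*beta)
                weights = [(doc_topic[d][j] + alpha) * (topic_word[j][w] + beta)
                           / (topic_total[j] + V * beta) for j in range(k)]
                t = rng.choices(range(k), weights=weights)[0]
                z[d][n] = t
                doc_topic[d][t] += 1
                topic_word[t][w] += 1
                topic_total[t] += 1
    # Smoothed estimate of each document's topic mixture
    theta = [[(c + alpha) / (len(doc) + k * alpha) for c in row]
             for row, doc in zip(doc_topic, docs)]
    return theta, topic_word


docs = [  # hand-tokenized content words from the five example sentences (an assumption)
    ["broccoli", "banana"],
    ["banana", "spinach", "smoothie", "breakfast"],
    ["chinchilla", "kitten", "cute"],
    ["sister", "kitten"],
    ["cute", "hamster", "munching", "broccoli"],
]
theta, topic_word = lda_gibbs(docs, k=2)
for j, counts in enumerate(topic_word):
    print("topic", j, [w for w, c in counts.most_common(4) if c > 0])
for d, row in enumerate(theta):
    print("sentence", d, [round(p, 2) for p in row])
```

With data this small, the food/animal split the excerpt hints at may or may not appear for a given seed. The sketch is meant to show the mechanics, not to reproduce the post's results.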

Tweets vs. Likes: What gets shared on Twitter vs. Facebook?

July 28, 2011

It always strikes me as curious that some posts get a lot of love on Twitter while others get many more shares on Facebook. What accounts for this difference? Some of it is surely site-dependent: maybe one blogger has a Facebook page but not a Twitter account, while another has these roles reversed. But even...


Introduction to Restricted Boltzmann Machines

July 17, 2011

Suppose you ask a bunch of users to rate a set of movies on a 0-100 scale. In classical factor analysis, you could then try to explain each movie and user in terms of a set of latent factors. For example, movies like Star Wars and Lord of the Rings might have strong associations with a latent science...

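The RBM excerpt describes explaining movie preferences through latent factors. A restricted Boltzmann machine can be sketched with binary visible units (whether a user liked each movie) and a few binary hidden units that play the role of those factors, trained with one step of contrastive divergence (CD-1). Several details are assumptions for illustration: the binary liked/not-liked encoding (instead of the excerpt's 0-100 scale), the catalog (beyond Star Wars and Lord of the Rings), the user vectors, and the learning rate.

```python
import math
import random
from typing import List


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Binary RBM: visible units = movies liked (1) or not (0); hidden units = latent factors."""

    def __init__(self, n_visible: int, n_hidden: int, seed: int = 0):
        self.rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        self.W = [[self.rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b_v = [0.0] * n_visible  # visible (movie) biases
        self.b_h = [0.0] * n_hidden   # hidden (factor) biases

    def hidden_probs(self, v: List[float]) -> List[float]:
        # p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v_i W_ij)
        return [sigmoid(self.b_h[j] + sum(v[i] * self.W[i][j] for i in range(self.nv)))
                for j in range(self.nh)]

    def visible_probs(self, h: List[float]) -> List[float]:
        # p(v_i = 1 | h) = sigmoid(b_v[i] + sum_j h_j W_ij)
        return [sigmoid(self.b_v[i] + sum(h[j] * self.W[i][j] for j in range(self.nh)))
                for i in range(self.nv)]

    def sample(self, probs: List[float]) -> List[float]:
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0: List[float], lr: float = 0.1) -> None:
        """One CD-1 step: positive statistics from data minus negative from a reconstruction."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))  # reconstruction, kept as probabilities
        h1 = self.hidden_probs(v1)
        for i in range(self.nv):
            for j in range(self.nh):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_v[i] += lr * (v0[i] - v1[i])
        for j in range(self.nh):
            self.b_h[j] += lr * (h0[j] - h1[j])


movies = ["Star Wars", "Lord of the Rings", "Titanic", "The Notebook"]  # partly hypothetical
users = [  # hypothetical liked/not-liked vectors
    [1, 1, 0, 0], [1, 1, 0, 0], [1, 0, 0, 0],
    [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 1],
]
rbm = RBM(n_visible=len(movies), n_hidden=2, seed=0)
for _ in range(2000):
    for v in users:
        rbm.cd1_update([float(x) for x in v])
# Which latent factors turn on for someone who liked only Star Wars?
print(rbm.hidden_probs([1.0, 0.0, 0.0, 0.0]))
```

After training, the two hidden units often specialize toward the two groups of movies. Which unit ends up as which "factor" depends on the random initialization.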

Topic Modeling the Sarah Palin Emails

June 27, 2011

tl;dr: Browse through Sarah Palin’s emails, automagically organized by topic, here.

LDA-based Email Browser

Earlier this month, several thousand emails from Sarah Palin’s time as governor of Alaska were released. The emails weren’t organized in any fashion, though, so to make them easier to browse, I did some topic modeling (in particular, using latent Dirichlet...


Bayesian Confidence Intervals: Obama’s ‘That’-Addition and Informality

May 1, 2011

No “That” Left Behind? I came across a post on Language Log last week giving some evidence that Obama tends to add “that” to the prepared version of his speeches. For example, in a recent speech at George Washington University, …

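The title's Bayesian confidence (credible) interval for a rate, such as how often "that" gets added, can be sketched with a Beta-binomial model. Starting from a Beta(a, b) prior and observing k additions out of n opportunities gives a Beta(a + k, b + n - k) posterior. The interval below is read off posterior samples, since only the standard library is used. The counts, the uniform Beta(1, 1) prior, and the helper name `beta_credible_interval` are assumptions for illustration, not figures or code from the post.

```python
import random
from typing import Tuple


def beta_credible_interval(successes: int, trials: int, level: float = 0.95,
                           prior_a: float = 1.0, prior_b: float = 1.0,
                           draws: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Equal-tailed credible interval for a binomial rate under a Beta prior.

    Posterior: Beta(prior_a + successes, prior_b + trials - successes).
    Quantiles are estimated by Monte Carlo sampling rather than computed exactly.
    """
    rng = random.Random(seed)
    a = prior_a + successes
    b = prior_b + trials - successes
    samples = sorted(rng.betavariate(a, b) for _ in range(draws))
    tail = (1.0 - level) / 2.0
    lo = samples[int(tail * (draws - 1))]
    hi = samples[int((1.0 - tail) * (draws - 1))]
    return lo, hi


# Hypothetical counts: "that" added at 12 of 40 candidate spots
lo, hi = beta_credible_interval(12, 40)
print(f"95% credible interval for the rate: ({lo:.3f}, {hi:.3f})")
```

More data narrows the interval. For example, 120 additions out of 400 spots gives roughly a third of the width of the 12-out-of-40 interval.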

Filtering for English Tweets: Unsupervised Language Detection on Twitter

April 30, 2011

(See a demo here.) While working on a Twitter sentiment analysis project, I ran into the problem of needing to filter out all non-English tweets. (Asking the Twitter API for English-only tweets doesn’t seem to work, as it nonetheless returns tweets in Spanish, Portuguese, Dutch, Russian, and a couple of other languages.) Since I didn’t have any...

