Articles by mjbommar

A quick look at #march11 / #saudi tweets

March 12, 2011 | mjbommar

Well, so much for that #march11 #Saudi day of rage. Whether it was really the "tempest in a teacup" that Prince Al-Waleed suggested on CNBC (video below, transcript here) or not, the oil complex and Saudi markets seem to have shrugged … Continue reading →

Dataset: Wisconsin Union Protester Tweets #wiunion

February 21, 2011 | mjbommar

I’ve been playing with Twitter data over the last week, archiving Algerian, Egyptian, Iranian, and Chinese tweets. I thought I’d bring the story a little closer to home this time by archiving tweets from Wisconsin Union protesters on the … Continue reading →

Tracking the Frequency of Twitter Hashtags with R

February 21, 2011 | mjbommar

I’ve posted three examples of Twitter hashtag datasets in the last week: one on China, one on Iran, and one on Algeria. In order to build these datasets, I needed to obtain older tweets; this is slightly more difficult than … Continue reading →

Dataset: Tweets from the Chinese Protests #cn220

February 20, 2011 | mjbommar

Earlier this week, I posted a ~100k tweet dataset on the #25bahman protests in Iran. The corresponding figure of frequencies showed a strong presence on Twitter, with over 500 tweets per 5 minute period at peak. You can download the … Continue reading →
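
For readers wondering how a "tweets per 5 minute period" figure is computed, here is a minimal sketch in Python using only the standard library. It is not the code behind the original post (which is not shown here); the function names `floor_to_bin` and `tweets_per_bin` and the sample timestamps are made up for illustration, and the sketch assumes the bin width divides 60 evenly.

```python
from collections import Counter
from datetime import datetime, timedelta


def floor_to_bin(ts, minutes=5):
    """Round a timestamp down to the start of its bin.

    Assumes `minutes` divides 60 evenly, so bins line up with the hour.
    """
    return ts - timedelta(
        minutes=ts.minute % minutes,
        seconds=ts.second,
        microseconds=ts.microsecond,
    )


def tweets_per_bin(timestamps, minutes=5):
    """Count how many timestamps fall into each fixed-width bin."""
    return Counter(floor_to_bin(ts, minutes) for ts in timestamps)


# Made-up timestamps for illustration only; not taken from any dataset.
sample = [
    datetime(2011, 2, 20, 12, 1, 30),
    datetime(2011, 2, 20, 12, 4, 59),
    datetime(2011, 2, 20, 12, 5, 0),
]

counts = tweets_per_bin(sample)
for start in sorted(counts):
    print(start.strftime("%H:%M"), counts[start])
```

The first two sample timestamps land in the 12:00 bin and the third in the 12:05 bin; the peak of a frequency figure is then just the largest value in `counts`.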

R Bloggers: The Site I Wish Existed in 2007

February 19, 2011 | mjbommar

My first experience with R was in 2007 as a sophomore in undergrad. As part of a larger project on pricing day-ahead electricity futures, I wanted to cluster locational marginal price (LMP) data from the ISO-NE. Something like k-means is easy … Continue reading →

Pre-processing text: R/tm vs. python/NLTK

February 16, 2011 | mjbommar

Let’s say that you want to take a set of documents and apply a computational linguistic technique. If your method is based on the bag-of-words model, you probably need to pre-process these documents first by segmenting, tokenizing, stripping, stopwording, and … Continue reading →
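
As a rough illustration of the steps this excerpt names (segmenting, tokenizing, stripping, stopwording), here is a minimal sketch in plain Python. It deliberately uses neither R/tm nor NLTK, so it says nothing about how either library behaves; the helper names and the tiny stopword list are assumptions made up for this example.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "is", "in"}


def segment(text):
    """Naively split text into sentences at ., ! or ? followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def preprocess(sentence):
    """Lowercase, tokenize, strip punctuation, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9']+", sentence.lower())
    tokens = [t.strip("'") for t in tokens]
    return [t for t in tokens if t and t not in STOPWORDS]


def bag_of_words(text):
    """Build a term-frequency bag over all sentences in the text."""
    bag = Counter()
    for sentence in segment(text):
        bag.update(preprocess(sentence))
    return bag


doc = "The price of oil is rising. The market shrugged!"
print(segment(doc))
print(bag_of_words(doc))
```

A bag-of-words model only needs the final counts, which is why word order and sentence boundaries can be discarded once the tokens are cleaned.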

R-bloggers

R news and tutorials contributed by hundreds of R bloggers
