Sentiment analysis finds trouble in the Enron emails

May 24, 2013

(This article was first published on Revolutions, and kindly contributed to R-bloggers)

The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, is the largest publicly available set of emails, which makes it an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate and analyze the resulting data. Here's a visualization of just a few of the email accounts. The red marks flag emails where the sender's sentiment suddenly turned sharply negative (and which would therefore be a good place to start looking for evidence):

[Figure: Enron email analysis]
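
To make the idea of a "sudden sharply negative" email concrete, here is a minimal base R sketch. It is not Andrew's actual method: the post does not say how the red marks were chosen, so the rule below (flag an email whose score falls more than a fixed threshold below the same sender's previous email) is an assumption, and the data frame, column names, and the `flag_sharp_drops` helper are hypothetical.

```r
# Toy data: hypothetical senders and scores, not values from the Enron corpus.
emails <- data.frame(
  sender    = c("a", "a", "a", "a", "b", "b", "b"),
  date      = as.Date(c("2001-01-01", "2001-01-02", "2001-01-03", "2001-01-04",
                        "2001-01-01", "2001-01-02", "2001-01-03")),
  sentiment = c(0.4, 0.3, -0.6, -0.5, 0.1, 0.2, 0.1),
  stringsAsFactors = FALSE
)

# Assumed rule (for illustration only): flag an email when its score is more
# than `threshold` below the previous email from the same sender.
flag_sharp_drops <- function(df, threshold = 0.5) {
  # Sort so that each sender's emails are in chronological order.
  df <- df[order(df$sender, df$date), ]
  # ave() applies the function within each sender's group and keeps row order;
  # the first email of each sender has no predecessor, hence NA.
  change <- ave(df$sentiment, df$sender, FUN = function(s) c(NA, diff(s)))
  df$flagged <- !is.na(change) & change < -threshold
  df
}

flagged <- flag_sharp_drops(emails)
print(flagged[flagged$flagged, ])
```

A per-sender comparison like this picks out a change in tone rather than a low absolute score, so a sender who is consistently mildly negative is not flagged.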
Andrew used the rjson package to interface with the Ikanow REST API, the plyr package to restructure the incoming data, and the ggplot2 package to visualize the results. In a subsequent analysis, he also used the zoo package to interpolate and analyze time series of sentiment scores; the details are in the full blog post linked below.
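
As a rough illustration of the interpolation step, the sketch below uses zoo's `na.approx()` to fill gaps in a daily sentiment series. The dates and scores are made up, and linear interpolation is an assumption; the post does not say which interpolation method was used.

```r
library(zoo)  # provides zoo() and na.approx()

# Hypothetical daily sentiment for one account; NA marks days with no email.
days   <- as.Date("2001-10-01") + 0:5
scores <- c(0.2, NA, NA, -0.4, NA, 0.0)
series <- zoo(scores, order.by = days)

# Fill the missing days by linear interpolation between neighboring scores.
filled <- na.approx(series)
print(filled)
```

With a regular, gap-free series like `filled`, per-account scores can be compared on the same dates or smoothed before plotting.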

Ikanow blog: Making the most of sentiment scores using Ikanow and R
