Sentiment analysis finds trouble in the Enron emails

[This article was first published on Revolutions, and kindly contributed to R-bloggers].

The Enron email dataset, collected during the FERC investigation of the Enron financial scandal, represents the largest publicly available set of emails. This makes it an ideal testbed for sentiment analysis algorithms. Ikanow's Andrew Strite used the open-source Infinit.e framework and a Hadoop cluster to generate sentiment scores for all of the Enron emails, and then used R to manipulate and analyze the resulting data. Here's a visualization of just a few of the email accounts. The red marks flag emails where the sender's sentiment suddenly turned sharply negative (and which would therefore be a good place to start looking for evidence):

[Figure: Enron email analysis]

Andrew used the rjson package to interface with the Ikanow REST API, the plyr package to restructure the incoming data, and the ggplot2 package to visualize the results. In a subsequent analysis he also used the zoo package to interpolate and analyze time series of sentiment scores, which you can read about in the full blog post below.
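
To make that pipeline a little more concrete, here is a minimal sketch of the same sequence of steps in R. It is not Andrew's code: the JSON payload is made-up data standing in for an API response (the real Ikanow response format is not shown here), the field names `sender`, `date` and `sentiment` are hypothetical, and the drop threshold of -0.4 is an arbitrary choice for illustration. Plotting with ggplot2 is left out to keep the example short.

```r
library(rjson)  # fromJSON() parses a JSON string into nested R lists
library(plyr)   # ldply() turns a list of records into one data frame
library(zoo)    # zoo(), merge() and na.approx() for time series with gaps

# Hypothetical payload: invented records, not real Enron scores.
payload <- '[
  {"sender": "a@example.com", "date": "2001-10-01", "sentiment": 0.4},
  {"sender": "a@example.com", "date": "2001-10-02", "sentiment": 0.3},
  {"sender": "a@example.com", "date": "2001-10-04", "sentiment": -0.6},
  {"sender": "a@example.com", "date": "2001-10-05", "sentiment": -0.5}
]'

# Parse the JSON, then restructure each record into a data frame row.
records <- fromJSON(payload)
scores <- ldply(records, function(r) {
  data.frame(
    sender    = as.character(r[["sender"]]),
    date      = as.Date(r[["date"]]),
    sentiment = as.numeric(r[["sentiment"]]),
    stringsAsFactors = FALSE
  )
})

# Build a daily series; merging with an empty zoo on a full daily grid
# introduces NA for the missing day (2001-10-03).
daily    <- zoo(scores$sentiment, order.by = scores$date)
all_days <- seq(start(daily), end(daily), by = "day")
filled   <- na.approx(merge(daily, zoo(, all_days)))  # linear interpolation

# Flag days where sentiment fell by more than the (arbitrary) threshold.
threshold <- -0.4
drops     <- diff(filled)
flagged   <- index(drops)[coredata(drops) < threshold]
print(flagged)
```

With real data, the same steps would be applied per sender (for example with `ddply`), and the flagged dates would correspond to the kind of red marks shown in the visualization above.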

Ikanow blog: Making the most of sentiment scores using Ikanow and R
