In-depth analysis of Twitter activity and sentiment, with R

[This article was first published on Revolutions, and kindly contributed to R-bloggers.]

Astronomer and budding data scientist Julia Silge has been using R for less than a year but, judging by the posts on her blog, has already become very proficient at using it to analyze some interesting data sets. She has posted detailed analyses of water consumption data and health care indicators from the Utah Open Data Catalog, religious affiliation data from the Association of Statisticians of American Religious Bodies, and demographic data from the American Community Survey (that's the same dataset we mentioned on Monday).

In a two-part series, Julia analyzed another interesting dataset: her own archive of 10,000 tweets. (Julia provides all the R code for her analyses, so you can download your own Twitter archive and follow along.) In part one, Julia imports her Twitter archive into R, which takes just one line of code:

tweets <- read.csv("./tweets.csv", stringsAsFactors = FALSE)
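
The stringsAsFactors = FALSE argument matters on versions of R before 4.0.0, where read.csv converted text columns to factors by default. As a minimal sketch of its effect, the snippet below reads a tiny inline CSV instead of a file; the two rows and the column names are made up for illustration and are not taken from Julia's archive.

```r
# Two made-up rows standing in for a Twitter archive export (hypothetical data).
csv_text <- 'timestamp,text
"2015-03-01 14:05:00 +0000","Learning R is fun"
"2015-03-02 09:30:00 +0000","Plotting my tweets with ggplot2"'

# Same call as in the article, but reading from an inline string instead of a file.
sample_tweets <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# The text column stays a character vector, ready for text-mining functions.
str(sample_tweets)
is.character(sample_tweets$text)
```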

She then uses the lubridate package to clean up the timestamps, and the ggplot2 package to create some simple charts of her Twitter activity. This chart takes just a few lines of R code and shows her Twitter activity over time categorized by type of tweet (direct tweets, replies, and retweets).
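
The timestamp cleanup and the chart by tweet type could look roughly like the sketch below. It is not Julia's exact code: the three rows are made up, and the column names (timestamp, in_reply_to_status_id, retweeted_status_id) are an assumption about the layout of the archive's tweets.csv, so check them against your own file.

```r
library(lubridate)
library(ggplot2)

# Made-up rows; column names are assumed to match the archive's tweets.csv.
sample_tweets <- data.frame(
  timestamp = c("2015-03-01 14:05:00 +0000",
                "2015-03-02 09:30:00 +0000",
                "2015-04-10 21:15:00 +0000"),
  in_reply_to_status_id = c(NA, "571234", NA),
  retweeted_status_id   = c(NA, NA, "581234"),
  stringsAsFactors = FALSE
)

# Parse the timestamp strings into date-times (UTC by default).
sample_tweets$timestamp <- ymd_hms(sample_tweets$timestamp)

# Label each tweet as a retweet, a reply, or a direct tweet.
sample_tweets$type <- ifelse(
  !is.na(sample_tweets$retweeted_status_id), "retweet",
  ifelse(!is.na(sample_tweets$in_reply_to_status_id), "reply", "tweet")
)

# Stacked histogram of tweets over time, colored by tweet type.
p <- ggplot(sample_tweets, aes(x = timestamp, fill = type)) +
  geom_histogram(bins = 10, position = "stack") +
  labs(x = "Time", y = "Number of tweets", fill = "Tweet type")
print(p)
```

With a real archive, a larger bins value (or a binwidth in seconds) gives a smoother picture of activity over several years.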

[Figure: Tweet types]

The really interesting part of the analysis comes in part two, where Julia uses the tm package (which provides a number of text mining functions for R) and the syuzhet package (which implements sentiment scoring based on the NRC Word-Emotion Association Lexicon) to analyze the sentiment of her tweets. Categorizing all 10,000 tweets as representing "anger", "fear", "surprise" and other sentiments, and generating a positive and negative sentiment score for each, is as simple as this one line of R code:

mySentiment <- get_nrc_sentiment(tweets$text)
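
To see what that call returns, here is a small sketch that scores two made-up sentences (hypothetical text, not from Julia's archive). The result is a data frame with one row per input and ten columns: the eight NRC emotions plus negative and positive.

```r
library(syuzhet)

# Made-up example texts (hypothetical, not from the author's archive).
example_text <- c("I love this wonderful sunny day",
                  "I am afraid the storm will destroy everything")

# One row per input text; columns are the eight NRC emotions plus
# "negative" and "positive" counts.
scores <- get_nrc_sentiment(example_text)

colnames(scores)
dim(scores)
```

Each cell counts how many words in that text the lexicon associates with the given emotion or polarity, so longer texts tend to get higher raw scores.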

Using those sentiment scores, Julia was easily able to summarize the sentiments expressed in her tweet history:

[Figure: Sentiment bar chart]

and create this time series chart showing her negative and positive sentiment scores over time:

[Figure: Sentiment time series]
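
One way to get from per-tweet scores to charts like these is sketched below. This is an illustration, not necessarily how Julia built her plots: the scored data frame is hypothetical, shaped like a subset of get_nrc_sentiment() output with a timestamp column attached, and averaging by calendar month is an assumed choice of aggregation.

```r
library(ggplot2)

# Hypothetical per-tweet scores, shaped like a subset of get_nrc_sentiment() output.
scored <- data.frame(
  timestamp = as.POSIXct(c("2015-03-01 14:05:00", "2015-03-20 09:30:00",
                           "2015-04-10 21:15:00"), tz = "UTC"),
  joy = c(2, 0, 1), anger = c(0, 1, 0),
  negative = c(0, 2, 1), positive = c(3, 0, 2)
)

# Total count per emotion across all tweets, for a bar chart.
totals <- colSums(scored[, c("joy", "anger")])
totals_df <- data.frame(sentiment = names(totals), count = as.numeric(totals))
bar_chart <- ggplot(totals_df, aes(x = sentiment, y = count)) +
  geom_col()

# Mean negative and positive score per calendar month, for a time series.
scored$month <- as.Date(format(scored$timestamp, "%Y-%m-01"))
monthly <- aggregate(cbind(negative, positive) ~ month, data = scored, FUN = mean)
time_series <- ggplot(monthly, aes(x = month)) +
  geom_line(aes(y = negative, color = "negative")) +
  geom_line(aes(y = positive, color = "positive")) +
  labs(y = "Mean sentiment score", color = "Sentiment")
```

Calling print(bar_chart) or print(time_series) draws the corresponding plot.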

If you've been thinking about applying sentiment analysis to some text data, you might find that with R it's easier than you think! Try it using your own Twitter archive by following along with Julia's posts linked below.

data science ish: Ten Thousand Tweets; Joy to the World, and also Anticipation, Disgust, Surprise...
