Exploring Mangalyaan tweets with R

[This article was first published on Notes of a Dabbler » R, and kindly contributed to R-bloggers].

Mangalyaan is the spacecraft of the Indian Space Research Organisation’s Mars Orbiter Mission, which entered Mars orbit last week. A number of tweets about it appeared on Twitter with the hashtag #Mangalyaan, and I wanted to use R to explore them. Tiger Analytics did an interesting post on this topic last year when Mangalyaan launched. I found their topic-inference analysis particularly interesting, and I hope they repeat it with the latest tweets. My goals and methods here are much more basic. I wanted to do the following:

  • Extract tweets containing the hashtag #Mangalyaan and create a word cloud
  • Attempt to find some topics/groupings from tweets
    • Try out the R topicmodels package to infer topics
    • Find groupings using hierarchical clustering
    • Find groupings using graph community detection algorithms

The full code and explanation are in the following location. I was able to extract about 1000 tweets spanning the 4 days from Sep 23, 2014 to Sep 26, 2014 and used them for the analysis below. Because this is a small sample, all the results below should be read with that limitation in mind. The word cloud of the frequent terms in the tweets is:
[Figure: twWordcloud, word cloud of frequent terms]
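The term counting behind a word cloud like this can be sketched in a few lines of base R. This is only an illustration under assumptions: the `tweets` vector is a made-up stand-in for the extracted tweets, the preprocessing is deliberately crude, and the wordcloud package is used only if it happens to be installed. It is not necessarily how the figure in this post was produced.

```r
# Hypothetical stand-in for the extracted tweets (not the real data).
tweets <- c(
  "Mangalyaan enters Mars orbit #Mangalyaan",
  "ISRO makes history with Mars Orbiter Mission #Mangalyaan",
  "Congratulations ISRO on the Mars mission #Mangalyaan"
)

# Crude preprocessing: lower-case, replace non-letters with spaces, split.
clean <- gsub("[^a-z ]", " ", tolower(tweets))
words <- unlist(strsplit(clean, "\\s+"))
words <- words[nchar(words) > 2]  # drop empty strings and very short tokens

# Term frequencies, most frequent first.
freq <- sort(table(words), decreasing = TRUE)
print(head(freq))

# Draw the cloud only if the wordcloud package is available.
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(names(freq), as.numeric(freq), min.freq = 1)
}
```

On real tweets one would usually also remove stop words, URLs and the search hashtag itself, since the hashtag otherwise dominates the cloud.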

Next, I used the R package topicmodels with the number of topics set to 5 (no particular reason) and got the following result for the top 10 words in each topic:
[Figure: topics, top 10 words in each of the 5 topics]
I did only basic preprocessing and ran the model with default parameters. Better preprocessing and model parameter tuning might give better results.
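For readers who want to try something similar, here is a minimal sketch of fitting an LDA topic model with topicmodels. It rests on several assumptions: the `tweets` vector is invented toy data, the document-term matrix is built with the tm package, the number of topics is set to 2 because the toy corpus is tiny (the analysis in this post used 5), and a seed is fixed for repeatability. The code is skipped if the packages are not installed.

```r
# Hypothetical stand-in tweets; the post used about 1000 real ones.
tweets <- c(
  "Mangalyaan enters Mars orbit",
  "ISRO Mars orbit insertion successful",
  "Mars orbit achieved by ISRO",
  "Modi congratulates ISRO scientists",
  "Modi praises scientists at ISRO",
  "Proud moment for India says Modi"
)

k <- 2             # the post used 5; 2 suits this tiny toy corpus
top_terms <- NULL  # stays NULL if the packages are not installed

if (requireNamespace("tm", quietly = TRUE) &&
    requireNamespace("topicmodels", quietly = TRUE)) {
  corpus <- tm::VCorpus(tm::VectorSource(tweets))
  dtm <- tm::DocumentTermMatrix(
    corpus,
    control = list(tolower = TRUE, removePunctuation = TRUE, stopwords = TRUE)
  )
  # LDA cannot handle documents with no remaining terms, so drop them.
  dtm <- dtm[rowSums(as.matrix(dtm)) > 0, ]

  # Fit LDA with otherwise default settings.
  model <- topicmodels::LDA(dtm, k = k, control = list(seed = 1))

  # Matrix with one column per topic and the top 5 words in each.
  top_terms <- topicmodels::get_terms(model, 5)
  print(top_terms)
}
```

With so few documents the topics are not meaningful; the point is only the shape of the workflow: corpus, document-term matrix, `LDA()`, then the top terms per topic.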

Applying hierarchical clustering to the frequent terms gives the following grouping:
[Figure: hierClust, dendrogram from hierarchical clustering of frequent terms]
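A rough sketch of clustering frequent terms with base R follows. The toy `tweets`, the frequency cutoff of 2, the Euclidean distance and the Ward linkage are all assumptions chosen for illustration, not necessarily the settings behind the figure above.

```r
# Hypothetical stand-in tweets (not the real data).
tweets <- c(
  "Mangalyaan enters Mars orbit",
  "ISRO Mars orbit insertion successful",
  "Mars orbit achieved by ISRO",
  "Modi congratulates ISRO scientists",
  "Modi praises scientists at ISRO",
  "Proud moment for India says Modi"
)

# Tokenise each tweet into its set of distinct lower-case words.
tokens <- lapply(
  strsplit(gsub("[^a-z ]", " ", tolower(tweets)), "\\s+"),
  function(w) unique(w[nchar(w) > 2])
)

# Keep "frequent" terms: those appearing in at least 2 tweets.
vocab <- names(which(table(unlist(tokens)) >= 2))

# Binary term-by-tweet matrix: 1 if the term occurs in the tweet.
tdm <- sapply(tokens, function(w) as.integer(vocab %in% w))
rownames(tdm) <- vocab

# Terms used in similar sets of tweets end up close together.
fit <- hclust(dist(tdm), method = "ward.D2")
groups <- cutree(fit, k = 2)  # cut the tree into 2 groups of terms
print(groups)
plot(fit)                     # dendrogram of the terms
```

Terms are clustered rather than tweets, so each row of the matrix is a term and each column a tweet.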

I found that the igraph package has some easy-to-use functions for community detection and plotting. Here the co-occurrence of words across tweets is used to construct a graph, and a community detection algorithm is applied to that graph. The result is plotted both as a dendrogram and as a graph plot.
[Figure: communityDendPlot, dendrogram of detected word communities]
[Figure: communityGraphPlot, word co-occurrence graph with communities]
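The co-occurrence approach can be sketched roughly as below. The toy `tweets` are invented, and the choice of `cluster_fast_greedy` is an assumption made for illustration; igraph offers several community detection algorithms and this post does not say here which one produced the plots. The function names are those of recent igraph releases, and the code is skipped if igraph is not installed.

```r
# Hypothetical stand-in tweets (not the real data).
tweets <- c(
  "Mangalyaan enters Mars orbit",
  "ISRO Mars orbit insertion successful",
  "Mars orbit achieved by ISRO",
  "Modi congratulates ISRO scientists",
  "Modi praises scientists at ISRO",
  "Proud moment for India says Modi"
)

# Binary term-by-tweet matrix for terms appearing in at least 2 tweets.
tokens <- lapply(
  strsplit(gsub("[^a-z ]", " ", tolower(tweets)), "\\s+"),
  function(w) unique(w[nchar(w) > 2])
)
vocab <- names(which(table(unlist(tokens)) >= 2))
tdm <- sapply(tokens, function(w) as.integer(vocab %in% w))
rownames(tdm) <- vocab

# Term-by-term co-occurrence counts: entry (i, j) is the number of
# tweets containing both term i and term j.
cooc <- tdm %*% t(tdm)
diag(cooc) <- 0  # ignore a term co-occurring with itself

comm <- NULL  # stays NULL if igraph is not installed
if (requireNamespace("igraph", quietly = TRUE)) {
  # Undirected graph whose edge weights are the co-occurrence counts.
  g <- igraph::graph_from_adjacency_matrix(
    cooc, mode = "undirected", weighted = TRUE
  )
  comm <- igraph::cluster_fast_greedy(g)           # assumed algorithm
  print(igraph::membership(comm))
  igraph::plot_dendrogram(comm, mode = "hclust")   # dendrogram view
  plot(comm, g)                                    # graph view
}
```

The toy graph is connected, which keeps the dendrogram simple; with real tweets, rare terms may form isolated components and are usually filtered out first.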

In summary, this was a fun exercise. I got to learn a bit about the following R packages: twitteR, topicmodels, igraph.

