Mangalyaan is the spacecraft of the Indian Space Research Organisation’s (ISRO) Mars Orbiter Mission, which entered Mars orbit last week. Many tweets about it were posted on Twitter last week with the hashtag #Mangalyaan, and I wanted to use R to explore them. Tiger Analytics did an interesting post on this topic last year, when Mangalyaan launched. I found their analysis to infer topics particularly interesting, and I hope they repeat it with the latest tweets. My goals and methods here are much more basic. I wanted to do the following:
- Extract tweets containing the hashtag #Mangalyaan and create a word cloud
- Attempt to find some topics/groupings in the tweets
- Try out the R topicmodels package to infer topics
- Find groupings using hierarchical clustering
- Find groupings using graph community detection algorithms
The full code and explanation are available at the following location. I was able to extract about 1000 tweets spanning four days, from Sep 23, 2014 to Sep 26, 2014, and used them for the analysis below. All of the analysis below should be read with the small sample size in mind. The word cloud of the frequent terms in the tweets is:
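A word cloud is drawn from term frequencies. The minimal base-R sketch below computes those frequencies; the three tweets are hypothetical stand-ins for the extracted data, and the tokenizing rule (lowercase, keep letters, digits and `#`) is an assumption rather than the exact preprocessing used here.

```r
# Hypothetical sample tweets standing in for the real extracted data
tweets <- c(
  "#Mangalyaan enters Mars orbit! Congrats ISRO",
  "ISRO makes history with #Mangalyaan",
  "Mars orbit achieved on the first attempt #Mangalyaan"
)

# Lowercase the text and split it into tokens made of letters, digits and '#'
tokenize <- function(texts) {
  tokens <- unlist(strsplit(tolower(texts), "[^a-z0-9#]+"))
  tokens[nchar(tokens) > 0]
}

# Count each term across all tweets, most frequent first
term_freq <- function(texts) {
  sort(table(tokenize(texts)), decreasing = TRUE)
}

freq <- term_freq(tweets)
print(head(freq, 4))
```

The resulting counts could then be handed to a plotting function such as `wordcloud::wordcloud(names(freq), as.vector(freq))` from the wordcloud package.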
Next, I used the R package topicmodels with the number of topics set to 5 (an arbitrary choice) and got the following top 10 words for each topic:
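A fitted topic model assigns each topic a probability for every term, and the "top 10 words" are simply the highest-probability terms in each topic; in topicmodels, `terms(model, 10)` returns them. The sketch below shows that logic in base R. It assumes a topics-by-terms probability matrix with term names as column names (similar in shape to `posterior(model)$terms`); the toy matrix and the helper `top_terms` are hypothetical.

```r
# Return the n highest-probability terms for each topic.
# `beta`: topics x terms matrix of word probabilities, term names as colnames.
top_terms <- function(beta, n = 10) {
  n <- min(n, ncol(beta))
  # For each topic (row), sort terms by probability and keep the first n names
  out <- apply(beta, 1, function(p) names(sort(p, decreasing = TRUE))[seq_len(n)])
  # Force an n x topics matrix (apply returns a plain vector when n == 1)
  out <- matrix(out, nrow = n)
  colnames(out) <- paste("Topic", seq_len(nrow(beta)))
  out
}

# Toy topic-term probabilities for two topics over three terms
beta <- matrix(c(0.5, 0.3, 0.2,
                 0.1, 0.2, 0.7),
               nrow = 2, byrow = TRUE,
               dimnames = list(NULL, c("mars", "orbit", "isro")))
tt <- top_terms(beta, 2)
print(tt)
```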
I did only basic preprocessing and ran the model with default parameters; better preprocessing and parameter tuning might give better results.
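As an illustration of what "basic preprocessing" for tweets can involve, the base-R sketch below lowercases a tweet and strips URLs, @mentions, punctuation and stopwords. The stopword list and the helper `clean_tweet` are hypothetical; `tm::stopwords("english")` provides a fuller list, and the exact steps used for this analysis may differ.

```r
# Hypothetical short stopword list ("rt" marks retweets)
stop_words <- c("the", "a", "an", "on", "with", "of", "to", "in", "is", "rt")

# Clean a single tweet and return the remaining words as one string
clean_tweet <- function(x) {
  x <- tolower(x)
  x <- gsub("https?://\\S+", " ", x)   # drop URLs
  x <- gsub("@\\w+", " ", x)           # drop @mentions
  x <- gsub("[^a-z0-9# ]", " ", x)     # keep letters, digits, '#' and spaces
  words <- strsplit(x, "\\s+")[[1]]
  words <- words[nchar(words) > 0 & !(words %in% stop_words)]
  paste(words, collapse = " ")
}

tweets <- c("RT @isro: Mars orbit achieved! http://t.co/abc #Mangalyaan",
            "The Mars mission")
cleaned <- vapply(tweets, clean_tweet, character(1), USE.NAMES = FALSE)
print(cleaned)
```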
I found that the igraph package has some easy-to-use functions for community detection and plotting. Here, the co-occurrence of words across tweets is used to construct a graph, and a community detection algorithm is applied to that graph. The resulting communities are plotted both as a dendrogram and as a graph.
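The word co-occurrence graph can be represented as a term-by-term matrix whose entries count the tweets containing both terms. The base-R sketch below builds that matrix from hypothetical, already-cleaned tweets and then, as a stand-in for igraph's community detection, groups terms with plain hierarchical clustering (`hclust`, average linkage, distance taken as the maximum co-occurrence minus the co-occurrence). Both the distance transform and the choice of two groups are assumptions for illustration.

```r
# Hypothetical, already-cleaned tweets (one string per tweet)
tweets <- c("mars orbit isro", "isro mars mission",
            "orbit mars", "modi congrats isro")

# Binary document-term matrix: rows = tweets, columns = terms,
# entry 1 if the term occurs in the tweet
tokens <- strsplit(tweets, "\\s+")
vocab <- sort(unique(unlist(tokens)))
dtm <- t(vapply(tokens, function(w) as.integer(vocab %in% w),
                integer(length(vocab))))
colnames(dtm) <- vocab

# Co-occurrence: cooc[i, j] = number of tweets containing both terms
cooc <- crossprod(dtm)
diag(cooc) <- 0  # ignore self co-occurrence

# Convert co-occurrence (a similarity) into a distance and cluster the terms
d <- as.dist(max(cooc) - cooc)
hc <- hclust(d, method = "average")
groups <- cutree(hc, k = 2)
print(groups)
# plot(hc) draws the corresponding dendrogram
```

With igraph, the same `cooc` matrix could instead be turned into a weighted graph via `igraph::graph_from_adjacency_matrix(cooc, mode = "undirected", weighted = TRUE)` and passed to a community detection function such as `igraph::cluster_fast_greedy()`; which algorithm the analysis above used is not stated here.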