Detour in taste wordclouds


I read Mining Twitter for consumer attitudes towards hotels in my feed of R-bloggers. That reminded me that I intended to look at generating wordclouds for salt and MSG at some point. Salt, or sodium, is linked to hypertension, which in turn is linked to a number of diseases (http://en.wikipedia.org/wiki/Complications_of_hypertension). It is a topic for governments and health organisations, but I have the feeling it is not so much of an issue for the public. MSG, or monosodium glutamate, is not an issue for governments or health organisations, but it has a bad name and is by some linked to the Chinese restaurant syndrome. Luckily there was a nice post to follow: Generating Twitter Wordclouds in R.
Salt
Neither @Salt nor #Salt is of much use when interested in salt taste. Hence the search is for #sodium.

library(twitteR)
library(plyr)

sodium.tweets <- searchTwitter('#sodium', n = 1500)
sodium.texts <- laply(sodium.tweets, function(x) x$getText())
head(sodium.texts)
[1] "#Citric Acid #Sodium Bicarbonate http://t.co/QgJxSlGT HealthAid Vitamin C 1000mg - Effervescent (Blackcurrant Flavour) - 20 Tablets"
[2] "I dnt understand metro I can go on Facebook an Twitter but I can't call or text anybody #sodium"                                    
[3] "Get the facts on #sodium:http://t.co/Djc9rTEl #BCHC @TheHSF"                                                                        
[4] "#Sodium: How to tame your salt habit now? http://t.co/eFTl8yI1"                                                                     
[5] "#lol #funny #insta #instafunny #haha #smile #meme #chemistry #joke #sodium  http://t.co/pX404RhQ"                                   
[6] "@Astroboii07 #sodium. Haha. Tas bisaya daw. i-sudyum. Hahaha.  @andiedote @krizhsexy @mjpatingo  #building"                         
At this point I found the blog post Twitter to wordcloud, so I restarted and used those functions. The original is from Using Text Mining to Find Out What @RDataMining Tweets are About. A small bit of editing was needed: require(tm) and require(wordcloud) within the functions did not work, so I load the libraries directly. The clouds also contained some links, shown as 'httpt' with some more text appended (a link to a chemistry joke), so a function to remove those is added as well.
library(tm)
library(wordcloud)

RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}

RemoveHTTP <- function(tweet) {
  gsub("http[[:alnum:][:punct:]]+", "", tweet)
}

generateCorpus = function(df, my.stopwords = c()) {
  # the following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  # normalise case
  tw.corpus = tm_map(tw.corpus, tolower)
  # remove stopwords
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords('english'))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
wordcloud.generate = function(corpus, min.freq = 3) {
  doc.m = TermDocumentMatrix(corpus, control = list(minWordLength = 1))
  dm = as.matrix(doc.m)
  # calculate the frequency of words
  v = sort(rowSums(dm), decreasing = TRUE)
  d = data.frame(word = names(v), freq = v)
  # generate the wordcloud
  wc = wordcloud(d$word, d$freq, min.freq = min.freq)
  wc
}
tweets.grabber = function(searchTerm, num = 500) {
  rdmTweets = searchTwitter(searchTerm, n = num, .encoding = 'UTF-8')
  tw.df = twListToDF(rdmTweets)
  tweets <- as.vector(sapply(tw.df$text, RemoveAtPeople))
  as.vector(sapply(tweets, RemoveHTTP))
}



tweets = tweets.grabber('sodium', num = 500)
tweets <- tweets[-308]   # tweet in wrong locale
wordcloud.generate(generateCorpus(tweets, 'sodium'), 3)
The ugly line which removes tweet 308 is there because that tweet is in the wrong locale and gave an error. This error is not simple to resolve (see R tm package invalid input in 'utf8towcs'), so I removed the offending tweet:
Error in FUN(X[[308L]], ...) : 
  invalid input 'That was too much sodium
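An alternative to dropping the tweet by index, which I did not try here, would be to strip the unconvertible characters before building the corpus. A minimal sketch, assuming it is acceptable to simply discard the bytes that cannot be converted to UTF-8 via iconv() with sub = '':

# convert all tweets to UTF-8 and drop bytes that cannot be converted,
# instead of removing the whole offending tweet
tweets <- iconv(tweets, to = 'UTF-8', sub = '')
wordcloud.generate(generateCorpus(tweets, 'sodium'), 3)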
