Detour in taste wordclouds
[This article was first published on Wiekvoet, and kindly contributed to R-bloggers].
I read Mining Twitter for consumer attitudes towards hotels in my R-bloggers feed. That reminded me that I had intended to generate wordclouds for salt and MSG at some point. Salt, or sodium, is linked to hypertension, which in turn is linked to several diseases (http://en.wikipedia.org/wiki/Complications_of_hypertension). It is a topic for governments and health organizations, but I have the feeling it is not much of an issue for the public. MSG, or monosodium glutamate, is not an issue for governments or health organisations, but it has a bad name and some people link it to Chinese restaurant syndrome. Luckily there was a nice post to follow: Generating Twitter Wordclouds in R.
Salt
Neither @Salt nor #Salt gives useful results when the interest is in salt taste, hence the search is for #sodium.
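library(twitteR)  # searchTwitter(); assumes Twitter authentication has already been set up
library(plyr)     # laply()
sodium.tweets <- searchTwitter('#sodium', n=1500)
sodium.texts <- laply(sodium.tweets, function(x) x$getText())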
head(sodium.texts)
[1] "#Citric Acid #Sodium Bicarbonate http://t.co/QgJxSlGT HealthAid Vitamin C 1000mg – Effervescent (Blackcurrant Flavour) – 20 Tablets"
[2] "I dnt understand metro I can go on Facebook an Twitter but I can’t call or text anybody #sodium"
[3] "Get the facts on #sodium:http://t.co/Djc9rTEl #BCHC @TheHSF"
[4] "#Sodium: How to tame your salt habit now? http://t.co/eFTl8yI1"
[5] "#lol #funny #insta #instafunny #haha #smile #meme #chemistry #joke #sodium http://t.co/pX404RhQ"
[6] "@Astroboii07 #sodium. Haha. Tas bisaya daw. i-sudyum. Hahaha. @andiedote @krizhsexy @mjpatingo #building"
At this point I found the blog post twitter to wordcloud, so I restarted and used its functions. They originally come from Using Text Mining to Find Out What @RDataMining Tweets are About. A little editing was needed. Calling require(tm) and require(wordcloud) within the functions did not work, so I loaded the libraries directly. The clouds also contained links, shown as 'httpt' followed by more text (for example a link to a chemistry joke), so I added a function to remove those as well.
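Both cleaning functions defined below are plain regular-expression substitutions with gsub(). As a quick check that needs no Twitter access, here they are applied to two of the #sodium tweets printed above. The function bodies are copied from the code below; the sample vector is just those two tweets typed in by hand.

```r
# Same regular expressions as RemoveAtPeople and RemoveHTTP in the code below
RemoveAtPeople <- function(tweet) gsub("@\\w+", "", tweet)
RemoveHTTP <- function(tweet) gsub("http[[:alnum:][:punct:]]+", "", tweet)

# Two of the #sodium tweets shown above
sample.tweets <- c(
  "#Sodium: How to tame your salt habit now? http://t.co/eFTl8yI1",
  "@Astroboii07 #sodium. Haha. Tas bisaya daw. i-sudyum. Hahaha. @andiedote @krizhsexy @mjpatingo #building"
)

# gsub() is vectorised, so both functions work on the whole character vector at once
cleaned <- RemoveHTTP(RemoveAtPeople(sample.tweets))
print(cleaned)
```

The @mentions and the t.co link disappear, while hashtags such as #sodium survive; removePunctuation later strips the '#'.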
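library(twitteR)   # searchTwitter(), twListToDF()
library(tm)
library(wordcloud)
# strip @mentions from a tweet
RemoveAtPeople <- function(tweet) {
  gsub("@\\w+", "", tweet)
}
# strip links such as http://t.co/... (these showed up as 'httpt...' in the cloud)
RemoveHTTP <- function(tweet) {
  gsub("http[[:alnum:][:punct:]]+", "", tweet)
}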
generateCorpus = function(df, my.stopwords = c()) {
  # The following is cribbed and seems to do what it says on the can
  tw.corpus = Corpus(VectorSource(df))
  # remove punctuation
  tw.corpus = tm_map(tw.corpus, removePunctuation)
  # normalise case; in tm >= 0.6 tolower must be wrapped in content_transformer()
  tw.corpus = tm_map(tw.corpus, content_transformer(tolower))
  # remove English stopwords, then extra words such as the search term itself
  tw.corpus = tm_map(tw.corpus, removeWords, stopwords("english"))
  tw.corpus = tm_map(tw.corpus, removeWords, my.stopwords)
  tw.corpus
}
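wordcloud.generate = function(corpus, min.freq = 3) {
  # keep words of any length (older tm versions used minWordLength = 1 for this)
  doc.m = TermDocumentMatrix(corpus, control = list(wordLengths = c(1, Inf)))
  dm = as.matrix(doc.m)
  # calculate the frequency of each word over all tweets
  v = sort(rowSums(dm), decreasing = TRUE)
  d = data.frame(word = names(v), freq = v, stringsAsFactors = FALSE)
  # generate the wordcloud; the plot is drawn as a side effect
  wordcloud(d$word, d$freq, min.freq = min.freq)
}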
tweets.grabber = function(searchTerm, num = 500) {
  rdmTweets = searchTwitter(searchTerm, n = num, .encoding = 'UTF-8')
  tw.df = twListToDF(rdmTweets)
  # gsub() is vectorised, so both cleaning functions apply to the whole column at once
  RemoveHTTP(RemoveAtPeople(tw.df$text))
}
tweets = tweets.grabber('sodium', num = 500)
tweets <- tweets[-308] # tweet in wrong locale
wordcloud.generate(generateCorpus(tweets, 'sodium'), 3)
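The frequency step in wordcloud.generate is a row sum over the term-document matrix: each row is a word, each column a tweet, so rowSums() gives the total count per word. As a rough illustration without tm, the same kind of word/frequency table can be built in base R. The three tweets and the short stopword list are made up for the example, and tm's own tokenizer may split words slightly differently.

```r
# Made-up tweets, assumed already lower-cased with links and @mentions removed
docs <- c("get the facts on sodium",
          "how to tame your salt habit now",
          "too much sodium and salt")

# split on runs of anything that is not a letter or digit, drop empty strings
words <- unlist(strsplit(docs, "[^[:alnum:]]+"))
words <- words[nchar(words) > 0]

# toy stopword list for this example only (tm uses stopwords("english"))
toy.stopwords <- c("the", "on", "to", "your", "and", "how", "now", "much", "too")
words <- words[!words %in% toy.stopwords]

# count each word over all tweets, most frequent first
freq <- sort(table(words), decreasing = TRUE)
d <- data.frame(word = names(freq), freq = as.vector(freq),
                stringsAsFactors = FALSE)
print(d)
```

A data frame of this shape is exactly what wordcloud(d$word, d$freq, min.freq = ...) expects.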
The ugly line tweets <- tweets[-308] is there because tweet 308 is in the wrong locale, and processing it gave the error below. This error is not simple to resolve (see R tm package invalid input in 'utf8towcs'), so I removed the offending tweet.
Error in FUN(X[[308L]], …) :
invalid input ‘That was too much sodium
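Dropping a hard-coded index only works for this particular batch of tweets. A sketch that instead filters out every tweet that is not valid UTF-8 before building the corpus could look like the following. It assumes the invalid encoding is what triggers the utf8towcs error; it uses base R's validUTF8() (available since R 3.3.0), and both the helper drop.invalid.utf8 and the example strings are hypothetical.

```r
# Hypothetical helper: keep only tweets whose bytes form valid UTF-8,
# instead of removing a fixed position such as 308
drop.invalid.utf8 <- function(tweets) {
  ok <- validUTF8(tweets)  # base R: TRUE when the string is valid UTF-8
  if (any(!ok)) {
    message("Dropping ", sum(!ok), " tweet(s) with invalid UTF-8")
  }
  tweets[ok]
}

# Made-up example; "\xff" is a byte that never occurs in valid UTF-8
example <- c("Get the facts on #sodium",
             "That was too much sodium \xff",
             "How to tame your salt habit")
kept <- drop.invalid.utf8(example)
print(kept)
```

In the script above, this would replace the tweets[-308] line with tweets <- drop.invalid.utf8(tweets).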