Social Media Monitoring tools in R with just a few lines

February 21, 2013
(This article was first published on My Data Atelier » R, and kindly contributed to R-bloggers)

Social media analysis has become something of a new obsession in marketing. Every company wants to engage existing customers or attract new ones through this communication channel, so they hire designers, editors, community managers, etc. However, when it comes to measurement, when the impact of all the resources spent on social media communication has to be analyzed, they either do it by hand (manual counts, etc.) or spend a considerable amount of money on “pre-built” solutions that might a) not meet their needs exactly, b) be too difficult to master, or c) require contacting customer service whenever a problem is found or a change must be made.

In my opinion, underlying this issue is a misunderstanding of how important it is to have the right tools (and people) to analyze these results. Although social media might seem an exclusively qualitative field, the work of a data scientist is undoubtedly as important as that of the roles mentioned above for succeeding in social media marketing.

Only by using the right tools with solid, sustainable analysis can we gather good insights from social media marketing. If not, you are being “trendy”, but you are probably losing money and time.

The objective on this occasion is to show how easy it can be to build your own social media tool and, at the same time, to give a very brief introduction to four very useful packages that can help you with your own developments:

Shiny: A package created by RStudio (http://shiny.rstudio.org/) for building web applications very easily. In this case, you will see the code to run it locally. The Shiny site has more information on how to run it on your own or a hosted server. (I have not tried this option yet.)

I would encourage you to go a step further with this package, as only a few features are presented here. As you will see, the “front-end” possibilities are barely exploited in this example.

twitteR: A very powerful package for Twitter Monitoring. Simple, easy and very effective.

tm: As you might have already guessed, tm stands for “Text Mining”. Apart from text mining tools, it also provides very useful functions for pre-processing texts.

wordcloud: A package for drawing word cloud plots.

Below is what you were all waiting for… the code! If you copy and paste it, you will see that the error message “Error: You must enter a query” appears twice (it comes from the twitteR package). My apologies, I could not manage to hide it (I have not found any error handler that does). If anyone has a solution, I would appreciate it if they shared it; I will include it and credit them in the post.
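One possible way to avoid that message, sketched here in base R only, is to skip the search while the term is still empty and to catch any error the search raises. In this sketch fake_search() is a made-up stand-in for searchTwitter() and safe_search() is a hypothetical helper name; neither comes from the twitteR package, and the idea has not been tested inside the application itself.

```r
# Made-up stand-in for searchTwitter(): it fails on an empty query,
# like the error message quoted above.
fake_search <- function(term) {
  if (!nzchar(term)) stop("You must enter a query")
  paste("tweet about", term)
}

# Hypothetical helper: return NULL instead of raising an error
# when the term is empty or the search itself fails.
safe_search <- function(term, search_fun = fake_search) {
  if (is.null(term) || !nzchar(term)) return(NULL)
  tryCatch(search_fun(term), error = function(e) NULL)
}

empty_result <- safe_search("")        # no search is attempted
music_result <- safe_search("music")   # the search runs normally
```

If something like this were used inside the reactive function, the table and plot code would also have to cope with a NULL result.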

The “Shiny” package requires two separate scripts, named ui.R and server.R. They must be placed in the same folder. Once they are finished, run your web application by entering:

library(shiny)
runApp("the location of the desired folder")

ui.R


library(shiny)

shinyUI(pageWithSidebar(

  # Application title
  headerPanel("Tweets hunter"),

  # Sidebar: the parameters entered by the user
  sidebarPanel(
    textInput("term", "Enter a term", ""),
    numericInput("cant", "Select a number of tweets", 1, 0, 200),
    radioButtons("lang", "Select the language", c(
      "English" = "en",
      "Castellano" = "es",
      "Deutsch" = "de")),
    submitButton(text = "Run")),

  # Main panel: where the outputs are shown
  mainPanel(
    h4("Last 5 Tweets"),
    tableOutput("table"),
    plotOutput("wordcl"))
))

server.R


library(shiny)
library(twitteR)
library(wordcloud)
library(tm)

shinyServer(function(input, output) {

  # Raw data: search Twitter and convert the result to a data frame
  rawData <- reactive(function() {
    tweets <- searchTwitter(input$term, n = input$cant, lang = input$lang)
    twListToDF(tweets)
  })

  # Table with the text of the first five tweets
  output$table <- reactiveTable(function() {
    head(rawData()[1], n = 5)
  })

  # Word cloud of the 15 most frequent words
  output$wordcl <- reactivePlot(function() {
    tw.text <- enc2native(rawData()$text)
    tw.text <- tolower(tw.text)
    tw.text <- removeWords(tw.text, c(stopwords(input$lang), "rt"))
    tw.text <- removePunctuation(tw.text, TRUE)
    tw.text <- unlist(strsplit(tw.text, " "))
    tw.text <- tw.text[tw.text != ""]  # drop empty strings left by removed words

    word <- sort(table(tw.text), TRUE)
    wordc <- head(word, n = 15)

    wordcloud(names(wordc), wordc, random.color = TRUE, colors = rainbow(10), scale = c(15, 2))
  })
})

As you can see in the scripts, the shiny package works with an input/output logic. To build your own applications, you need a very clear idea of what the user should provide (input), what should be shown (output), and in which file (ui.R or server.R) each part of the process belongs. The steps below might help:

1) What: User enters parameters. Where: User Input Menu. File: ui.R
2) What: Parameter values are taken by the function and processed. Where: “R engine”. File: server.R
3) What: Outputs are returned. Where: “R engine”. File: server.R
4) What: Outputs are shown. Where: Webpage (wherever you decided to put it). File: ui.R

Before we go on to analyze all the code in more detail, this is what the application looks like:

[Screenshot: Tweets hunter at startup]

From the user interface perspective, it is quite simple: we just type in a word, select the number of tweets we would like to extract, and finally select the language. Below you will see a screenshot of the result of entering the term “music”, selecting 200 tweets and the language English:

[Screenshot: Tweets hunter example]

Let's take a look at the code, following the 4 steps mentioned above:

1) User enters parameters. File: ui.R


library(shiny)

shinyUI(pageWithSidebar(

  # Application title
  headerPanel("Tweets hunter"),

  # Sidebar: the parameters entered by the user
  sidebarPanel(
    textInput("term", "Enter a term", ""),
    numericInput("cant", "Select a number of tweets", 1, 0, 200),
    radioButtons("lang", "Select the language", c(
      "English" = "en",
      "Castellano" = "es",
      "Deutsch" = "de")),
    submitButton(text = "Run")),

The menu logic is extremely easy to understand. Each of the widgets for entering parameters takes its id as the first argument. To refer to it in later steps, you write input$(id); it works like any other variable. In radioButtons(), the label of each button is entered first, followed by the value that the radio button variable receives if that option is chosen.

The only thing worth pointing out in particular is the submitButton() function. Unless you include it, the script will process and output the results every time you change a parameter. The submitButton() function is particularly useful if the process that has to take place is expensive, or if the user needs to enter more than one parameter for the whole script to run correctly.
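The label/value pairing passed to radioButtons() is an ordinary named character vector, so it can be inspected in plain R. The snippet below is only an illustration outside Shiny; the variable names langs and chosen are made up for it.

```r
# The same named vector that is passed to radioButtons() in ui.R
langs <- c("English" = "en", "Castellano" = "es", "Deutsch" = "de")

labels <- names(langs)    # text shown next to each button
values <- unname(langs)   # values that input$lang can take

# If the user picks "Deutsch", input$lang would hold this value:
chosen <- unname(langs["Deutsch"])
```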

2) Parameter values are taken by the function and processed. File: server.R


library(shiny)
library(twitteR)
library(wordcloud)
library(tm)

shinyServer(function(input, output) {

  # Raw data: search Twitter and convert the result to a data frame
  rawData <- reactive(function() {
    tweets <- searchTwitter(input$term, n = input$cant, lang = input$lang)
    twListToDF(tweets)
  })

Step 2 is a bit more complex. Firstly, all the necessary libraries (apart from shiny) have to be loaded in server.R. It is always advisable, when possible, to load them all together.

shinyServer(function(input, output) { ... }) must always open your code in server.R, before anything that works with the “input” variables (the parameters entered by the user). Otherwise, it will not work.

reactive() indicates that whatever is processed inside it will be re-run whenever the parameters change (if you included submitButton() in the UI, whenever you hit it; otherwise, whenever you change a parameter). It is particularly useful for building the raw data that you will process afterwards. You can find a very good explanation of it in the documentation.

In this case, searchTwitter() uses the information entered by the user (the first argument is the term entered, the second the number of tweets and the third the language) and returns the matching tweets.

As the object returned by searchTwitter() is a bit difficult to handle, it is advisable to turn it into a data frame with twListToDF() if you want to work, for example, with the tweet texts.

With this step we have what we needed: the raw data.

3) Outputs are returned. File: server.R

  # Table with the text of the first five tweets
  output$table <- reactiveTable(function() {
    head(rawData()[1], n = 5)
  })

  # Word cloud of the 15 most frequent words
  output$wordcl <- reactivePlot(function() {
    tw.text <- enc2native(rawData()$text)
    tw.text <- tolower(tw.text)
    tw.text <- removeWords(tw.text, c(stopwords(input$lang), "rt"))
    tw.text <- removePunctuation(tw.text, TRUE)
    tw.text <- unlist(strsplit(tw.text, " "))
    tw.text <- tw.text[tw.text != ""]  # drop empty strings left by removed words

    word <- sort(table(tw.text), TRUE)
    wordc <- head(word, n = 15)

    wordcloud(names(wordc), wordc, random.color = TRUE, colors = rainbow(10), scale = c(15, 2))
  })

reactiveTable() and reactivePlot() are functions that wrap other functions and return specific “reactive” outputs (in this case, a table and a plot) that are afterwards displayed in the web interface. It is important to assign them to output$(…) because, as explained in Step 4, otherwise you will not be able to display them in the web application.

The reactive table in this example just returns the first five records of the first column of the data frame, and it can be understood by just reading the code.
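To see what that expression returns, here is a small base-R sketch with a made-up data frame standing in for the result of twListToDF(). The column names and contents are assumptions for illustration only; the real data frame has more columns.

```r
# Hypothetical stand-in for the data frame returned by twListToDF()
tweets_df <- data.frame(
  text = paste("tweet number", 1:8),
  screenName = paste0("user", 1:8),
  stringsAsFactors = FALSE
)

first_col <- tweets_df[1]        # single brackets keep a one-column data frame
top5 <- head(first_col, n = 5)   # first five rows, as in output$table
```

Because tweets_df[1] is still a data frame (unlike tweets_df$text, which is a plain character vector), it can be handed to the table output directly.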

The reactive plot is a bit more sophisticated. Firstly, it takes the tweets and changes the encoding with enc2native(). I tried this script on three different computers and had encoding problems on one of them; this call is there just to avoid that issue.

Then, it converts everything to lower case. After that, the function removeWords() from the package “tm” is used to delete common words. As you can see in the example, you can pass whatever word, list of words, regex, etc. you would like to be removed. In this case, the stopwords of the user-selected language (input$lang) are removed, plus the term “rt”. This is particularly useful for a word cloud (our final objective), as we would never want “common words” in it. After that, punctuation is also removed.

Finally, all the tweets are split into a list of words, which is turned into a frequency table and sorted in descending order; the first 15 entries are chosen for the plot.

For the wordcloud() function, we pass the labels of the generated table (the words themselves) and their frequencies (the first two arguments in the example). The frequencies determine the size of the words. The remaining arguments refer to the color order, the palette, and the maximum and minimum size for the words in the plot.
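The same idea can be tried without Twitter access. The sketch below is a base-R analogue of the steps just described, not the tm calls themselves: the three tweets are invented, and the short stop_words vector is a hand-written stand-in for stopwords(input$lang). Unlike the app code, it filters the stopwords after splitting rather than before.

```r
# Three invented tweets standing in for rawData()$text
tweets <- c("RT Music is the answer!",
            "I love music, and the music loves me",
            "The best music is live music")

# Tiny hand-written stopword list (assumption; the app uses tm's stopwords())
stop_words <- c("rt", "is", "the", "i", "and", "me")

text  <- tolower(tweets)
text  <- gsub("[[:punct:]]", "", text)    # drop punctuation
words <- unlist(strsplit(text, " +"))     # one element per word
words <- words[nzchar(words) & !(words %in% stop_words)]

freq <- sort(table(words), decreasing = TRUE)   # frequency table, largest first
top  <- head(freq, n = 3)                       # the app keeps the first 15
```

Here "music" ends up first with a count of 5, and names(top) together with top are the two pieces that would be passed to wordcloud().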

4) Outputs are shown. File: ui.R

  # Main panel: where the outputs are shown
  mainPanel(
    h4("Last 5 Tweets"),
    tableOutput("table"),
    plotOutput("wordcl"))
))

The concluding part is very simple. You just have to specify where (in this case, the main panel) and what (in this case, the table and the plot) to show. Remember that every object generated with a reactive function in the server.R file and correctly declared as output$(…) must now be referred to as “…”. In this case, the word cloud was declared as output$wordcl and is referred to in the UI as plotOutput("wordcl").

That's it! I hope you enjoyed it, and sorry for the length :D . As usual, if you have any questions, criticism or corrections, please feel free to write.

