Use R to connect to twitter and create a wordcloud of your tweets

Recently I wanted to create a wordcloud of my tweets and do further analysis. In this post I am going to show you how to connect to twitter in R and how to make a wordcloud from your tweets. To follow this tutorial, you need a Twitter account.

First steps in R

Install the required packages twitteR and wordcloud and load them.

install.packages(c("wordcloud", "twitteR"))
library(twitteR)
library(wordcloud)
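
If you run the script more than once, you may not want to reinstall the packages every time. The following is a minimal sketch of one way to check first; the helper name missing_packages is my own and not part of any package. Note that requireNamespace loads a package's namespace as a side effect.

```r
# Return the subset of `pkgs` that cannot be loaded, i.e. is not installed.
# `missing_packages` is a hypothetical helper name used for illustration.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Usage (commented out so that nothing is installed by accident):
# to_install <- missing_packages(c("wordcloud", "twitteR"))
# if (length(to_install) > 0) install.packages(to_install)
```

Only the packages that are actually missing are passed to install.packages, so repeated runs stay fast.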

Create a twitter app

To authenticate your API requests from the R package twitteR, you need a Twitter App that serves as the endpoint. Create one at https://apps.twitter.com/: click “Create New App” and fill in the required fields with your values.

  • Name: Choose a name for your app. Unfortunately, it has to be unique. Most combinations of R and Twitter I could think of were already taken, so I just took veRenaTweeteR.
  • Description: Some description.
  • Website: They want you to provide a website URL, e.g. where your app can be downloaded. Since I don’t plan to “publish” my app in any way, I just put my blog address.
  • Callback URL: You have to put http://127.0.0.1:1410 to be redirected after authentication.
Here you set everything for your app.

Once you have successfully created your app, go to Keys and Access Tokens. There you will find the consumer key and the consumer secret that you need to authenticate in R.

Here you get the consumer key and the consumer secret.

Authenticating and first steps with twitteR

Save the keys from your Twitter App.

twitter_key<-"your_twitter_key"
twitter_secret<-"your_twitter_secret"
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
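
If you plan to share your script, you may prefer not to hard-code the keys. Below is a minimal sketch that reads them from environment variables instead (for example set in your ~/.Renviron file, which R reads at startup). The variable names TWITTER_CONSUMER_KEY and TWITTER_CONSUMER_SECRET and the helper read_twitter_credentials are assumptions of mine, not something twitteR requires.

```r
# Read the consumer key and secret from environment variables.
# The variable names and this helper are hypothetical; pick any names you like.
read_twitter_credentials <- function(key_var = "TWITTER_CONSUMER_KEY",
                                     secret_var = "TWITTER_CONSUMER_SECRET") {
  key <- Sys.getenv(key_var)
  secret <- Sys.getenv(secret_var)
  # Sys.getenv returns "" for unset variables, so fail early with a clear message.
  if (!nzchar(key) || !nzchar(secret)) {
    stop("Set ", key_var, " and ", secret_var, " before authenticating.")
  }
  list(key = key, secret = secret)
}

# Usage (needs the twitteR package and a real Twitter App):
# creds <- read_twitter_credentials()
# oauth <- setup_twitter_oauth(creds$key, creds$secret)
```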

After this, a browser window opens and asks you to log in with your Twitter account (unless you are already logged in) and to grant permissions to your app. If you set the callback URL correctly, a confirmation message appears in the browser:

This message is shown in the browser after successful authentication.

With the following command we get the 100 newest tweets of user “ExpectAPatronum” (which is me), but you can do it for other users as well. The second line will display the structure of the newest tweet.

myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])

A tweet contains lots of information (from statusSource we can even tell I sent it using the iPhone app!).

Reference class 'status' [package "twitteR"] with 17 fields
 $ text         : chr "Don't agree with everything but still funny! https://t.co/2bMYBDkfGY"
 $ favorited    : logi FALSE
 $ favoriteCount: num 0
 $ replyToSN    : chr(0) 
 $ created      : POSIXct[1:1], format: "2016-01-18 07:21:31"
 $ truncated    : logi FALSE
 $ replyToSID   : chr(0) 
 $ id           : chr "688984546289790976"
 $ replyToUID   : chr(0) 
 $ statusSource : chr "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"
 $ screenName   : chr "ExpectAPatronum"
 $ retweetCount : num 0
 $ isRetweet    : logi FALSE
 $ retweeted    : logi FALSE
 $ longitude    : chr(0) 
 $ latitude     : chr(0) 
 $ urls         :'data.frame':	1 obs. of  5 variables:
  ..$ url         : chr "https://t.co/2bMYBDkfGY"
  ..$ expanded_url: chr "https://twitter.com/jennybryan/status/688866722980364289"
  ..$ display_url : chr "twitter.com/jennybryan/sta…"| __truncated__
  ..$ start_index : num 45
  ..$ stop_index  : num 68
 and 53 methods, of which 39 are  possibly relevant:
   getCreated, getFavoriteCount, getFavorited, getId, getIsRetweet, getLatitude,
   getLongitude, getReplyToSID, getReplyToSN, getReplyToUID, getRetweetCount,
   getRetweeted, getRetweeters, getRetweets, getScreenName, getStatusSource, getText,
   getTruncated, getUrls, initialize, setCreated, setFavoriteCount, setFavorited, setId,
   setIsRetweet, setLatitude, setLongitude, setReplyToSID, setReplyToSN, setReplyToUID,
   setRetweetCount, setRetweeted, setScreenName, setStatusSource, setText, setTruncated,
   setUrls, toDataFrame, toDataFrame#twitterObj
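
The statusSource field shown above is an HTML link, so reading the client name from it takes one small extra step. Here is a minimal sketch using base R only; the helper name strip_html_tags is my own, and the regular expression is a simple approach that assumes there is no ">" inside the tag attributes.

```r
# The statusSource value from the str() output above.
status_source <- "<a href=\"http://twitter.com/download/iphone\" rel=\"nofollow\">Twitter for iPhone</a>"

# Remove everything between "<" and ">" to keep only the link text.
# `strip_html_tags` is a hypothetical helper, not part of twitteR.
strip_html_tags <- function(x) gsub("<[^>]+>", "", x)

client <- strip_html_tags(status_source)
print(client)  # "Twitter for iPhone"

# For all tweets (needs the authenticated myTweets list):
# table(sapply(myTweets, function(t) strip_html_tags(t$statusSource)))
```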

Creating the wordcloud

With the following code I created the first wordcloud:

set.seed(1234) # to always get the same wordcloud and for better reproducibility
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text})) # to extract only the text of each status object
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # remove urls, usernames, hashtags and words with umlauts (the latter cannot be displayed by all fonts); !grepl() also works when nothing matches
wordcloud(clean_words, min.freq=2)
Without any specific settings.
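
To see what the cleaning step does without needing a Twitter connection, here is a small sketch on made-up sample texts (the texts and the helper name clean_tweet_words are my own). It also shows a detail worth knowing when filtering words: negative indexing with -grep(...) returns an empty vector when nothing matches, while !grepl(...) keeps all words in that case.

```r
# Hypothetical sample texts standing in for tweetTexts.
sample_texts <- c(
  "Learning R with @someone is fun #rstats",
  "R is fun and wordclouds are fun http://example.com"
)

# Split texts into lower-case words and drop URLs, usernames,
# hashtags and words containing umlauts.
clean_tweet_words <- function(texts) {
  words <- tolower(unlist(strsplit(texts, " ")))
  words[!grepl("http|@|#|ü|ä|ö", words)]
}

clean_sample <- clean_tweet_words(sample_texts)
print(clean_sample)

# The pitfall: nothing matches "zzz", so grep() returns integer(0)
# and negative indexing with it selects no elements at all.
no_match <- c("a", "b")
length(no_match[-grep("zzz", no_match)])   # 0
length(no_match[!grepl("zzz", no_match)])  # 2
```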

Making it look nicer

Since I didn’t like the default font or the ones suggested in the example section of the package, I started to look for other possible fonts. From the help I found out that wordcloud passes any additional parameters on to the method text {graphics}, so the parameter vfont of that method can be used. It accepts Hershey fonts (8 font families with different faces like bold, italic, …).
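
If you want to see which Hershey font families are available before trying them in a wordcloud, the following sketch may help. It uses the Hershey list that ships with R's grDevices package; the helper name preview_hershey is my own. Each family name is drawn in its own font with the face "plain", which, as far as I know, is the one face every family supports.

```r
# The typefaces available for vfont, taken from grDevices.
typefaces <- Hershey$typeface
print(typefaces)

# Draw each typeface name in its own font (face "plain").
# `preview_hershey` is a hypothetical helper for illustration.
preview_hershey <- function(faces = Hershey$typeface) {
  plot.new()
  plot.window(xlim = c(0, 1), ylim = c(0, length(faces) + 1))
  for (i in seq_along(faces)) {
    text(0.5, i, labels = faces[i], vfont = c(faces[i], "plain"))
  }
  invisible(faces)
}

preview_hershey()
```

The two symbol families render the label as symbols rather than Latin letters, so they are less useful for a wordcloud of tweets.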

Playing around with that a little I generated a few more wordclouds.

wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
Font serif (plain).
Font script (plain).
Font gothic italian (plain).

Another important ingredient of a nice wordcloud is the font color. wordcloud uses the package RColorBrewer for that (which is automatically installed with wordcloud).

The package RColorBrewer provides several palettes of colors that look nice together. I chose the palette “Pastel1” with 7 colors (minimum is 3, maximum depends on the palette). Of course you can use par to change other settings of the plot.

pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
Font script (plain), gray background and color palette Pastel1.
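
To find out how many colors a palette offers before calling brewer.pal, you can look it up in brewer.pal.info, a data frame that comes with RColorBrewer. The sketch below assumes RColorBrewer is installed (it should be if wordcloud is) and checks the two palettes used in this post.

```r
library(RColorBrewer)

# maxcolors is the largest n that brewer.pal accepts for a palette.
palette_limits <- brewer.pal.info[c("Pastel1", "Dark2"), c("maxcolors", "category")]
print(palette_limits)

# Seven colors from "Pastel1", returned as hex color strings.
pal <- brewer.pal(7, "Pastel1")
print(pal)

# To look at the colors themselves you can use:
# display.brewer.pal(7, "Pastel1")
```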

Other settings

As already seen, you can change the font (vfont) and the color (colors) of the wordcloud. There are a lot more settings in wordcloud:

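  • words: the words to plot
  • freq: the frequencies of these words
  • scale (=c(4,.5)): range of the size of the words
  • min.freq (=3): the minimum frequency of a word to be included. I always set it to at least 2.
  • max.words (=Inf): maximum number of words in the wordcloud
  • random.order (=TRUE): plot words in random order; if FALSE, they are plotted in decreasing frequency
  • random.color (=FALSE): choose colors randomly from colors; if FALSE, the color depends on the frequency
  • rot.per (=.1): proportion of words that are rotated by 90 degrees
  • colors (= “black”): colors for the words, from least to most frequent
  • ordered.colors (= FALSE): if TRUE, colors are assigned to words in order
  • use.r.layout (=FALSE): if FALSE, C++ code is used for collision detection, otherwise R
  • fixed.asp (=TRUE): if TRUE, the aspect ratio is fixed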
  • …: any parameter that can be passed to text (e.g. vfont)
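
To try several of these settings together, here is a rough sketch that passes words and freq explicitly instead of a raw vector of words. The sample words and the helper word_counts are made up for illustration, and the parameter values are just one possible choice. The wordcloud call only runs if the package is installed.

```r
# Count how often each word occurs and keep those with at least min_freq.
# `word_counts` is a hypothetical helper, not part of the wordcloud package.
word_counts <- function(words, min_freq = 2) {
  counts <- sort(table(words), decreasing = TRUE)
  counts <- counts[counts >= min_freq]
  data.frame(word = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Made-up sample standing in for clean_words.
sample_words <- c("r", "twitter", "r", "wordcloud", "r", "twitter",
                  "tweets", "wordcloud", "fonts")
counts <- word_counts(sample_words)
print(counts)

if (requireNamespace("wordcloud", quietly = TRUE)) {
  set.seed(1234)  # same layout on every run
  wordcloud::wordcloud(
    words = counts$word, freq = counts$freq,
    scale = c(3, 0.5),      # largest and smallest word size
    min.freq = 2, max.words = 50,
    random.order = FALSE,   # most frequent words first
    rot.per = 0.25,         # about a quarter of the words rotated
    colors = RColorBrewer::brewer.pal(3, "Dark2")
  )
}
```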

Source code

library(wordcloud)
library(twitteR)
 
install.packages("extrafont")
library(extrafont)
font_import()
 
twitter_key<-"your_key"
twitter_secret<-"your_secret"
 
oauth<-setup_twitter_oauth(twitter_key, twitter_secret)
myTweets<-userTimeline("ExpectAPatronum", n=100)
str(myTweets[[1]])
 
tweetTexts<-unlist(lapply(myTweets, function(t) { t$text}))
 
#### wordcloud
 
set.seed(1234)
words<-unlist(strsplit(tweetTexts, " "))
words<-tolower(words)
 
length(grep("http", words))
length(grep("@", words))
length(grep("#", words))
 
clean_words<-words[!grepl("http|@|#|ü|ä|ö", words)] # !grepl() also works when nothing matches
wordcloud(clean_words, min.freq=2)
 
#### playing with the settings 
 
wordcloud(clean_words, min.freq=2, vfont=c("serif", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"))
wordcloud(clean_words, min.freq=2, vfont=c("gothic italian", "plain"))
 
 
pal<-brewer.pal(7, "Pastel1")
par(bg="darkgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)
 
#### feature image
 
pal<-brewer.pal(7, "Dark2")
par(bg="lightgray")
wordcloud(clean_words, min.freq=2, vfont=c("script", "plain"), colors=pal)

The post Use R to connect to twitter and create a wordcloud of your tweets appeared first on verenahaunschmid.
