Characterizing Twitter followers with tidytext

This article was first published on Shirin's playgRound and kindly contributed to R-bloggers.

Lately, I have been more and more taken with tidy principles of data analysis. They are elegant and make analyses clearer and easier to comprehend. Following the tidyverse and ggraph, I have been quite intrigued by applying tidy principles to text analysis with Julia Silge and David Robinson’s tidytext.

In this post, I will explore tidytext with an analysis of my Twitter followers’ descriptions to try and learn more about the people who are interested in my tweets, which are mainly about Data Science and Machine Learning.

Resources I found useful for this analysis were http://www.rdatamining.com/docs/twitter-analysis-with-r and http://tidytextmining.com/tidytext.html

Retrieving Twitter data

I am using twitteR to retrieve data from Twitter. I have also tried rtweet, but for some reason the API key, secret and token that worked with twitteR resulted in a “failed to authorize” error with rtweet’s functions.

library(twitteR)

Once we have set up access to the Twitter REST API, we get the credentials needed to authenticate our requests.

consumerKey = "INSERT KEY HERE"
consumerSecret = "INSERT SECRET KEY HERE"
accessToken = "INSERT TOKEN HERE"
accessSecret = "INSERT SECRET TOKEN HERE"

options(httr_oauth_cache = TRUE)

setup_twitter_oauth(consumer_key = consumerKey,
                    consumer_secret = consumerSecret,
                    access_token = accessToken,
                    access_secret = accessSecret)

Now, we can access information from Twitter, like timeline tweets, user timelines, mentions, tweets & retweets, followers, etc.

All the following datasets were retrieved on June 7th 2017, converted to a data frame for tidy analysis and saved for later use:

  • the last 3200 tweets on my timeline

my_name <- userTimeline("ShirinGlander", n = 3200, includeRts = TRUE)
my_name_df <- twListToDF(my_name)
save(my_name_df, file = "my_name.RData")

  • my last 3200 mentions and retweets

my_mentions <- mentions(n = 3200)
my_mentions_df <- twListToDF(my_mentions)
save(my_mentions_df, file = "my_mentions.RData")

my_retweets <- retweetsOfMe(n = 3200)
my_retweets_df <- twListToDF(my_retweets)
save(my_retweets_df, file = "my_retweets.RData")

  • the last 3200 tweets to me

tweetstome <- searchTwitter("@ShirinGlander", n = 3200)
tweetstome_df <- twListToDF(tweetstome)
save(tweetstome_df, file = "tweetstome.RData")

  • my friends and followers

user <- getUser("ShirinGlander")

friends <- user$getFriends() # who I follow
friends_df <- twListToDF(friends)
save(friends_df, file = "my_friends.RData")

followers <- user$getFollowers() # my followers
followers_df <- twListToDF(followers)
save(followers_df, file = "my_followers.RData")

Analyzing friends and followers

In this post, I will have a look at my friends and followers.

load("my_friends.RData")
load("my_followers.RData")

I am going to use packages from the tidyverse (tidyquant for plotting).

library(tidyverse)
library(tidyquant)

  • Number of friends (who I follow on Twitter): 225

  • Number of followers (who follow me on Twitter): 324

  • Number of friends who are also followers: 97
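
The post does not show how these three counts were obtained. A minimal sketch in base R, assuming the overlap is determined by matching the `screenName` column; the two toy data frames and their account names are hypothetical stand-ins for `friends_df` and `followers_df`:

```r
# Hypothetical stand-ins for friends_df and followers_df
friends_toy <- data.frame(screenName = c("alice", "bob", "carol"),
                          stringsAsFactors = FALSE)
followers_toy <- data.frame(screenName = c("bob", "carol", "dave", "erin"),
                            stringsAsFactors = FALSE)

n_friends   <- nrow(friends_toy)     # accounts I follow
n_followers <- nrow(followers_toy)   # accounts following me

# Friends who are also followers: screen names present in both data frames
mutual   <- intersect(friends_toy$screenName, followers_toy$screenName)
n_mutual <- length(mutual)
```

With the real data, the same three expressions applied to `friends_df` and `followers_df` should give the counts listed above.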

What languages do my followers speak?

One of the columns describing my followers is the language they have set for their Twitter account. Not surprisingly, English is by far the predominant language among my followers, followed by German, Spanish and French.

followers_df %>%
  count(lang) %>%
  droplevels() %>%
  ggplot(aes(x = reorder(lang, desc(n)), y = n)) +
    geom_bar(stat = "identity", color = palette_light()[1], fill = palette_light()[1], alpha = 0.8) +
    theme_tq() +
    theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1)) +
    labs(x = "language ISO 639-1 code",
         y = "number of followers")

Who are my most “influential” followers (i.e. followers with the biggest network)?

I also have information about the number of followers that each of my followers has (2nd degree followers). Most of my followers are followed by up to ~1000 people, while only a few have a very large network.

followers_df %>%
  ggplot(aes(x = log2(followersCount))) +
    geom_density(color = palette_light()[1], fill = palette_light()[1], alpha = 0.8) +
    theme_tq() +
    labs(x = "log2 of number of followers",
         y = "density")

How active are my followers (i.e. how often do they tweet)?

The followers data frame also tells me how many statuses (i.e. tweets) each of my followers has posted. To make the numbers comparable, I am normalizing them by the number of days that each account has existed, which gives the average number of tweets per day.

followers_df %>%
  mutate(date = as.Date(created, format = "%Y-%m-%d"),
         today = as.Date("2017-06-07", format = "%Y-%m-%d"),
         days = as.numeric(today - date),
         statusesCount_pDay = statusesCount / days) %>%
  ggplot(aes(x = log2(statusesCount_pDay))) +
    geom_density(color = palette_light()[1], fill = palette_light()[1], alpha = 0.8) +
    theme_tq()
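
To make the normalization concrete, here is a small worked example in base R. The two accounts and their numbers are hypothetical; only the column names (`created`, `statusesCount`, `statusesCount_pDay`) and the retrieval date are taken from the post.

```r
# Two hypothetical accounts: one 10 days old, one exactly a year old
toy <- data.frame(
  screenName    = c("user_a", "user_b"),
  created       = as.Date(c("2017-05-28", "2016-06-07")),
  statusesCount = c(50, 730),
  stringsAsFactors = FALSE
)

retrieved <- as.Date("2017-06-07")  # day the data were retrieved

# Account age in days, then average tweets per day
toy$days <- as.numeric(retrieved - toy$created)
toy$statusesCount_pDay <- toy$statusesCount / toy$days

toy[, c("screenName", "days", "statusesCount_pDay")]
```

Here `user_a` has fewer tweets in total but a higher daily rate (5 per day versus 2), which is the kind of difference the normalization is meant to expose. An account created on the retrieval day itself would have `days == 0` and an infinite rate, so such rows may need to be filtered out first.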

Who are my followers with the biggest network and who tweet the most?

# Note: top_n(10) without a `wt` argument ranks by the last column
# (statusesCount_pDay), so this returns the ten most active tweeters,
# sorted by follower count.
followers_df %>%
  mutate(date = as.Date(created, format = "%Y-%m-%d"),
         today = as.Date("2017-06-07", format = "%Y-%m-%d"),
         days = as.numeric(today - date),
         statusesCount_pDay = statusesCount / days) %>%
  select(screenName, followersCount, statusesCount_pDay) %>%
  arrange(desc(followersCount)) %>%
  top_n(10)

##         screenName followersCount statusesCount_pDay
## 1        dr_morton         150937           71.35193
## 2    Scientists4EU          66117           17.64389
## 3       dr_morton_          63467           46.57763
## 4   NewScienceWrld          60092           54.65874
## 5     RubenRabines          42286           25.99592
## 6  machinelearnbot          27427          204.67061
## 7  BecomingDataSci          16807           25.24069
## 8       joelgombin           6566           21.24094
## 9    renato_umeton           1998           19.58387
## 10 FranPatogenLoco            311           28.92593

followers_df %>%
  mutate(date = as.Date(created, format = "%Y-%m-%d"),
         today = as.Date("2017-06-07", format = "%Y-%m-%d"),
         days = as.numeric(today - date),
         statusesCount_pDay = statusesCount / days) %>%
  select(screenName, followersCount, statusesCount_pDay) %>%
  arrange(desc(statusesCount_pDay)) %>%
  top_n(10)

##         screenName followersCount statusesCount_pDay
## 1  machinelearnbot          27427          204.67061
## 2        dr_morton         150937           71.35193
## 3   NewScienceWrld          60092           54.65874
## 4       dr_morton_          63467           46.57763
## 5  FranPatogenLoco            311           28.92593
## 6     RubenRabines          42286           25.99592
## 7  BecomingDataSci          16807           25.24069
## 8       joelgombin           6566           21.24094
## 9    renato_umeton           1998           19.58387
## 10   Scientists4EU          66117           17.64389
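
Both tables contain the same ten accounts, only in a different order. This is consistent with how dplyr's `top_n()` behaves: when no `wt` column is given, it ranks by the last column of the data frame, which here is `statusesCount_pDay`. The sketch below illustrates the difference on a hypothetical four-row data frame; passing `followersCount` explicitly is one possible way to rank by network size instead, not necessarily what the original analysis intended.

```r
library(dplyr)

# Hypothetical accounts: large networks tweet little, small ones tweet a lot
toy <- data.frame(
  screenName         = c("a", "b", "c", "d"),
  followersCount     = c(1000, 500, 100, 10),
  statusesCount_pDay = c(1, 2, 30, 40),
  stringsAsFactors   = FALSE
)

# Without `wt`, top_n() ranks by the last column (statusesCount_pDay)
by_last_col <- toy %>%
  arrange(desc(followersCount)) %>%
  top_n(2)

# Naming the column explicitly ranks by follower count
by_followers <- toy %>%
  top_n(2, followersCount) %>%
  arrange(desc(followersCount))
```

In this toy case `by_last_col` holds accounts "c" and "d" (the most active tweeters), while `by_followers` holds "a" and "b" (the largest networks).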

Is there a correlation between number of followers and number of tweets?

Indeed, there seems to be a positive correlation: users with many followers also tend to tweet more often.

followers_df %>%
  mutate(date = as.Date(created, format = "%Y-%m-%d"),
         today = as.Date("2017-06-07", format = "%Y-%m-%d"),
         days = as.numeric(today - date),
         statusesCount_pDay = statusesCount / days) %>%
  ggplot(aes(x = followersCount, y = statusesCount_pDay, color = days)) +
    geom_smooth(method = "lm") +
    geom_point() +
    scale_color_continuous(low = palette_light()[1], high = palette_light()[2]) +
    theme_tq()
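
The plot shows the trend visually, but the post does not quantify it. One possible way to put a number on it is a rank correlation, which is less sensitive to the few accounts with very large networks than a linear fit on raw counts. The sketch below uses base R's `cor()` on hypothetical values; Spearman's method is a choice made here for illustration and is not part of the original analysis.

```r
# Hypothetical follower counts and tweets per day for four accounts
followers_toy <- c(10, 100, 1000, 10000)
tweets_pday   <- c(0.5, 1, 3, 2)

# Spearman's rho compares the rank orderings of the two variables
rho <- cor(followers_toy, tweets_pday, method = "spearman")
rho
```

On the real data, the same call could be applied to `followersCount` and `statusesCount_pDay` after removing rows with missing or infinite values.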

Tidy text analysis

Next, I want to know more about my followers by analyzing their Twitter descriptions with the tidytext package.

library(tidytext)
library(SnowballC)

To prepare the data, I am going to unnest the words (or tokens) in the user descriptions, convert them to their word stems, and remove stop words and URLs.

data(stop_words)

tidy_descr <- followers_df %>%
  unnest_tokens(word, description) %>%
  mutate(word_stem = wordStem(word)) %>%
  anti_join(stop_words, by = "word") %>%
  filter(!grepl("\\.|http", word))
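
To see what each step of this pipeline does, it can help to run it on a couple of made-up descriptions. The sketch below applies the same four steps to a hypothetical two-row tibble; the screen names, descriptions and URL are invented for illustration.

```r
library(dplyr)
library(tidytext)
library(SnowballC)

# Hypothetical follower descriptions
toy <- tibble(
  screenName  = c("user_a", "user_b"),
  description = c("Data scientists love the tidyverse http://example.com",
                  "Learning about machines and data")
)

toy_tidy <- toy %>%
  unnest_tokens(word, description) %>%      # one lowercased token per row
  mutate(word_stem = wordStem(word)) %>%    # Porter stem of each token
  anti_join(stop_words, by = "word") %>%    # drop stop words such as "the", "and"
  filter(!grepl("\\.|http", word))          # drop URL fragments

toy_tidy
```

Stop words are matched on the original `word` column rather than on the stem, so stemming first does not change which tokens are removed.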

What are the most commonly used words in my followers’ descriptions?

The first question I want to ask is what words are most common in my followers’ descriptions.

Not surprisingly, the most common word is “data”. I tweet mostly about data-related topics, so it makes sense that my followers are mostly like-minded. The other frequent words are also related to data science, machine learning and R.

<span class="n">tidy_descr</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">count</span><span class="p">(</span><span class="n">word_stem</span><span class="p">,</span><span class="w"> </span><span class="n">sort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">20</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">reorder</span><span class="p">(</span><span class="n">word_stem</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_col</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">palette_light</span><span class="p">()[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">palette_light</span><span class="p">()[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme_tq</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
         </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"count of word stem in all followers' descriptions"</span><span class="p">)</span><span class="w">
</span>
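
The counting step can be tried without the plot. This is a small sketch with made-up word stems (`toy_tidy` is hypothetical) and a lower threshold than the `n > 20` used above, because the toy data is tiny.

```r
library(dplyr)

# Hypothetical word stems, as they might come out of the tidying step
toy_tidy <- tibble(
  word_stem = c("data", "learn", "data", "scienc", "data", "learn")
)

toy_counts <- toy_tidy %>%
  count(word_stem, sort = TRUE) %>%  # one row per stem, most frequent first
  filter(n > 1)                      # keep only stems that occur more than once

toy_counts
```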

We can also show this with a word cloud.

<span class="n">library</span><span class="p">(</span><span class="n">wordcloud</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">tm</span><span class="p">)</span><span class="w">
</span>
<span class="n">tidy_descr</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">count</span><span class="p">(</span><span class="n">word_stem</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">word_stem</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">removeNumbers</span><span class="p">(</span><span class="n">word_stem</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">with</span><span class="p">(</span><span class="n">wordcloud</span><span class="p">(</span><span class="n">word_stem</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">max.words</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">colors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">palette_light</span><span class="p">()))</span><span class="w">
</span>

Instead of looking for the most common words, we can also look for the most common n-grams: here, the most common word pairs (bigrams) in my followers’ descriptions.

<span class="n">tidy_descr_ngrams</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">followers_df</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">unnest_tokens</span><span class="p">(</span><span class="n">bigram</span><span class="p">,</span><span class="w"> </span><span class="n">description</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ngrams"</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"\\.|http"</span><span class="p">,</span><span class="w"> </span><span class="n">bigram</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">separate</span><span class="p">(</span><span class="n">bigram</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"word1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"word2"</span><span class="p">),</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">word1</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">stop_words</span><span class="o">$</span><span class="n">word</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">word2</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">stop_words</span><span class="o">$</span><span class="n">word</span><span class="p">)</span><span class="w">

</span><span class="n">bigram_counts</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tidy_descr_ngrams</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">count</span><span class="p">(</span><span class="n">word1</span><span class="p">,</span><span class="w"> </span><span class="n">word2</span><span class="p">,</span><span class="w"> </span><span class="n">sort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w">
</span>
<span class="n">bigram_counts</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">10</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">reorder</span><span class="p">(</span><span class="n">word1</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">reorder</span><span class="p">(</span><span class="n">word2</span><span class="p">,</span><span class="w"> </span><span class="o">-</span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_tile</span><span class="p">(</span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">,</span><span class="w"> </span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"white"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">scale_fill_gradientn</span><span class="p">(</span><span class="n">colours</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">palette_light</span><span class="p">()[[</span><span class="m">1</span><span class="p">]],</span><span class="w"> </span><span class="n">palette_light</span><span class="p">()[[</span><span class="m">2</span><span class="p">]]))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme_tq</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme</span><span class="p">(</span><span class="n">legend.position</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"right"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme</span><span class="p">(</span><span class="n">axis.text.x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">element_text</span><span class="p">(</span><span class="n">angle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">45</span><span class="p">,</span><span class="w"> </span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"first word in pair"</span><span class="p">,</span><span class="w">
         </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"second word in pair"</span><span class="p">)</span><span class="w">
</span>
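
To see what the bigram pipeline does to a single description, here is a minimal sketch on two made-up sentences (`toy_df` is hypothetical). It follows the same steps as above but skips the URL filter and the plot.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical descriptions, for illustration only
toy_df <- tibble(
  description = c("machine learning and data science",
                  "data science with machine learning")
)

toy_bigrams <- toy_df %>%
  unnest_tokens(bigram, description, token = "ngrams", n = 2) %>%  # overlapping word pairs
  separate(bigram, c("word1", "word2"), sep = " ") %>%             # one column per word
  filter(!word1 %in% stop_words$word,                              # drop pairs that contain
         !word2 %in% stop_words$word) %>%                          # a stop word
  count(word1, word2, sort = TRUE)

toy_bigrams
```

Pairs such as "learning and" or "with machine" disappear because one of their words is a stop word, which leaves only "machine learning" and "data science".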

We can also show these as a graph:

<span class="n">library</span><span class="p">(</span><span class="n">igraph</span><span class="p">)</span><span class="w">
</span><span class="n">library</span><span class="p">(</span><span class="n">ggraph</span><span class="p">)</span><span class="w">
</span>
<span class="n">bigram_graph</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bigram_counts</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">5</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">graph_from_data_frame</span><span class="p">()</span><span class="w">

</span><span class="n">set.seed</span><span class="p">(</span><span class="m">1</span><span class="p">)</span><span class="w">

</span><span class="n">a</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">grid</span><span class="o">::</span><span class="n">arrow</span><span class="p">(</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"closed"</span><span class="p">,</span><span class="w"> </span><span class="n">length</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">unit</span><span class="p">(</span><span class="m">.15</span><span class="p">,</span><span class="w"> </span><span class="s2">"inches"</span><span class="p">))</span><span class="w">
</span>
<span class="n">ggraph</span><span class="p">(</span><span class="n">bigram_graph</span><span class="p">,</span><span class="w"> </span><span class="n">layout</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"fr"</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_edge_link</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">edge_alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">),</span><span class="w"> </span><span class="n">show.legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">,</span><span class="w">
                 </span><span class="n">arrow</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">a</span><span class="p">,</span><span class="w"> </span><span class="n">end_cap</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">circle</span><span class="p">(</span><span class="m">.07</span><span class="p">,</span><span class="w"> </span><span class="s1">'inches'</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_node_point</span><span class="p">(</span><span class="n">color</span><span class="w"> </span><span class="o">=</span><span class="w">  </span><span class="n">palette_light</span><span class="p">()[</span><span class="m">1</span><span class="p">],</span><span class="w"> </span><span class="n">size</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">5</span><span class="p">,</span><span class="w"> </span><span class="n">alpha</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.8</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_node_text</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">label</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">name</span><span class="p">),</span><span class="w"> </span><span class="n">vjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="n">hjust</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0.5</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_void</span><span class="p">()</span><span class="w">
</span>
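
It can help to check what `graph_from_data_frame()` builds before plotting it. The sketch below uses made-up bigram counts (`toy_counts` is hypothetical): the first two columns become the start and end of each directed edge, and the remaining column is kept as an edge attribute.

```r
library(igraph)

# Hypothetical bigram counts, for illustration only
toy_counts <- data.frame(
  word1 = c("machine", "data", "data", "deep"),
  word2 = c("learning", "science", "scientist", "learning"),
  n     = c(12, 9, 7, 3)
)

# Keep only the more frequent pairs, as in the filter above
toy_graph <- graph_from_data_frame(subset(toy_counts, n > 5))

vcount(toy_graph)      # number of distinct words kept as nodes
ecount(toy_graph)      # number of bigrams kept as edges
E(toy_graph)$n         # bigram counts, carried along as an edge attribute
V(toy_graph)$name      # node labels, as used by geom_node_text(aes(label = name))
```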

We can also use bigram analysis to identify negated meanings (this will become relevant for sentiment analysis later). So, let’s look at which words are preceded by “not” or “no”.

<span class="n">bigrams_separated</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">followers_df</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">unnest_tokens</span><span class="p">(</span><span class="n">bigram</span><span class="p">,</span><span class="w"> </span><span class="n">description</span><span class="p">,</span><span class="w"> </span><span class="n">token</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"ngrams"</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">grepl</span><span class="p">(</span><span class="s2">"\\.|http"</span><span class="p">,</span><span class="w"> </span><span class="n">bigram</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">separate</span><span class="p">(</span><span class="n">bigram</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"word1"</span><span class="p">,</span><span class="w"> </span><span class="s2">"word2"</span><span class="p">),</span><span class="w"> </span><span class="n">sep</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">" "</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">word1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"not"</span><span class="w"> </span><span class="o">|</span><span class="w"> </span><span class="n">word1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"no"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="o">!</span><span class="n">word2</span><span class="w"> </span><span class="o">%in%</span><span class="w"> </span><span class="n">stop_words</span><span class="o">$</span><span class="n">word</span><span class="p">)</span><span class="w">

</span><span class="n">not_words</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">bigrams_separated</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">filter</span><span class="p">(</span><span class="n">word1</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s2">"not"</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">inner_join</span><span class="p">(</span><span class="n">get_sentiments</span><span class="p">(</span><span class="s2">"afinn"</span><span class="p">),</span><span class="w"> </span><span class="n">by</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="n">word2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"word"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">count</span><span class="p">(</span><span class="n">word2</span><span class="p">,</span><span class="w"> </span><span class="n">score</span><span class="p">,</span><span class="w"> </span><span class="n">sort</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">TRUE</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ungroup</span><span class="p">()</span><span class="w">
</span>
<span class="n">not_words</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">contribution</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">score</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="nf">abs</span><span class="p">(</span><span class="n">contribution</span><span class="p">)))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">head</span><span class="p">(</span><span class="m">20</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">word2</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">reorder</span><span class="p">(</span><span class="n">word2</span><span class="p">,</span><span class="w"> </span><span class="n">contribution</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">ggplot</span><span class="p">(</span><span class="n">aes</span><span class="p">(</span><span class="n">word2</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">score</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">score</span><span class="w"> </span><span class="o">></span><span class="w"> </span><span class="m">0</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">geom_col</span><span class="p">(</span><span class="n">show.legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">scale_fill_manual</span><span class="p">(</span><span class="n">values</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">palette_light</span><span class="p">())</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
         </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Sentiment score * number of occurrences"</span><span class="p">,</span><span class="w">
         </span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Words preceded by \"not\""</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
    </span><span class="n">theme_tq</span><span class="p">()</span><span class="w">
</span>
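
The contribution that is plotted above is simply the sentiment score of a word multiplied by how often it follows "not". Here is a minimal sketch with a made-up lexicon (`toy_lexicon` and its scores are hypothetical stand-ins for AFINN, so nothing needs to be downloaded). Depending on the tidytext version, the AFINN score column may be called `value` instead of `score`, in which case the column name in the code above would need to be adjusted.

```r
library(dplyr)

# Hypothetical bigrams whose first word is a negation
toy_negated <- tibble(
  word1 = c("not", "not", "not", "no"),
  word2 = c("good", "good", "bad", "fun")
)

# Hypothetical AFINN-style lexicon; scores are made up for illustration
toy_lexicon <- tibble(
  word  = c("good", "bad", "fun"),
  score = c(3, -3, 4)
)

toy_not_words <- toy_negated %>%
  filter(word1 == "not") %>%                         # only words preceded by "not"
  inner_join(toy_lexicon, by = c(word2 = "word")) %>%
  count(word2, score, sort = TRUE) %>%               # occurrences per negated word
  mutate(contribution = n * score)                   # score weighted by frequency

toy_not_words
```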

What’s the predominant sentiment in my followers’ descriptions?

For sentiment analysis, I will exclude negated words when using the nrc lexicon and swap their positive and negative labels when using the bing lexicon (in this case there was only one negated word, “endorsement”, so it won’t make a real difference).

<span class="n">tidy_descr_sentiment</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">tidy_descr</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">left_join</span><span class="p">(</span><span class="n">select</span><span class="p">(</span><span class="n">bigrams_separated</span><span class="p">,</span><span...
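
The idea of treating negated words differently per lexicon can be sketched on toy data. This is not the code from the analysis above; `toy_words`, `toy_bing` and the `negated` flag are hypothetical, and the sentiment labels are made up for illustration.

```r
library(dplyr)

# Hypothetical words with a flag for whether they were preceded by "not" or "no"
toy_words <- tibble(
  word    = c("endorsement", "great", "problem"),
  negated = c(TRUE, FALSE, FALSE)
)

# Hypothetical bing-style lexicon with positive / negative labels
toy_bing <- tibble(
  word      = c("endorsement", "great", "problem"),
  sentiment = c("positive", "positive", "negative")
)

# bing-style: swap the label of negated words
toy_sentiment <- toy_words %>%
  inner_join(toy_bing, by = "word") %>%
  mutate(sentiment = case_when(
    negated & sentiment == "positive" ~ "negative",
    negated & sentiment == "negative" ~ "positive",
    TRUE                              ~ sentiment
  ))

# nrc-style: drop negated words before joining with the lexicon
toy_nrc_input <- toy_words %>%
  filter(!negated)

toy_sentiment
```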

To leave a comment for the author, please follow the link and comment on their blog: Shirin's playgRound.
