Exploratory & sentiment analysis of beer tweets from Untappd on Twitter

[This article was first published on Jasmine Dumas' R Blog, and kindly contributed to R-bloggers.]

Project Objective

Untappd has some usage restrictions for their API, namely not allowing any exploratory or analytics uses, so I’m going to explore tweets of beer and brewery check-ins from the Untappd app to find some implicit trends in how users share their activity.

Exploratory Analysis

library(tidyverse)
library(rtweet)
library(stringr)
library(wesanderson)
library(maps)
library(tidytext)
library(dumas) # http://jasdumas.github.io/dumas/
library(wordcloud)

All social media shares from the Untappd app include the app’s own short URL, ‘untp.beer’, which gives the search_tweets() function an easily identifiable search query.

untp <- search_tweets(q = "untp.beer",
                      n = 18000,
                      include_rts = FALSE,
                      retryonratelimit = TRUE,
                      geocode = lookup_coords("usa"),
                      lang = "en"
)

Let’s take a peek at the text of the tweet!

head(untp$text)
## [1] "I just earned the 'Land of the Free (Level 94)' badge on @untappd! https://t.co/oSOzNMA6tn" 
## [2] "I just earned the 'Wheel of Styles (Level 10)' badge on @untappd! https://t.co/jbW0zjtyAl"  
## [3] "I just earned the 'God Save the Queen' badge on @untappd! https://t.co/CdhKjOKK40"          
## [4] "I just earned the 'Draft City (Level 2)' badge on @untappd! https://t.co/Ps0WOIRUot"        
## [5] "I just earned the 'Middle of the Road (Level 2)' badge on @untappd! https://t.co/XRXVRcc8Iq"
## [6] "Drinking a GRITz by @BrainDeadBrew at @luckdallas — https://t.co/BPCAmDcpdE"

From this sample, it’s apparent that there are a few types of default tweets available.

Let’s explore some descriptive stats about the 18,000 tweets that were extracted.

How many unique users are in the data set?

length(unique(untp$user_id))
## [1] 5897

What time window do the tweets cover?

paste(min(untp$created_at), max(untp$created_at), sep = " to ")
## [1] "2018-02-04 19:35:40 to 2018-02-05 23:35:29"
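
As a rough back-of-the-envelope check on these descriptive stats, the figures printed above can be combined into tweets per user and the length of the collection window. This is only a sketch: the numbers are copied by hand from the output above, and treating the timestamps as UTC is an assumption (the elapsed time does not depend on it).

```r
# Figures copied from the output above
n_tweets <- 18000
n_users  <- 5897

# Assumption: timestamps are parsed as UTC
first_tweet <- as.POSIXct("2018-02-04 19:35:40", tz = "UTC")
last_tweet  <- as.POSIXct("2018-02-05 23:35:29", tz = "UTC")

# Average number of shared tweets per unique user (about 3)
tweets_per_user <- n_tweets / n_users

# Length of the collection window in hours (about 28)
span_hours <- as.numeric(difftime(last_tweet, first_tweet, units = "hours"))

round(tweets_per_user, 2)
round(span_hours, 1)
```

So the sample covers a bit more than one day, with roughly three shared tweets per user on average.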

My initial assumption was that all the tweets would be posted from the app, but it seems there is a little bit of cross-posting going on from Facebook and from some nerds who have set up IFTTT applet recipes.

count(untp, source)
## # A tibble: 3 x 2
##     source     n
##      <chr> <int>
## 1 Facebook    10
## 2    IFTTT     6
## 3  Untappd 17984

How many types of these check-ins are shared?

There are a few different types of tweet structures that can be shared from the Untappd app, as noticed in the text sample above. They include:

  1. Earning Badges (i.e. tweets that contain ‘I just earned the…’ or even the word ‘badge’)
  2. Added review text (i.e. text which ends in a ‘-’ before the default template of ‘Drinking a’)
  3. Default check-ins (i.e. tweets that begin with ‘Drinking a’)
  4. Brewery offering updates (i.e. tweets that begin with ‘Just added …’ for new beers added)

There are granular social settings on each account that make it easy to share check-in info to linked social media accounts.

I’m going to detect the string patterns and create a new column in the data set to house them.

untp$structure_type <- NA

# earning badges
untp$structure_type[untp$text %>% str_detect("I just earned the")] <- "badge achievement"
# default checkin
untp$structure_type[untp$text %>% str_detect("^Drinking ")] <- "default check-in"
# brewery updates
untp$structure_type[untp$text %>% str_detect("^Just added")] <- "brewery update"
# any NA's left should be tweets that users have added additional text/descriptions
untp$structure_type[is.na(untp$structure_type)] <- "additional review"
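
To see how these pattern rules behave without pulling any tweets, here is a minimal sketch that applies the same logic to a small toy vector. It uses base R's grepl() in place of str_detect(); the helper name classify_tweet and every sample tweet except the badge one (taken from the sample output above) are hypothetical.

```r
# Toy tweets: the first is from the sample output above, the rest are made up
tweets <- c(
  "I just earned the 'Draft City (Level 2)' badge on @untappd! https://t.co/Ps0WOIRUot",
  "Drinking a Hypothetical Ale by @example_brewery",
  "Just added a hypothetical new beer to the menu",
  "Crisp and citrusy. - Drinking a Hypothetical IPA by @example_brewery"
)

# Hypothetical helper mirroring the rules above, in the same order
classify_tweet <- function(text) {
  # anything that matches no pattern is treated as a user-written review
  type <- rep("additional review", length(text))
  type[grepl("I just earned the", text, fixed = TRUE)] <- "badge achievement"
  type[grepl("^Drinking ", text)] <- "default check-in"
  type[grepl("^Just added", text)] <- "brewery update"
  type
}

tweet_types <- classify_tweet(tweets)
table(tweet_types)
```

The last toy tweet contains ‘Drinking a’ but does not start with it, so the anchored pattern leaves it in the review category, which is the behaviour the rules above rely on.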

Given that a single check-in can result in multiple badges and multiple social shares, it makes sense that badge achievements account for more tweets than the check-ins themselves.

untp %>% group_by(structure_type) %>%
  summarise(n = n()) %>%
  mutate(structure_type = fct_reorder(structure_type, n)) %>%
  ggplot(., aes(x = structure_type, y = n)) +
  geom_bar(stat = "identity", fill = wes_palette("BottleRocket1", 1)) +
  theme_minimal() +
  labs(title = "Count of Tweet Types for Untappd Twitter Shares",
       subtitle = "Most users have shared their badge achievements", y = "Count of Tweets", x = "")

[Plot: Count of Tweet Types for Untappd Twitter Shares]

What kind of badges are users earning?

The ‘Brew Bowl LII’ badge is the most popular earned badge available during this Superbowl weekend. Coincidentally, I visited a brewery this weekend and earned this badge as well.

untp$badge_type <- NA
untp$badge_type <- str_extract(untp$text, "(?<=').*?(?=')")
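
The lookaround pattern keeps whatever sits between the first pair of single quotes. A small base R sketch of the same idea, using regexpr() with perl = TRUE instead of str_extract(), is shown below on sample tweets from the output above. The helper names extract_badge and strip_level are hypothetical, and removing the ‘(Level N)’ suffix is an optional extra step, not part of the original analysis.

```r
# Sample tweets taken from the head(untp$text) output above (URLs dropped)
tweets <- c(
  "I just earned the 'Land of the Free (Level 94)' badge on @untappd!",
  "I just earned the 'God Save the Queen' badge on @untappd!",
  "Drinking a GRITz by @BrainDeadBrew at @luckdallas"
)

# Hypothetical helper: text between the first pair of single quotes, NA if none
extract_badge <- function(text) {
  m <- regexpr("(?<=').*?(?=')", text, perl = TRUE)
  out <- rep(NA_character_, length(text))
  out[as.integer(m) > 0] <- regmatches(text, m)
  out
}

# Hypothetical helper: drop a trailing " (Level N)" so levels can be grouped
strip_level <- function(badge) {
  sub(" \\(Level [0-9]+\\)$", "", badge)
}

badges <- extract_badge(tweets)
badge_families <- strip_level(badges)
badge_families
```

Grouping on the stripped name would count all levels of a badge such as ‘Middle of the Road’ together, whereas the original column keeps each level separate.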

There were numerous ‘Middle of the Road’ badges awarded at various levels. The description of Level 5 of that badge is:

Looking for more kick than a session beer, but want to be able to stay for a few rounds? You have to keep it in the middle. That’s 25 beers with an ABV greater than 5% and less than 10%.

So, it would appear that this is a popular range of ABV for users to try, which supports the notion that it’s generally easier to consume moderate-alcohol beers than heavier beers (Ales vs Stouts).

untp %>% group_by(badge_type) %>%
  summarise(n = n()) %>%
  dplyr::filter(!is.na(badge_type)) %>%
  arrange(desc(n)) %>%
  top_n(25) %>%
  mutate(badge_type = fct_reorder(badge_type, n)) %>%
  ggplot(., aes(x = badge_type, y = n)) +
  geom_bar(stat = "identity", fill = wes_palette("GrandBudapest1", 1)) +
  theme_minimal() +
  coord_flip() +
  labs(title = "Count of Badge Types for Untappd Twitter Shares",
       subtitle = "Brew Bowl LII was the most awarded badge this weekend", y = "Count of Tweets", x = "")

[Plot: Count of Badge Types for Untappd Twitter Shares]

Where do people check-in from?

Lots of activity on the East Coast (Boston, MA being the top place at the time of running this analysis) and in metros across the U.S.! Untappd is based in North Carolina, so it’s interesting not to see a lot of activity there, but users can have their location settings turned off for privacy in Twitter. The plethora of missing values may also indicate that users are not filling out all the check-in details, such as purchase or drinking location. Oftentimes users seem to be drinking and checking in at home and may want to mask their location.

untp %>% group_by(place_full_name) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(11) %>%
  data.frame()
##       place_full_name     n
## 1                <NA> 14776
## 2   Pennsylvania, USA    81
## 3        Florida, USA    66
## 4        Portland, OR    62
## 5  Cape Girardeau, MO    44
## 6    Philadelphia, PA    38
## 7         Phoenix, AZ    36
## 8          Dallas, TX    34
## 9     Los Angeles, CA    34
## 10       Brooklyn, NY    33
## 11      New York, USA    33

There are a few off the map, but the general distribution is effectively visualized with one yellow dot per tweet.

## create lat/lng variables using all available tweet and profile geo-location data
untp <- lat_lng(untp)

## plot state boundaries
par(mar = c(0, 0, 0, 0))
maps::map("state", lwd = .25)

## plot lat and lng points onto state map
with(untp, points(lng, lat, pch = 20, cex = .75, col = wes_palette("Cavalcanti1", 1)))

[Plot: map of U.S. state boundaries with one dot per geolocated tweet]

Instead of crafting an ugly regex solution to extract the brewery from the tweet text, I can use the mentions_screen_name column, which is actually a decent proxy for the brewery if it has a Twitter presence! I really enjoy Tree House Brewery’s Green IPA, and it is nice to see that many others have tried beers from their brewery.

# mentions_screen_name is the brewery that produced the checked-in beer
tibble(brewery = unlist(untp$mentions_screen_name)) %>%
  group_by(brewery) %>%
  dplyr::filter(brewery != 'untappd') %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(25) %>%
  mutate(brewery = fct_reorder(brewery, n)) %>%
  ggplot(., aes(x = brewery, y = n)) +
  geom_bar(stat = "identity", fill = wes_palette("Moonrise2", 1)) +
  theme_minimal() +
  coord_flip() +
  labs(title = "Top 25 Breweries for Untappd Twitter Shares",
       subtitle = "", y = "Count of Tweets", x = "", caption = "by @handle occurrences")
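
Because mentions_screen_name holds a vector of handles per tweet, the counting step boils down to flattening that list and tallying the handles. The sketch below illustrates this in base R on a toy list; every handle except ‘untappd’ is hypothetical, and it assumes tweets without mentions are stored as NA.

```r
# Toy stand-in for untp$mentions_screen_name: one character vector per tweet
mentions <- list(
  c("untappd"),                             # badge tweet mentioning the app only
  c("example_brewery", "example_taproom"),  # check-in mentioning brewery and venue
  c("example_brewery"),
  NA_character_                             # assumption: no mentions is stored as NA
)

# Flatten the list, then drop missing values and the app's own handle
handles <- unlist(mentions)
handles <- handles[!is.na(handles) & handles != "untappd"]

# Tally the handles, most mentioned first
brewery_counts <- sort(table(handles), decreasing = TRUE)
brewery_counts
```

Note that a venue handle (here the made-up example_taproom) is counted alongside the brewery, so the column is a proxy rather than an exact brewery field.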

[Plot: Top 25 Breweries for Untappd Twitter Shares]

How many pictures of beer are shared?

The #photo hashtag indicates a photo paired with a tweet from Untappd. The rest of the hashtags align with the Superbowl festivities.

tibble(hashtags = unlist(untp$hashtags)) %>%
  group_by(hashtags) %>%
  dplyr::filter(!is.na(hashtags)) %>%
  summarise(n = n()) %>%
  arrange(desc(n)) %>%
  top_n(25)
## # A tibble: 25 x 2
##                hashtags     n
##                   <chr> <int>
##  1                photo  2897
##  2             brewbowl  2728
##  3        ibelieveinIPA    62
##  4            SuperBowl    40
##  5         FlyEaglesFly    37
##  6         FirstSqueeze    26
##  7            craftbeer    21
##  8 BrainDeadBottleShare    16
##  9          beerandfood    15
## 10        UntapTheStack    15
## # ... with 15 more rows
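
To put the #photo count in context, it can be expressed as a share of all collected tweets. The sketch below uses the 2,897 from the hashtag table and the 18,000 total from the source counts earlier; it assumes each photo tweet carries #photo exactly once. The second half shows, on a toy list with made-up contents, how the same share could be computed per tweet from a list column like untp$hashtags.

```r
# Counts copied from the output above
photo_tweets <- 2897
total_tweets <- 18000

# Share of tweets with a paired photo, in percent (about 16%)
photo_share <- round(100 * photo_tweets / total_tweets, 1)
photo_share

# Toy stand-in for untp$hashtags: one character vector per tweet, NA if none
hashtags <- list(c("photo", "brewbowl"), NA_character_, c("craftbeer"), c("photo"))

# TRUE for every tweet whose hashtags include "photo"
has_photo <- vapply(hashtags, function(tags) "photo" %in% tags, logical(1))
mean(has_photo)
```

Counting per tweet rather than per hashtag avoids double counting if a tweet ever repeated the same hashtag.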

Did people share more during the Superbowl game?

There was definitely a spike on Sunday as the Superbowl was starting!

## plot time series of tweets
ts_plot(untp, "1 hours") +
  ggplot2::theme_minimal() +
  ggplot2::theme(plot.title = ggplot2::element_text(face = "bold")) +
  ggplot2::labs(
    x = NULL, y = NULL,
    title = "Frequency of Untappd Twitter statuses from the past 2 days",
    subtitle = "Twitter status (tweet) counts aggregated using one-hour intervals",
    caption = "\nSource: Data collected from Twitter's REST API via rtweet"
  )

plot of chunk tsplot

Sentiment Analysis

We want to separate out the text that comes before the dash (-), because that is where the extra comment sits when users add their own text to a shared beer check-in.

<span class="c1"># the first vector is the review, second is the beer/brewery, third is the url</span><span class="w">
</span><span class="n">untp_text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">str_split</span><span class="p">(</span><span class="n">untp</span><span class="o">$</span><span class="n">text</span><span class="p">[</span><span class="n">untp</span><span class="o">$</span><span class="n">structure_type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'additional review'</span><span class="p">],</span><span class="w"> </span><span class="s2">"-|—"</span><span class="p">)</span><span class="w">

</span><span class="c1"># unnest these unnamed lists, if they were named I would have used purrr::map_df()</span><span class="w">
</span><span class="c1"># https://stackoverflow.com/a/24496537/4143444</span><span class="w">
</span><span class="n">review</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sapply</span><span class="p">(</span><span class="n">untp_text</span><span class="p">,</span><span class="w"> </span><span class="s2">"["</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span><span class="n">beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">sapply</span><span class="p">(</span><span class="n">untp_text</span><span class="p">,</span><span class="w"> </span><span class="s2">"["</span><span class="p">,</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">user</span><span class="w"> </span><span class="o"><-</span><span class="n">untp</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
        </span><span class="n">dplyr</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">structure_type</span><span class="w"> </span><span class="o">==</span><span class="w"> </span><span class="s1">'additional review'</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
        </span><span class="n">select</span><span class="p">(</span><span class="n">starts_with</span><span class="p">(</span><span class="s2">"screen_name"</span><span class="p">))</span><span class="w">
</span><span class="n">untp_text</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">data.frame</span><span class="p">(</span><span class="n">review</span><span class="p">,</span><span class="w"> </span><span class="n">beer</span><span class="p">,</span><span class="w"> </span><span class="n">user</span><span class="o">$</span><span class="n">screen_name</span><span class="p">)</span><span class="w">
</span>
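
To make the split step concrete, here is a minimal sketch on two made-up tweets. The tweet text, brewery handles, and URLs are hypothetical, and the pattern is written with the `\u2014` escape for the long dash. Note that a hyphen inside the comment itself (for example "well-balanced") would also be split, so this is only an approximation of the tweet structure.

```r
library(stringr)

# Hypothetical tweets shaped like an Untappd share with an added comment
tweets <- c(
  "Great with wings! - Drinking a Sample IPA by @samplebrewery https://untp.beer/abc",
  "Smooth and malty \u2014 Drinking an Example Stout by @examplebrewing https://untp.beer/xyz"
)

# Split on a hyphen or a long dash; each tweet becomes a character vector
pieces <- str_split(tweets, "-|\u2014")

# Pull the first and second piece out of every list element
review <- trimws(sapply(pieces, "[", 1))
beer   <- trimws(sapply(pieces, "[", 2))

data.frame(review, beer)
```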

Let’s remove the hashtags from the beer column and the common beginning of each description.

<span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">str_replace_all</span><span class="p">(</span><span class="n">untp_text</span><span class="o">$</span><span class="n">beer</span><span class="p">,</span><span class="w"> </span><span class="s2">"#[a-zA-Z0-9]{1,}"</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w">

</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">str_replace_all</span><span class="p">(</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"Drinking a |Drinking an "</span><span class="p">),</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w"> 

</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">str_replace_all</span><span class="p">(</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="s2">"[[:punct:]]"</span><span class="p">,</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w"> 

</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">trimws</span><span class="p">(</span><span class="n">untp_text</span><span class="o">$</span><span class="n">clean_beer</span><span class="p">)</span><span class="w">
</span>
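
As a quick check of what the cleaning steps do, here is a small sketch on made-up strings (the beer names and handles are hypothetical). It uses `str_replace_all()` for the hashtags so that every hashtag is dropped; `str_replace()` only removes the first match in each string.

```r
library(stringr)

# Hypothetical beer strings, as they might look after the split step
beer <- c(" Drinking a Sample IPA by @samplebrewery #craftbeer #ipa ",
          " Drinking an Example Stout by @examplebrewing ")

clean_beer <- str_replace_all(beer, "#[a-zA-Z0-9]+", "")       # drop every hashtag
clean_beer <- str_replace_all(clean_beer, "Drinking an? ", "") # drop the shared prefix
clean_beer <- str_replace_all(clean_beer, "[[:punct:]]", "")   # drop "@" and other punctuation
clean_beer <- trimws(clean_beer)                               # drop leftover outer spaces

clean_beer
```

Because the punctuation step also strips the "@", brewery handles end up as plain words such as "samplebrewery".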

Now that we have the reviews and beer/breweries separated, I wonder if there are any commonly reviewed beers that are shared on Twitter? (I want to keep the beer with the brewery at this point in case the same beer name appears at different breweries)

<span class="n">untp_text</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">group_by</span><span class="p">(</span><span class="n">clean_beer</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">summarise</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">())</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">top_n</span><span class="p">(</span><span class="m">15</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">dplyr</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">clean_beer</span><span class="w"> </span><span class="o">%notin%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"DC"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
  </span><span class="n">mutate</span><span class="p">(</span><span class="n">clean_beer</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">fct_reorder</span><span class="p">(</span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">ggplot</span><span class="p">(</span><span class="n">.</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">geom_bar</span><span class="p">(</span><span class="n">stat</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"identity"</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wes_palette</span><span class="p">(</span><span class="s2">"BottleRocket2"</span><span class="p">,</span><span class="w"> </span><span class="m">1</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
      </span><span class="n">labs</span><span class="p">(</span><span class="n">title</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Top 15 Beers for Untappd\nTwitter Shares"</span><span class="p">,</span><span class="w"> 
           </span><span class="n">subtitle</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"Count of Tweets"</span><span class="p">,</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w">
</span>

plot of chunk topbeer
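
The counting pipeline above can be sketched on toy data as follows. Two things here are assumptions: `%notin%` is defined locally as the negation of `%in%` (the helper in the dumas package may be implemented differently), and the beer names are made up. This variant filters out the leftover fragments before taking the top rows, so that "DC" and "A" cannot use up slots in the top 15.

```r
library(dplyr)

# Assumed definition: simply the negation of %in%
`%notin%` <- Negate(`%in%`)

# Hypothetical cleaned beer names, including two stray fragments
beers <- tibble(clean_beer = c("Sample IPA", "Sample IPA", "Example Stout",
                               "A", "Sample IPA", "Example Stout", "DC"))

top_beers <- beers %>%
  filter(clean_beer %notin% c("DC", "A")) %>% # drop fragments left by the split
  count(clean_beer, sort = TRUE) %>%          # group, count, and sort descending
  slice_max(n, n = 15)                        # keep the 15 largest counts, ties included

top_beers
```

`count(sort = TRUE)` is shorthand for the `group_by()`, `summarise(n = n())`, and `arrange(desc(n))` sequence, and `slice_max()` is the newer dplyr replacement for `top_n()`.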

Translating the additional review text as a proxy for empirical reviews

These tweets don’t indicate the numerical rating that users gave each beer in the app (a scale of 0.0 to 5.0 in 0.25 increments), so I’m going to try to derive some of the users’ opinions about each beer from the tweets that have additional review text, using the tidytext package. I think some of the sentiment is going to be linked to the Super Bowl, and there will be some selection bias because users are most likely to share the beers they prefer.

<span class="n">top_beers</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">untp_text</span><span class="w"> </span><span class="o">%>%</span><span class="w"> </span><span class="n">group_by</span><span class="p">(</span><span class="n">clean_beer</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">summarise</span><span class="p">(</span><span class="n">n</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">n</span><span class="p">())</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">arrange</span><span class="p">(</span><span class="n">desc</span><span class="p">(</span><span class="n">n</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">top_n</span><span class="p">(</span><span class="m">20</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">dplyr</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">clean_beer</span><span class="w"> </span><span class="o">%notin%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"DC"</span><span class="p">,</span><span class="w"> </span><span class="s2">"A"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">select</span><span class="p">(</span><span class="n">clean_beer</span><span class="p">)</span><span class="w">

</span><span class="c1"># subset the data into topic (beers) and review (text) for tokenization</span><span class="w">
</span><span class="n">untp_text_tiny</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">untp_text</span><span class="p">[,</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"clean_beer"</span><span class="p">,</span><span class="w"> </span><span class="s2">"review"</span><span class="p">)]</span><span class="w">

</span><span class="c1"># need to inner_join top beers with the reviews</span><span class="w">
</span><span class="n">merge_untp</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">untp_text_tiny</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
              </span><span class="n">inner_join</span><span class="p">(</span><span class="n">top_beers</span><span class="p">)</span><span class="w">

</span><span class="c1"># tokenize the reviews and remove some of the specific football words</span><span class="w">
</span><span class="n">untp_text_tiny</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">merge_untp</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
                  </span><span class="n">unnest_tokens</span><span class="p">(</span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="n">review</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w"> 
                  </span><span class="n">dplyr</span><span class="o">::</span><span class="n">filter</span><span class="p">(</span><span class="n">word</span><span class="w"> </span><span class="o">%notin%</span><span class="w"> </span><span class="nf">c</span><span class="p">(</span><span class="s2">"super"</span><span class="p">,</span><span class="w"> </span><span class="s2">"superbowl"</span><span class="p">,</span><span class="w"> </span><span class="s2">"superbowlsunday"</span><span class="w">
</span><span class="p">))</span><span class="w">
</span>
<span class="n">untp_sentiment</span><span class="w"> </span><span class="o"><-</span><span class="w"> 
  </span><span class="n">untp_text_tiny</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">inner_join</span><span class="p">(</span><span class="n">get_sentiments</span><span class="p">(</span><span class="s2">"bing"</span><span class="p">))</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">count</span><span class="p">(</span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="n">sentiment</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">spread</span><span class="p">(</span><span class="n">sentiment</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">0</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
    </span><span class="n">mutate</span><span class="p">(</span><span class="n">sentiment</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">positive</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">negative</span><span class="p">)</span><span class="w"> 

</span><span class="n">head</span><span class="p">(</span><span class="n">untp_sentiment</span><span class="p">)</span><span class="w">
</span>
## # A tibble: 6 x 4
##                                             clean_beer negative positive
##                                                  <chr>    <dbl>    <dbl>
## 1                                    6 by relicbrewing        0        2
## 2                                  Abbey by newbelgium        0        3
## 3                                               Barrel        3        4
## 4                                       Bourbon Barrel        2        2
## 5            Bourbon County Brand Stout by GooseIsland        0        2
## 6 Canadian Breakfast Stout CBS 2017 by foundersbrewing        0        2
## # ... with 1 more variables: sentiment <dbl>
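
To show how the net score is built up, here is a small worked sketch. The reviews, beer names, and the tiny lexicon are all made up; the lexicon only mimics the `word` and `sentiment` columns of `get_sentiments("bing")`. It also uses `pivot_wider()`, the newer tidyr equivalent of `spread()`.

```r
library(dplyr)
library(tidyr)
library(tidytext)

# Hypothetical reviews for two made-up beers
reviews <- tibble(
  clean_beer = c("Sample IPA", "Sample IPA", "Example Stout"),
  review     = c("Delicious and smooth", "Great beer, bad pour", "Awful, flat and stale")
)

# Toy lexicon standing in for the bing lexicon (same column names)
toy_lexicon <- tibble(
  word      = c("delicious", "smooth", "great", "bad", "awful", "stale"),
  sentiment = c("positive", "positive", "positive", "negative", "negative", "negative")
)

scores <- reviews %>%
  unnest_tokens(word, review) %>%           # one lowercase word per row
  inner_join(toy_lexicon, by = "word") %>%  # keep only words found in the lexicon
  count(clean_beer, sentiment) %>%          # positive and negative counts per beer
  pivot_wider(names_from = sentiment, values_from = n, values_fill = 0) %>%
  mutate(sentiment = positive - negative)   # net score

scores
```

With these toy inputs the IPA has three positive words and one negative word, for a net score of 2, while the stout has two negative words and no positive ones, for a net score of -2. Words that are not in the lexicon, such as "flat" here, contribute nothing, which is one reason short tweets produce small scores.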

The ‘Drifter Pale Ale’ (Widmer Brothers Brewing) has the lowest associated sentiment and is currently rated 3.43, while the ‘Stone Delicious IPA by Stone Brewing’ has the highest associated sentiment and is currently rated 3.81.

<span class="n">ggplot</span><span class="p">(</span><span class="n">untp_sentiment</span><span class="p">,</span><span class="w"> </span><span class="n">aes</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">clean_beer</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">sentiment</span><span class="p">,</span><span class="w"> </span><span class="n">fill</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">clean_beer</span><span class="p">))</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">geom_col</span><span class="p">(</span><span class="n">show.legend</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">coord_flip</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">theme_minimal</span><span class="p">()</span><span class="w"> </span><span class="o">+</span><span class="w">
  </span><span class="n">labs</span><span class="p">(</span><span class="n">x</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">)</span><span class="w">
</span>

plot of chunk sentiment

Now, I want to visualize the words to see which occur most often in the added reviews. It is no surprise that the largest word is ‘Beer’, given that these are Untappd reviews. The other words appear to be popular hashtags from the Super Bowl, such as ‘flyeaglesfly’.

<span class="n">untp_text_tiny</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">anti_join</span><span class="p">(</span><span class="n">stop_words</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">count</span><span class="p">(</span><span class="n">word</span><span class="p">)</span><span class="w"> </span><span class="o">%>%</span><span class="w">
  </span><span class="n">with</span><span class="p">(</span><span class="n">wordcloud</span><span class="p">(</span><span class="n">word</span><span class="p">,</span><span class="w"> </span><span class="n">n</span><span class="p">,</span><span class="w"> </span><span class="n">max.words</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="m">100</span><span class="p">,</span><span class="w"> </span><span class="n">colors</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">wes_palette</span><span class="p">(</span><span class="s2">"Zissou1"</span><span class="p">)))</span><span class="w">
</span>

plot of chunk wordcloud
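
The word cloud is sized by word counts, so it can help to look at those counts directly. Here is a minimal sketch on a made-up token column (the words are hypothetical), using the `stop_words` data set that ships with tidytext:

```r
library(dplyr)
library(tidytext)

# Hypothetical tokens, one word per row, as unnest_tokens() would return them
tokens <- tibble(word = c("beer", "the", "beer", "flyeaglesfly", "and", "tasty", "beer"))

word_counts <- tokens %>%
  anti_join(stop_words, by = "word") %>% # drop common words such as "the" and "and"
  count(word, sort = TRUE)               # most frequent words first

word_counts
```

The `word` and `n` columns of this table are what get passed to `wordcloud()`.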

Notes:

  1. Untappd has a supporter program which comes with a feature for downloading your personal check-in data.

  2. I did not intentionally set out to run this analysis during the Super Bowl and I did not watch the game!
