A quick #WorldEmojiDay exploration

July 16, 2018
By Colin Fay

[This article was first published on Colin Fay, and kindly contributed to R-bloggers].

Let’s celebrate #WorldEmojiDay with a quick exploration of my own Twitter account.

The packages

We’ll need:

From GitHub

  • {emo}

remotes::install_github("hadley/emo")

From CRAN

  • {dplyr}
  • {tidyr}
  • {rtweet}
  • {tidytext}
  • {proustr}

Note: This page was created at:

Sys.time()
## [1] "2018-07-17 17:22:29 CEST"

The tweets

Let’s get my last 3200 tweets:

library(emo)
library(rtweet)
library(dplyr)
## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
res <- get_timeline(
  "_ColinFay",
  n = 3200
)
names(res)
##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "hashtags"                "symbols"                
## [17] "urls_url"                "urls_t.co"              
## [19] "urls_expanded_url"       "media_url"              
## [21] "media_t.co"              "media_expanded_url"     
## [23] "media_type"              "ext_media_url"          
## [25] "ext_media_t.co"          "ext_media_expanded_url" 
## [27] "ext_media_type"          "mentions_user_id"       
## [29] "mentions_screen_name"    "lang"                   
## [31] "quoted_status_id"        "quoted_text"            
## [33] "quoted_created_at"       "quoted_source"          
## [35] "quoted_favorite_count"   "quoted_retweet_count"   
## [37] "quoted_user_id"          "quoted_screen_name"     
## [39] "quoted_name"             "quoted_followers_count" 
## [41] "quoted_friends_count"    "quoted_statuses_count"  
## [43] "quoted_location"         "quoted_description"     
## [45] "quoted_verified"         "retweet_status_id"      
## [47] "retweet_text"            "retweet_created_at"     
## [49] "retweet_source"          "retweet_favorite_count" 
## [51] "retweet_retweet_count"   "retweet_user_id"        
## [53] "retweet_screen_name"     "retweet_name"           
## [55] "retweet_followers_count" "retweet_friends_count"  
## [57] "retweet_statuses_count"  "retweet_location"       
## [59] "retweet_description"     "retweet_verified"       
## [61] "place_url"               "place_name"             
## [63] "place_full_name"         "place_type"             
## [65] "country"                 "country_code"           
## [67] "geo_coords"              "coords_coords"          
## [69] "bbox_coords"             "status_url"             
## [71] "name"                    "location"               
## [73] "description"             "url"                    
## [75] "protected"               "followers_count"        
## [77] "friends_count"           "listed_count"           
## [79] "statuses_count"          "favourites_count"       
## [81] "account_created_at"      "verified"               
## [83] "profile_url"             "profile_expanded_url"   
## [85] "account_lang"            "profile_banner_url"     
## [87] "profile_background_url"  "profile_image_url"

Here is what the text column looks like:

res %>% 
  pull(text) %>%
  .[1:5]
## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example"
## [2] "#RStats \nCool new feature in ggplot2 v3 โ€” tagging plots : https://t.co/jFUqX2Tj5T"                                                                                    
## [3] "#RStats โ€” A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 & Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF"                     
## [4] "@vsbuffalo Amen to that"                                                                                                                                               
## [5] "#RStats โ€” \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"

As you can see, the emojis are not printed in the console, but shown as
escape sequences like \U0001f4e6. These are Unicode escapes: each one
spells out the code point of an emoji, the number your machine uses to
identify that character. I won’t go deeper into this here, but it is
worth reading more about encoding if you want to understand the details.
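
As a small illustration (a sketch using only base R, not part of the original analysis), the escape \U0001f4e6 is the hexadecimal code point of the package emoji, and base R can convert between the character and that number. The variable names here are made up for the example.

```r
# The package emoji, written with its Unicode escape
pkg <- "\U0001f4e6"

# utf8ToInt() returns the code point as an integer
code_point <- utf8ToInt(pkg)

# Formatting it in hexadecimal gives back the digits of the escape
code_point_hex <- sprintf("U+%X", code_point)

# intToUtf8() goes the other way, from code point to character
round_trip <- intToUtf8(code_point)

print(code_point)
print(code_point_hex)
print(identical(round_trip, pkg))
```

The integer is 128230, which is 1F4E6 in hexadecimal: the same digits as in the escape printed by the console.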

The emojis

Let’s use the {emo} package to extract the emojis from the text.
Inspired by {stringr}, this package has a ji_extract_all() function
that is designed to extract all the emojis from a character vector.
We’ll use it on our text column, then keep only the created_at and emo
columns. We then pass the result to tidyr::unnest(), which returns one
row per emoji and so drops the empty emo rows (i.e., the tweets without
an emoji).

library(tidyr)
emos <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>%
  select(created_at,emo) %>%
  unnest(emo)

emos
## # A tibble: 887 x 2
##    created_at          emo  
##    <dttm>              <chr>
##  1 2018-07-17 10:00:47 📦   
##  2 2018-07-17 08:35:05 🚀   
##  3 2018-07-16 18:47:25 😮   
##  4 2018-07-16 14:51:30 😁   
##  5 2018-07-16 14:51:16 😱   
##  7 2018-07-16 13:27:00 ?   
##  8 2018-07-16 13:27:00 ?   
##  9 2018-07-16 13:27:00 ?   
## 10 2018-07-16 13:25:01 ?   
## # ... with 877 more rows
emos %>%
  count(emo, sort = TRUE)
## # A tibble: 187 x 2
##    emo       n
##    <chr> <int>
##  1 🤔       84
##  2 📦       56
##  3 😬       51
##  4 🎉       50
##  5 😱       50
##  6 😇       42
##  7 ?       36
##  8 ?       35
##  9 ?       33
## 10 ?       28
## # ... with 177 more rows
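
To make the extract-then-unnest step more concrete, here is a rough base R sketch on made-up tweets. It is not how {emo} works internally: the regular expression below only covers the code point range U+1F300 to U+1FAFF, which is an assumption that catches many emojis but not all of them, and the toy data and variable names are invented for the example.

```r
# Made-up tweets standing in for the real timeline
tweets <- data.frame(
  created_at = c("2018-07-17", "2018-07-16", "2018-07-16"),
  text = c(
    "New \U0001f4e6 on CRAN \U0001f680",
    "No emoji here",
    "Another \U0001f4e6 \U0001f631"
  ),
  stringsAsFactors = FALSE
)

# Assumption: a single code point range is good enough for this toy example
emoji_pattern <- "[\\x{1F300}-\\x{1FAFF}]"

# One list element per tweet, holding the emojis found in it
found <- regmatches(
  tweets$text,
  gregexpr(emoji_pattern, tweets$text, perl = TRUE)
)

# "Unnest" by hand: repeat each date once per emoji found.
# Tweets with zero matches are repeated zero times, so they disappear.
toy_emos <- data.frame(
  created_at = rep(tweets$created_at, lengths(found)),
  emo = unlist(found),
  stringsAsFactors = FALSE
)

# Count each emoji, most frequent first
toy_counts <- sort(table(toy_emos$emo), decreasing = TRUE)
print(toy_emos)
print(toy_counts)
```

The second tweet contains no emoji, so it contributes no row to toy_emos, which is the same effect unnest() has on the empty emo entries.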

So apparently, I use a lot of 🤔. But I also talk about 📦, which sounds
more appropriate 🙂

As you can see, {tibble} displays the elements as emojis when printing.
With a data.frame, you get the Unicode escapes instead:

emos %>%
  as.data.frame() %>%
  head()
##            created_at        emo
## 1 2018-07-17 10:00:47 \U0001f4e6
## 2 2018-07-17 08:35:05 \U0001f680
## 3 2018-07-16 18:47:25 \U0001f62e
## 4 2018-07-16 14:51:30 \U0001f601
## 5 2018-07-16 14:51:16 \U0001f631
## 6 2018-07-16 13:28:08 \U0001f352

The names

Let’s flag all the emojis with their names:

emos %>%
  left_join(
    data.frame(
      emo = ji_name, 
      name = names(ji_name)
    )
  ) %>% 
  count(emo, name, sort = TRUE)
## Joining, by = "emo"

## Warning: Column `emo` joining character vector and factor, coercing into
## character vector

## # A tibble: 295 x 3
##    emo   name                       n
##    <chr> <chr>                  <int>
##  1 🤔    thinking                  84
##  2 🤔    thinking_face             84
##  3 📦    package                   56
##  4 😬    grimacing                 51
##  5 😬    grimacing_face            51
##  6 🎉    party_popper              50
##  7 🎉    tada                      50
##  8 😱    face_screaming_in_fear    50
##  9 😱    scream                    50
## 10 😇    innocent                  42
## # ... with 285 more rows
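
Note that the table grew from 187 to 295 rows: several emojis appear under more than one name (thinking and thinking_face, for example), so the join repeats them once per name. The base R sketch below reproduces this on toy data and shows one possible way to keep a single name per emoji. The toy_ji_name vector is a made-up stand-in with the same shape as ji_name (names are aliases, values are emojis), and keeping the first alias is an arbitrary choice for the example.

```r
# Toy stand-in for emo::ji_name: names are aliases, values are emojis
toy_ji_name <- c(
  thinking      = "\U0001f914",
  thinking_face = "\U0001f914",
  package       = "\U0001f4e6"
)

# Toy emoji counts, in the same shape as the output of count(emo)
toy_counts <- data.frame(
  emo = c("\U0001f914", "\U0001f4e6"),
  n = c(84L, 56L),
  stringsAsFactors = FALSE
)

# Lookup table with one row per alias; keeping the columns as character
# (not factor) means both sides of the join have the same type
lookup <- data.frame(
  emo = unname(toy_ji_name),
  name = names(toy_ji_name),
  stringsAsFactors = FALSE
)

# Joining on every alias repeats the thinking emoji once per alias
all_aliases <- merge(toy_counts, lookup, by = "emo", all.x = TRUE)

# Keeping only the first alias of each emoji gives one row per emoji again
first_alias <- lookup[!duplicated(lookup$emo), ]
one_name <- merge(toy_counts, first_alias, by = "emo", all.x = TRUE)

print(all_aliases)
print(one_name)
```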

The words

And finally, let’s see which words are most often associated with the
emojis we just saw:

library(tidytext)
emos_with_id <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>% 
  select(status_id, text, emo) %>%
  tidyr::unnest(emo)

emos_with_id %>%
  unnest_tokens(word,text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>% 
  count(emo, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 5,660 x 3
##    emo   word          n
##    <chr> <chr>     <int>
##  1 ?    rstats       37
##  2 ?    rstats       27
##  3 ?    macbook      26
##  4 ?    package      20
##  5 ?    trans        18
##  6 ☕    pm           15
##  7 ?    pro          15
##  8 ?    marche       10
##  9 ?    ma_salmon    10
## 10 ?    ma_salmon    10
## # ... with 5,650 more rows

And what are the most used emojis with “rstats”?

emos_with_id %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>%
  count(emo, word, sort = TRUE) %>%
  filter(
    word == "rstats"
  )
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 81 x 3
##    emo   word       n
##    <chr> <chr>  <int>
##  1 ?    rstats    37
##  2 ?    rstats    27
##  3 ?    rstats     5
##  4 ?    rstats     4
##  5 ?    rstats     4
##  6 ?    rstats     4
##  7 โœ๏ธ    rstats     3
##  8 ?    rstats     3
##  9 ?    rstats     3
## 10 ⚡    rstats     2
## # ... with 71 more rows
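
The pipeline above boils down to three steps: split each text into words, drop the stop words, and count every (emoji, word) pair. Here is a minimal base R sketch of the same idea on made-up data. The tokenizer is a crude split on non-alphanumeric characters rather than what unnest_tokens() does, and toy_stop_words is a tiny hypothetical list, not the {tidytext} or {proustr} ones.

```r
# Toy data: one row per (tweet, emoji) pair, as in emos_with_id
toy <- data.frame(
  text = c(
    "New #RStats package released",
    "The package is on CRAN",
    "Oops, the build failed"
  ),
  emo = c("\U0001f4e6", "\U0001f4e6", "\U0001f631"),
  stringsAsFactors = FALSE
)

# Hypothetical, tiny stop word list for the example
toy_stop_words <- c("the", "is", "on", "new")

# Lower-case each text and split it on anything that is not a letter or digit
tokens <- strsplit(tolower(toy$text), "[^a-z0-9]+")

# One row per (emoji, word) pair
pairs <- data.frame(
  emo = rep(toy$emo, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Drop empty tokens and stop words
pairs <- pairs[nzchar(pairs$word) & !pairs$word %in% toy_stop_words, ]

# Count each (emoji, word) pair and sort by decreasing count
pair_counts <- aggregate(n ~ emo + word, data = transform(pairs, n = 1L), FUN = sum)
pair_counts <- pair_counts[order(-pair_counts$n), ]

# Same filter as above: which emojis go with "rstats"?
rstats_counts <- pair_counts[pair_counts$word == "rstats", ]

print(pair_counts)
print(rstats_counts)
```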

Other cool functions

I recently discovered the ji_glue() function, which lets you easily
insert an emoji into a character string:

ji_glue("I love to code :package:")
## I love to code 📦
ji_glue("Sometimes they make me :scream:")
## Sometimes they make me 😱
ji_glue("Sometimes they make me :cry:")
## Sometimes they make me 😢
ji_glue("Sometimes they make me :fear:")
## Sometimes they make me ?
ji_glue("But in the end I'm always :tada:")
## But in the end I'm always 🎉
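
To give a feel for what this kind of substitution involves, here is a toy base R version. It is only a sketch: toy_glue() and toy_aliases are hypothetical names invented for this example, the alias table has two entries, and the real ji_glue() is certainly more elaborate.

```r
# Hypothetical alias table; {emo} ships a much larger one
toy_aliases <- c(package = "\U0001f4e6", tada = "\U0001f389")

# Replace every :alias: found in x with the matching emoji.
# Aliases that are not in the table are left untouched.
toy_glue <- function(x, aliases = toy_aliases) {
  for (alias in names(aliases)) {
    x <- gsub(paste0(":", alias, ":"), aliases[[alias]], x, fixed = TRUE)
  }
  x
}

glued <- toy_glue("I love to code :package:")
untouched <- toy_glue("Sometimes they make me :unknown:")

print(glued)
print(untouched)
```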

The ji() function can also be used inside your markdown, so you can
write:

“I hate backtick r emo::ji("bug") backtick”, and it will render as: “I
hate 🐛”.

(Of course, replace backtick with actual backticks 🙂.)

That’s all folks

That’s all for today! Now have a nice emoji day.
