A quick #WorldEmojiDay exploration

July 16, 2018

(This article was first published on Colin Fay, and kindly contributed to R-bloggers)

Let's celebrate #WorldEmojiDay with a quick exploration of my own
Twitter account.

The 📦

We'll need:

From Github

  • {emo}

remotes::install_github("hadley/emo")

From CRAN

  • {dplyr}
  • {tidyr}
  • {rtweet}
  • {tidytext}

Note: This page was created at:

Sys.time()
## [1] "2018-07-17 17:22:29 CEST"

The 🔍

Let's get my last 3200 tweets:

library(emo)
library(rtweet)
library(dplyr)
## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
res <- get_timeline(
  "_ColinFay",
  n = 3200
)
names(res)
##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "hashtags"                "symbols"                
## [17] "urls_url"                "urls_t.co"              
## [19] "urls_expanded_url"       "media_url"              
## [21] "media_t.co"              "media_expanded_url"     
## [23] "media_type"              "ext_media_url"          
## [25] "ext_media_t.co"          "ext_media_expanded_url" 
## [27] "ext_media_type"          "mentions_user_id"       
## [29] "mentions_screen_name"    "lang"                   
## [31] "quoted_status_id"        "quoted_text"            
## [33] "quoted_created_at"       "quoted_source"          
## [35] "quoted_favorite_count"   "quoted_retweet_count"   
## [37] "quoted_user_id"          "quoted_screen_name"     
## [39] "quoted_name"             "quoted_followers_count" 
## [41] "quoted_friends_count"    "quoted_statuses_count"  
## [43] "quoted_location"         "quoted_description"     
## [45] "quoted_verified"         "retweet_status_id"      
## [47] "retweet_text"            "retweet_created_at"     
## [49] "retweet_source"          "retweet_favorite_count" 
## [51] "retweet_retweet_count"   "retweet_user_id"        
## [53] "retweet_screen_name"     "retweet_name"           
## [55] "retweet_followers_count" "retweet_friends_count"  
## [57] "retweet_statuses_count"  "retweet_location"       
## [59] "retweet_description"     "retweet_verified"       
## [61] "place_url"               "place_name"             
## [63] "place_full_name"         "place_type"             
## [65] "country"                 "country_code"           
## [67] "geo_coords"              "coords_coords"          
## [69] "bbox_coords"             "status_url"             
## [71] "name"                    "location"               
## [73] "description"             "url"                    
## [75] "protected"               "followers_count"        
## [77] "friends_count"           "listed_count"           
## [79] "statuses_count"          "favourites_count"       
## [81] "account_created_at"      "verified"               
## [83] "profile_url"             "profile_expanded_url"   
## [85] "account_lang"            "profile_banner_url"     
## [87] "profile_background_url"  "profile_image_url"

Here is what the text column looks like:

res %>% 
  pull(text) %>%
  .[1:5]
## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example"
## [2] "#RStats \nCool new feature in ggplot2 v3 — tagging plots : https://t.co/jFUqX2Tj5T"
## [3] "#RStats — A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 & Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF"
## [4] "@vsbuffalo Amen to that"
## [5] "#RStats — \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"

As you can see, the emojis are not printed in the console; they show up
as escape sequences like \U0001f4e6. These are Unicode escapes: each
emoji is a single Unicode character, and R prints its code point (here
U+1F4E6) when it cannot render the glyph. I won't go deeper into this
here; the original post links to two resources for readers who want to
know more about encoding.
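If you want to check this for yourself, here is a minimal base R sketch
(not part of the original analysis) showing that \U0001f4e6 is a single
character, along with its code point and its UTF-8 bytes. The variable
names are mine, chosen for illustration.

```r
# The package emoji, written as a Unicode escape
pkg <- "\U0001f4e6"

# One character, even though it takes several bytes to store
n_chars <- nchar(pkg)

# Its Unicode code point, as an integer and in the usual U+ notation
code_point <- utf8ToInt(pkg)
label <- sprintf("U+%X", code_point)

# The raw UTF-8 bytes behind the character
bytes <- as.character(charToRaw(enc2utf8(pkg)))

n_chars  # 1
label    # "U+1F4E6"
bytes    # "f0" "9f" "93" "a6"
```

So the "weird characters" are just another way of writing the same
emoji, one that any console can display.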

The 📊

Let's use the {emo} package to extract the emojis from the text.
Inspired by {stringr}, this package has a ji_extract_all() function
that extracts all the emojis from a character vector. We'll use it on
our text column, then keep the created_at and emo columns. We then pass
the result to tidyr::unnest(), which returns one row per emoji and drops
the rows with an empty emo (i.e., the tweets without an emoji).

library(tidyr)
emos <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>%
  select(created_at,emo) %>%
  unnest(emo)

emos
## # A tibble: 887 x 2
##    created_at          emo  
##    <dttm>              <chr>
##  1 2018-07-17 10:00:47 📦
##  2 2018-07-17 08:35:05 🚀
##  3 2018-07-16 18:47:25 😮
##  4 2018-07-16 14:51:30 😁
##  5 2018-07-16 14:51:16 😱
##  6 2018-07-16 13:28:08 🍒
##  7 2018-07-16 13:27:00 😈
##  8 2018-07-16 13:27:00 🌲
##  9 2018-07-16 13:27:00 💀
## 10 2018-07-16 13:25:01 ๐Ÿ›   
## # ... with 877 more rows
emos %>%
  count(emo, sort = TRUE)
## # A tibble: 187 x 2
##    emo       n
##    <chr> <int>
##  1 🤔       84
##  2 📦       56
##  3 😬       51
##  4 🎉       50
##  5 😱       50
##  6 😇       42
##  7 ๐Ÿ˜       36
##  8 🙃       35
##  9 😂       33
## 10 😜       28
## # ... with 177 more rows

So apparently, I use a lot of 🤔. But I also talk about 📦, which sounds
more appropriate 🙂

As you can see, {tibble} prints the emojis themselves. With a
data.frame, you get the plain Unicode escapes instead:

emos %>%
  as.data.frame() %>%
  head()
##            created_at        emo
## 1 2018-07-17 10:00:47 \U0001f4e6
## 2 2018-07-17 08:35:05 \U0001f680
## 3 2018-07-16 18:47:25 \U0001f62e
## 4 2018-07-16 14:51:30 \U0001f601
## 5 2018-07-16 14:51:16 \U0001f631
## 6 2018-07-16 13:28:08 \U0001f352
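To see why the unnest() step above gets rid of the emoji-free tweets,
here is a toy sketch with a hand-made list column standing in for the
output of ji_extract_all(). The toy data and its status_id values are
made up for illustration; only the unnest() behaviour is the point.

```r
library(tibble)
library(tidyr)

# Three fake tweets: two emojis, no emoji, one emoji
toy <- tibble(
  status_id = c("1", "2", "3"),
  emo = list(
    c("\U0001f4e6", "\U0001f389"),
    character(0),
    "\U0001f914"
  )
)

# One row per emoji; the tweet with an empty vector disappears
unnested <- unnest(toy, emo)
unnested
```

Tweet "1" ends up on two rows, tweet "2" is dropped, and tweet "3" keeps
its single row. In recent versions of {tidyr} (1.0.0 and later), passing
keep_empty = TRUE to unnest() keeps the empty tweets with an NA instead.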

The 🏷

Let's flag all the emojis with their names:

emos %>%
  left_join(
    data.frame(
      emo = ji_name, 
      name = names(ji_name)
    )
  ) %>% 
  count(emo, name, sort = TRUE)
## Joining, by = "emo"

## Warning: Column `emo` joining character vector and factor, coercing into
## character vector

## # A tibble: 295 x 3
##    emo   name                       n
##    <chr> <fct>                  <int>
##  1 🤔    thinking                  84
##  2 🤔    thinking_face             84
##  3 📦    package                   56
##  4 😬    grimacing                 51
##  5 😬    grimacing_face            51
##  6 🎉    party_popper              50
##  7 🎉    tada                      50
##  8 😱    face_screaming_in_fear    50
##  9 😱    scream                    50
## 10 😇    innocent                  42
## # ... with 285 more rows
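Some emojis show up twice in this table because several names can point
to the same emoji in ji_name. Here is a small sketch of that effect, and
of one possible way to keep a single name per emoji. The lookup vector
below is a hypothetical three-entry stand-in for ji_name, and the counts
are made up.

```r
library(dplyr)

# Hypothetical mini lookup with the same shape as ji_name:
# a named character vector where two names share one emoji
lookup <- c(
  thinking      = "\U0001f914",
  thinking_face = "\U0001f914",
  package       = "\U0001f4e6"
)
names_tbl <- tibble(emo = unname(lookup), name = names(lookup))

# Made-up emoji counts, one row per emoji
counts <- tibble(emo = c("\U0001f914", "\U0001f4e6"), n = c(84L, 56L))

# Joining on the full lookup repeats the thinking face once per name
joined_all <- left_join(counts, names_tbl, by = "emo")

# Keeping only the first name per emoji gives one row per emoji
names_first <- distinct(names_tbl, emo, .keep_all = TRUE)
joined_first <- left_join(counts, names_first, by = "emo")

joined_all
joined_first
```

Which name to keep is an arbitrary choice here (the first one listed);
the duplicated rows are harmless as long as you don't sum the n column.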

The 🔠

And finally, let's see which words are most often associated with the
emojis we just saw:

library(tidytext)
emos_with_id <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>% 
  select(status_id, text, emo) %>%
  tidyr::unnest(emo)

emos_with_id %>%
  unnest_tokens(word,text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>% 
  count(emo, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 5,660 x 3
##    emo   word          n
##    <chr> <chr>     <int>
##  1 📦    rstats       37
##  2 🎉    rstats       27
##  3 💻    macbook      26
##  4 📦    package      20
##  5 ๐Ÿ‘    trans        18
##  6 ☕    pm           15
##  7 💻    pro          15
##  8 💻    marche       10
##  9 🤔    ma_salmon    10
## 10 😱    ma_salmon    10
## # ... with 5,650 more rows

And which emojis are used most often with "rstats"?

emos_with_id %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>%
  count(emo, word, sort = TRUE) %>%
  filter(
    word == "rstats"
  )
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 81 x 3
##    emo   word       n
##    <chr> <chr>  <int>
##  1 📦    rstats    37
##  2 🎉    rstats    27
##  3 😬    rstats     5
##  4 🌟    rstats     4
##  5 👌    rstats     4
##  6 🤔    rstats     4
##  7 โœ๏ธ    rstats     3
##  8 💎    rstats     3
##  9 🙌    rstats     3
## 10 ⚡    rstats     2
## # ... with 71 more rows
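If the pipeline above feels a bit dense, here is the same idea on two
made-up tweets: split the text into words, drop the stop words, then
count the emoji/word pairs. The toy tweets and the extra_stop table are
assumptions for illustration, and I pass by = "word" explicitly to
avoid the "Joining, by" messages.

```r
library(dplyr)
library(tidytext)

# Two fake tweets, each already paired with one emoji
toy <- tibble(
  emo  = c("\U0001f4e6", "\U0001f389"),
  text = c("A new #RStats package is out", "The #RStats release is out")
)

# Hypothetical extra words to drop, on top of tidytext's stop_words
extra_stop <- tibble(word = c("https", "t.co"))

word_counts <- toy %>%
  unnest_tokens(word, text) %>%               # one lowercase word per row
  anti_join(stop_words, by = "word") %>%      # drop common English words
  anti_join(extra_stop, by = "word") %>%      # drop URL leftovers
  count(emo, word, sort = TRUE)

word_counts
```

Note that unnest_tokens() lowercases the words and strips the "#", which
is why the hashtag shows up as plain "rstats" in the tables above.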

Other cool functions

I recently discovered the ji_glue() function, which lets you easily
insert an emoji into a character string:

ji_glue("I love to code :package:")
## I love to code 📦
ji_glue("Sometimes they make me :scream:")
## Sometimes they make me 😱
ji_glue("Sometimes they make me :cry:")
## Sometimes they make me 😢
ji_glue("Sometimes they make me :fear:")
## Sometimes they make me 😨
ji_glue("But in the end I'm always :tada:")
## But in the end I'm always 🎉
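To give a rough idea of what happens with the :name: placeholders, here
is a tiny base R sketch of the substitution. This is not how {emo}
implements ji_glue(); glue_emoji() and its two-entry dictionary are
hypothetical helpers written only for this example.

```r
# Hypothetical dictionary: placeholder name -> emoji
dict <- c(package = "\U0001f4e6", tada = "\U0001f389")

# Replace every :name: found in the dictionary by its emoji;
# unknown placeholders are left untouched
glue_emoji <- function(x, dict) {
  for (nm in names(dict)) {
    x <- gsub(paste0(":", nm, ":"), dict[[nm]], x, fixed = TRUE)
  }
  x
}

out <- glue_emoji("I love to code :package:", dict)
out
```

The real function knows many more names than this toy dictionary does.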

The ji() function can also be used inside your markdown, so you can
write:

"I hate backtick r emo::ji("bug") backtick", and it will come out as: "I
hate 🐛".

(Of course, replace each "backtick" with an actual backtick 🙂.)

That's all folks 🎬

That's all for today! Now have a nice emoji day 🎉
