A quick #WorldEmojiDay exploration
Let's celebrate #WorldEmojiDay with a quick exploration of my own Twitter account.
The ?
We'll need:
From GitHub
{emo}
remotes::install_github("hadley/emo")
From CRAN
{dplyr}, {tidyr}, {rtweet}, {tidytext}
Note: This page was created at:
Sys.time()
## [1] "2018-07-17 17:22:29 CEST"
The ?
Let's get my last 3200 tweets:
library(emo)
library(rtweet)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
##     filter, lag
## The following objects are masked from 'package:base':
##
##     intersect, setdiff, setequal, union
res <- get_timeline(
  "_ColinFay",
  n = 3200
)
names(res)
##  [1] "user_id"                 "status_id"
##  [3] "created_at"              "screen_name"
##  [5] "text"                    "source"
##  [7] "display_text_width"      "reply_to_status_id"
##  [9] "reply_to_user_id"        "reply_to_screen_name"
## [11] "is_quote"                "is_retweet"
## [13] "favorite_count"          "retweet_count"
## [15] "hashtags"                "symbols"
## [17] "urls_url"                "urls_t.co"
## [19] "urls_expanded_url"       "media_url"
## [21] "media_t.co"              "media_expanded_url"
## [23] "media_type"              "ext_media_url"
## [25] "ext_media_t.co"          "ext_media_expanded_url"
## [27] "ext_media_type"          "mentions_user_id"
## [29] "mentions_screen_name"    "lang"
## [31] "quoted_status_id"        "quoted_text"
## [33] "quoted_created_at"       "quoted_source"
## [35] "quoted_favorite_count"   "quoted_retweet_count"
## [37] "quoted_user_id"          "quoted_screen_name"
## [39] "quoted_name"             "quoted_followers_count"
## [41] "quoted_friends_count"    "quoted_statuses_count"
## [43] "quoted_location"         "quoted_description"
## [45] "quoted_verified"         "retweet_status_id"
## [47] "retweet_text"            "retweet_created_at"
## [49] "retweet_source"          "retweet_favorite_count"
## [51] "retweet_retweet_count"   "retweet_user_id"
## [53] "retweet_screen_name"     "retweet_name"
## [55] "retweet_followers_count" "retweet_friends_count"
## [57] "retweet_statuses_count"  "retweet_location"
## [59] "retweet_description"     "retweet_verified"
## [61] "place_url"               "place_name"
## [63] "place_full_name"         "place_type"
## [65] "country"                 "country_code"
## [67] "geo_coords"              "coords_coords"
## [69] "bbox_coords"             "status_url"
## [71] "name"                    "location"
## [73] "description"             "url"
## [75] "protected"               "followers_count"
## [77] "friends_count"           "listed_count"
## [79] "statuses_count"          "favourites_count"
## [81] "account_created_at"      "verified"
## [83] "profile_url"             "profile_expanded_url"
## [85] "account_lang"            "profile_banner_url"
## [87] "profile_background_url"  "profile_image_url"
Here is what the text column looks like:
res %>%
  pull(text) %>%
  .[1:5]
## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example"
## [2] "#RStats \nCool new feature in ggplot2 v3 - tagging plots : https://t.co/jFUqX2Tj5T"
## [3] "#RStats - A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 & Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF"
## [4] "@vsbuffalo Amen to that"
## [5] "#RStats - \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"
As you can see, the emojis are not printed in the console, but converted
to escape sequences like \U0001f4e6. These are Unicode escapes: each one
writes an emoji's code point in a form your machine can understand.
I won't go deeper into this; here are two resources you can
read if you want to know more about encoding:
The ?
Let's use the {emo} package to extract the emojis from the text.
Inspired by {stringr}, this package has a ji_extract_all function
that is designed to extract all the emojis from a character vector.
We'll use it on our text column, then keep the created_at and emo columns.
We then pass the result to tidyr::unnest to get one row per emoji, which
also removes the empty emo rows (i.e., the tweets without an emoji).
library(tidyr)
emos <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>%
  select(created_at, emo) %>%
  unnest(emo)
emos
## # A tibble: 887 x 2
## created_at emo
## <dttm> <chr>
## 1 2018-07-17 10:00:47 ?
## 2 2018-07-17 08:35:05 ?
## 3 2018-07-16 18:47:25 ?
## 4 2018-07-16 14:51:30 ?
## 5 2018-07-16 14:51:16 ?
## 6 2018-07-16 13:28:08 ?
## 7 2018-07-16 13:27:00 ?
## 8 2018-07-16 13:27:00 ?
## 9 2018-07-16 13:27:00 ?
## 10 2018-07-16 13:25:01 ?
## # ... with 877 more rows
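To make the unnest step more concrete, here is a minimal base R sketch of the same idea. The `ids` and `emo_list` objects are toy stand-ins for the `status_id` column and the list returned by ji_extract_all (they are not taken from the real timeline), and the code only imitates what unnest does here; it is not how {tidyr} is implemented.

```r
# Toy data: three tweets, the second one contains no emoji
ids <- c("a", "b", "c")
emo_list <- list(
  c("\U0001f4e6", "\U0001f680"), # two emojis in tweet "a"
  character(0),                  # no emoji in tweet "b"
  "\U0001f914"                   # one emoji in tweet "c"
)

# Repeat each id once per emoji found, then flatten the list.
# A tweet with zero emojis is repeated zero times, so it disappears.
long <- data.frame(
  status_id = rep(ids, lengths(emo_list)),
  emo = unlist(emo_list),
  stringsAsFactors = FALSE
)

nrow(long)     # 3 rows: two for "a", none for "b", one for "c"
long$status_id # "a" "a" "c"
```

This is why the resulting table has one row per emoji rather than one row per tweet.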
emos %>%
  count(emo, sort = TRUE)
## # A tibble: 187 x 2
## emo n
## <chr> <int>
## 1 ? 84
## 2 ? 56
## 3 ? 51
## 4 ? 50
## 5 ? 50
## 6 ? 42
## 7 ? 36
## 8 ? 35
## 9 ? 33
## 10 ? 28
## # ... with 177 more rows
So apparently, I use a lot of 🤔. But I also talk about 📦, which sounds more appropriate.
As you can see, {tibble} renders the elements as emojis when printing.
With a data.frame, you get the plain Unicode escape sequences instead:
emos %>% as.data.frame() %>% head()
##            created_at        emo
## 1 2018-07-17 10:00:47 \U0001f4e6
## 2 2018-07-17 08:35:05 \U0001f680
## 3 2018-07-16 18:47:25 \U0001f62e
## 4 2018-07-16 14:51:30 \U0001f601
## 5 2018-07-16 14:51:16 \U0001f631
## 6 2018-07-16 13:28:08 \U0001f352
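If you are curious about what an escape such as \U0001f4e6 stands for, base R can convert between a character and its code point. This is only a small illustration using functions from base R; nothing here comes from {emo}.

```r
# The package emoji, written with the same escape the console printed
pkg <- "\U0001f4e6"

# utf8ToInt() returns the Unicode code point as an integer
code_point <- utf8ToInt(pkg)
code_point # 128230, which is 1F4E6 in hexadecimal

# Format it the way Unicode charts usually write it
sprintf("U+%X", code_point) # "U+1F4E6"

# intToUtf8() goes the other way, from code point back to character
back <- intToUtf8(code_point)
back == pkg # TRUE
```

So the escape is just the hexadecimal code point of the emoji, padded to eight digits.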
The ?
Let's flag all the emojis with their names:
emos %>%
  left_join(
    data.frame(
      emo = ji_name,
      name = names(ji_name)
    )
  ) %>%
  count(emo, name, sort = TRUE)
## Joining, by = "emo"
## Warning: Column `emo` joining character vector and factor, coercing into
## character vector
## # A tibble: 295 x 3
## emo name n
## <chr> <fct> <int>
## 1 ? thinking 84
## 2 ? thinking_face 84
## 3 ? package 56
## 4 ? grimacing 51
## 5 ? grimacing_face 51
## 6 ? party_popper 50
## 7 ? tada 50
## 8 ? face_screaming_in_fear 50
## 9 ? scream 50
## 10 ? innocent 42
## # ... with 285 more rows
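In the output above, thinking and thinking_face appear with the same count, which suggests that one emoji can be listed under several names, so the join yields one row per name. If you prefer a single name per emoji, one option is to drop the duplicates from the lookup table before joining. The sketch below uses a toy named vector, `toy_names`, as a stand-in for ji_name (an assumption made for illustration, not the real data).

```r
# Toy stand-in for ji_name: a named character vector in which
# several names point to the same emoji
toy_names <- c(
  thinking      = "\U0001f914",
  thinking_face = "\U0001f914",
  package       = "\U0001f4e6"
)

lookup <- data.frame(
  emo  = unname(toy_names),
  name = names(toy_names),
  stringsAsFactors = FALSE # keep characters, avoids the factor warning
)

# Keep only the first name listed for each emoji
lookup_unique <- lookup[!duplicated(lookup$emo), ]

nrow(lookup)        # 3
nrow(lookup_unique) # 2
lookup_unique$name  # "thinking" "package"
```

Joining against a table like `lookup_unique` would then give one row per emoji. Which name is kept depends only on the order of the lookup vector.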
The ?
And finally, let's see which words are most often associated with the emojis we just saw:
library(tidytext)
emos_with_id <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>%
  select(status_id, text, emo) %>%
  tidyr::unnest(emo)
emos_with_id %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>%
  count(emo, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
## Warning: Column `word` joining character vector and factor, coercing into
## character vector
## # A tibble: 5,660 x 3
## emo word n
## <chr> <chr> <int>
## 1 ? rstats 37
## 2 ? rstats 27
## 3 ? macbook 26
## 4 ? package 20
## 5 ? trans 18
## 6 โ pm 15
## 7 ? pro 15
## 8 ? marche 10
## 9 ? ma_salmon 10
## 10 ? ma_salmon 10
## # ... with 5,650 more rows
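For readers who want to see what this pipeline does step by step, here is a rough base R sketch on toy data: split each text into words, drop the stop words, and count each (emoji, word) pair. The `rows` data and the `stop_list` vector are invented for the example, and the tokenizer is a much simpler rule than the one {tidytext} uses.

```r
# Toy stand-in for emos_with_id: one row per emoji found in a tweet
rows <- data.frame(
  emo  = c("\U0001f4e6", "\U0001f4e6", "\U0001f914"),
  text = c("New rstats package",
           "Another rstats package on CRAN",
           "Is this rstats"),
  stringsAsFactors = FALSE
)
# Hypothetical stop-word list, far shorter than a real one
stop_list <- c("new", "another", "on", "is", "this")

# Lowercase, then split on anything that is not a letter, digit or underscore
tokens <- strsplit(tolower(rows$text), "[^a-z0-9_]+")

# One row per (emoji, word) pair
pairs <- data.frame(
  emo  = rep(rows$emo, lengths(tokens)),
  word = unlist(tokens),
  stringsAsFactors = FALSE
)

# Remove empty tokens and stop words (the anti_join steps)
pairs <- pairs[nzchar(pairs$word) & !pairs$word %in% stop_list, ]

# Count each pair and sort by decreasing frequency (the count step)
counts <- aggregate(n ~ emo + word, data = transform(pairs, n = 1L), FUN = sum)
counts <- counts[order(-counts$n), ]
counts
```

In this toy data the package emoji occurs twice with "rstats" and twice with "package", so those pairs come out on top.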
And which emojis are used most often with "rstats"?
emos_with_id %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>%
  count(emo, word, sort = TRUE) %>%
  filter(
    word == "rstats"
  )
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
## Warning: Column `word` joining character vector and factor, coercing into
## character vector
## # A tibble: 81 x 3
## emo word n
## <chr> <chr> <int>
## 1 ? rstats 37
## 2 ? rstats 27
## 3 ? rstats 5
## 4 ? rstats 4
## 5 ? rstats 4
## 6 ? rstats 4
## 7 โ๏ธ rstats 3
## 8 ? rstats 3
## 9 ? rstats 3
## 10 ⚡ rstats 2
## # ... with 71 more rows
Other cool functions
I recently discovered the ji_glue() function, which lets you easily
insert an emoji into a character string:
ji_glue("I love to code :package:")
## I love to code ?
ji_glue("Sometimes they make me :scream:")
## Sometimes they make me ?
ji_glue("Sometimes they make me :cry:")
## Sometimes they make me ?
ji_glue("Sometimes they make me :fear:")
## Sometimes they make me ?
ji_glue("But in the end I'm always :tada:")
## But in the end I'm always ?
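The idea behind this kind of helper is a simple substitution of `:name:` placeholders. The sketch below is a toy base R version, not how ji_glue() is actually implemented: `glue_emoji()` and `emoji_lookup` are hypothetical names made up for the example, and the lookup only knows two emojis.

```r
# Hypothetical lookup table: placeholder name -> emoji
emoji_lookup <- c(
  package = "\U0001f4e6",
  tada    = "\U0001f389"
)

# Replace every :name: placeholder found in the lookup table
glue_emoji <- function(x, lookup) {
  for (alias in names(lookup)) {
    x <- gsub(paste0(":", alias, ":"), lookup[[alias]], x, fixed = TRUE)
  }
  x
}

out <- glue_emoji("I love to code :package:", emoji_lookup)
out == "I love to code \U0001f4e6" # TRUE
```

Placeholders that are not in the lookup table are left untouched by this toy version.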
The ji() function can also be used inside your markdown, so you can
write:
"I hate backtick r emo::ji("bug") backtick", and it will come out as: "I hate 🐛".
(Of course, replace "backtick" with actual backticks.)
That's all folks ?
That's all for today! Now have a nice emoji day ?