Your and my 2019 R goals

December 31, 2018

[This article was first published on Posts on Maëlle's R blog, and kindly contributed to R-bloggers.]

Here we go again, using a Twitter trend as blog fodder! Colin Fay
launched an inspiring movement by sharing his R goals of 2019.

It’s been quite interesting reading the objectives of other tweeps: what
they want to learn, make, how they want to get involved in the
community, etc. As Mike Kearney, rtweet’s maintainer, underlined, it
is excellent reading material!

… but also blogging material! Let me fetch and tokenize these tweets to
summarize them!

Disclaimer: I later saw that Jason Baik had the same idea and was
faster than I was, so do check out his analysis as well.

Collect Twitter data

If you’re using rtweet for the first time, check out its website for
information about use and setup. Also refer to the Twitter API docs
themselves to learn more about rate limiting and, for example, that the
search endpoint won’t return tweets older than 6 to 9 days.

tweets <- rtweet::search_tweets("Rstats goals 2019",
                                include_rts = FALSE)

I obtained 87 tweets from 85 unique users. Definitely not big data, but
not bad!
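The two counts above can be computed directly from the returned data
frame. Here is a minimal sketch on a made-up stand-in for `tweets`; the
`screen_name` column is an assumption about what this rtweet version
returns, and the toy data is hypothetical.

```r
# Hypothetical stand-in for the data frame returned by search_tweets();
# the screen_name column is assumed, not taken from the post.
toy_tweets <- data.frame(
  screen_name = c("alice", "bob", "alice", "carol"),
  text = c("goal 1", "goal 2", "goal 3", "goal 4"),
  stringsAsFactors = FALSE
)

# Number of tweets is the number of rows
n_tweets <- nrow(toy_tweets)
# Number of unique users is the number of distinct screen names
n_users <- length(unique(toy_tweets$screen_name))

cat(n_tweets, "tweets from", n_users, "unique users\n")
```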

Tokenize tweets

I then set out to tokenize the tweets into words using the
tweet-specific tokenizers::tokenize_tweets() tokenizer via the tidytext
package. If you’re new to tidytext, I’d recommend reading the book
written by its authors. A token in natural language processing can be a
word, a line, etc., which is a totally different concept from a token
for rtweet functions (your API credentials).

The tweet tokenization is a “tokenization by word that preserves
usernames, hashtags, and URLS”. So awesome, and today is the first time
I have had an occasion to use it! I also removed stopwords.

library("magrittr")

stopwords <- rcorpora::corpora("words/stopwords/en")$stopWords

tokens <- tweets %>%
  dplyr::select(text) %>%
  tidytext::unnest_tokens(token, text,
                        token = "tweets",
                        drop = FALSE) %>%
  dplyr::filter(!token %in% stopwords) 
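To get a feel for what a tweet-aware tokenizer does, here is a rough
base R approximation. It is only a sketch, not the implementation used
by tokenizers: `tokenize_tweet_sketch()` and the tiny stopword vector
are hypothetical names made up for illustration. It may also serve as a
fallback if `token = "tweets"` is not available in your installed
versions of tidytext and tokenizers (check their changelogs).

```r
# Hypothetical helper: split a tweet on whitespace, keep URLs, @mentions
# and #hashtags intact, lower-case and strip punctuation elsewhere.
tokenize_tweet_sketch <- function(text) {
  pieces <- strsplit(text, "[[:space:]]+")[[1]]
  pieces <- pieces[nzchar(pieces)]
  is_url <- grepl("^https?://", pieces)
  is_tag <- grepl("^[#@][[:alnum:]_]+", pieces)
  cleaned <- ifelse(
    is_url,
    pieces,  # URLs are kept verbatim
    ifelse(
      is_tag,
      # keep the leading # or @ and drop any trailing punctuation
      tolower(sub("^([#@][[:alnum:]_]+).*$", "\\1", pieces)),
      # ordinary words: lower-case, keep letters, digits and apostrophes
      tolower(gsub("[^[:alnum:]']", "", pieces))
    )
  )
  cleaned[nzchar(cleaned)]
}

# Made-up stopword list, far shorter than the rcorpora one
toy_stopwords <- c("my", "for", "to", "a", "the", "with")

toy_tokens <- tokenize_tweet_sketch(
  "My #rstats goals for 2019: learn to build a package with @hadleywickham https://t.co/abc"
)
kept <- toy_tokens[!toy_tokens %in% toy_stopwords]
print(kept)
```

The hashtag, the username and the URL survive as single tokens, which is
the property that makes the later package matching on `#shiny`-style
tokens possible.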

Analyze tweets

Most mentioned topics

I first drew a figure similar to Jason Baik’s, showing the most common
tokens. Like him, I removed digits; I also stripped non-ASCII
characters such as emoji.

library("ggalt")
tokens %>%
  dplyr::mutate(token = stringr::str_remove_all(token, "[^\x01-\x7F]")) %>%
  dplyr::mutate(token = stringr::str_remove_all(token, "[[:digit:]]")) %>%
  dplyr::filter(! token %in% c("", "#rstats", "goals")) %>%
  dplyr::count(token, sort = TRUE) %>%
  dplyr::mutate(token = reorder(token, n)) %>%
  head(n = 18) %>%
  ggplot()  +
  geom_lollipop(aes(token, n),
                size = 1.5, col = "salmon") +
  hrbrthemes::theme_ipsum(base_size = 12,
                          axis_title_size = 12) +
  coord_flip()

[Figure: Most common tokens in 2019 R goals tweets]
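The cleaning and counting steps behind the figure can be tried without
any Twitter data. Below is a minimal base R sketch on a made-up token
vector; it swaps the stringr and dplyr calls for `gsub()` and `table()`,
so it illustrates the idea rather than reproducing the pipeline.

```r
# Made-up tokens, including an emoji, a year and the excluded terms
toy_tokens <- c("learn", "shiny\U0001F388", "learn", "2019",
                "#rstats", "shiny", "goals", "write")

# Drop non-ASCII characters (e.g. emoji), then digits
cleaned <- gsub("[^\\x01-\\x7F]", "", toy_tokens, perl = TRUE)
cleaned <- gsub("[[:digit:]]", "", cleaned)

# Remove empty strings and the terms shared by nearly every tweet
cleaned <- cleaned[!cleaned %in% c("", "#rstats", "goals")]

# Count and sort, most frequent first
counts <- sort(table(cleaned), decreasing = TRUE)
print(counts)
```

Note that "2019" becomes an empty string once digits are removed, which
is why the empty string appears in the exclusion list.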

What actions?

In this figure I spot verbs like learn, finish, write, build and
contribute. Let me look at a sample of three lines for each of them.

lines <- tweets %>%
  dplyr::select(text) %>%
  tidytext::unnest_tokens(line, text,
                        token = "lines")

sample_verb <- function(verb, lines){
  set.seed(42)
  dplyr::filter(lines, stringr::str_detect(line, paste0(verb, " "))) %>%
    dplyr::sample_n(3)
}

samples <- purrr::map_df(c("learn", "finish", "write", "build", "contribute"), sample_verb, lines)

knitr::kable(samples)
line
3️⃣ learn how to make r packages and write my code so it could be made into an r package more easily
1. learn how to do spatial analysis in r
2️⃣ learn better way to automate feature engineering (neural nets) for text
1️⃣ finally finish all the courses and certifications i started last year on #coursera and #datacamp
2️⃣ finish my track and field r package
3). finish that text mining project i started in october
4️⃣ write an advanced shiny book with bookdown 🎈
3️⃣ learn how to make r packages and write my code so it could be made into an r package more easily
1️⃣ write the htmlwidgets book
– build my first #rstats package (aiming for 2 but 1 would be great :d)
2⃣ build a shiny web app to explore tx staar data
– use f(x) regularly & build own package. cease patching.
5⃣ contribute to foss https://t.co/oh7mwcq50r
2 contribute more to #rstats community through #scicomm, #stackoverflow, etc
3) contribute to #swdchallenge (with r, duh)

These actions are quite varied, e.g. writing is applied to software as
well as reading material. My goal was to summarize tweets, but I keep
thinking reading all of them is interesting!
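One caveat about the verb matching used for these samples: a pattern
made of the verb plus a trailing space also matches longer words ending
in the verb, and misses the verb at the very end of a line. A hedged
alternative is a word-boundary pattern, sketched here in base R on
made-up lines.

```r
# Made-up lines, lower-cased as unnest_tokens() would return them
toy_lines <- c("1. learn how to do spatial analysis in r",
               "relearn calculus",
               "things i want to learn",
               "finish my package")

# Pattern in the style of the post: the verb followed by a space
naive <- grepl("learn ", toy_lines, fixed = TRUE)

# Word-boundary pattern: the verb as a whole word, anywhere in the line
bounded <- grepl("\\blearn\\b", toy_lines, perl = TRUE)

print(data.frame(line = toy_lines, naive = naive, bounded = bounded))
```

With 87 short tweets the difference is unlikely to matter much, but the
boundary version is a safer default on larger corpora.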

Packages?

I wondered how many of the tokens correspond to a package name. I
limited myself to CRAN packages, by using the available.packages()
function, but one could have a look at the source code of the available
package to get an idea of how to find names of packages from
Bioconductor and GitHub.

cran_pkgs <- as.character(
  available.packages(contrib.url('https://cran.r-project.org', 'source'))[,"Package"])
pkg_tokens <- dplyr::mutate(tokens,
                            token = gsub("#", "", token)) %>%
  dplyr::filter(token %in% cran_pkgs)
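The matching step can be explored offline with a toy package list. The
sketch below uses made-up vectors and also shows one side effect of
exact matching: tokens are lower-cased by default during tokenization,
so a package with capitals in its name, such as DBI, cannot match unless
the package names are lower-cased too.

```r
# Hypothetical subset standing in for the CRAN package vector
toy_pkgs <- c("shiny", "DBI", "rvest", "projects")
# Made-up tokens, lower-cased as the tokenizer returns them
toy_tokens <- c("#shiny", "dbi", "rvest", "projects", "goals")

# Strip the hashtag sign so "#shiny" can match "shiny"
stripped <- gsub("#", "", toy_tokens)

# Exact matching, as in the post: "dbi" does not match "DBI"
exact <- stripped[stripped %in% toy_pkgs]

# Case-insensitive variant: compare against lower-cased package names
loose <- stripped[stripped %in% tolower(toy_pkgs)]

print(exact)
print(loose)
```

Neither variant can tell a package called projects from the ordinary
word, so some manual review of the matches is still needed.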

Using this data I’ll look at the tweets mentioning the most packages,
and then at the most frequently mentioned packages.

pkg_tokens %>%
  dplyr::group_by(text) %>%
  dplyr::mutate(pkg_text = paste(toString(token), text)) %>%
  dplyr::count(pkg_text, sort = TRUE) %>%
  head(n = 3) %>%
  dplyr::pull(pkg_text)
## [1] "portfolio, blogdown, rmarkdown, knitr, shiny, maps I really like seeing all these #rstats 2019 goals. My own, in order of urgency:\n1) Finish my personal website and online portfolio using blogdown\n2) Get rolling with project workflows, rmarkdown, and knitr \n3) Create shiny apps for custom interactive maps"                                           
## [2] "inference, projects, import, rvest, httr, xml2 My #rstats 2019 goals:\n1. Improve my statistical modeling and inference skills\n2. Develop business literacy and apply it in data analysis projects\n3. Continue to post on my blog (1 post every 2 months)\n4. Learn to import data using DBI, rvest, httr, and xml2"                                           
## [3] "shiny, templates, shiny, shiny, bookdown, shiny My #RStats goals for 2019: \n\n1 Improve shinydashboardPlus, bs4Dash and argonDash .. \n\n2 Release new shiny templates              \n3 Open a consulting service for https://t.co/k3PAbxyVMa about shiny \n4 Write an advanced shiny book with bookdown \n#rstats #shiny #consulting https://t.co/Fyc7MhaeW8"

There are false positives, e.g. projects was here meant as a word, not
a package name. What about the most popular packages among the tweets?

dplyr::count(pkg_tokens, token, sort = TRUE)
## # A tibble: 66 x 2
##    token         n
##    <chr>     <int>
##  1 shiny        16
##  2 blogdown      9
##  3 projects      9
##  4 rmarkdown     9
##  5 tidyverse     6
##  6 bookdown      5
##  7 purrr         4
##  8 track         4
##  9 caret         3
## 10 markdown      3
## # ... with 56 more rows

In this table, we get a glimpse of currently popular packages, apart
from the likely false positives “projects”, “track” and “markdown”. If
I’m reading the list correctly, they’re all developed at RStudio!

Conclusion

In this post I followed an approach similar to Jason Baik’s to
summarize tweets about 2019 R goals: I collected tweets with rtweet and
then used tidytext and the tidyverse to summarize them. Goals often
included learning new things, building packages (find my list of
resources and don’t miss this offer by Steph de Silva), and mentions of
RStudio packages.

What about my own R goals, which I haven’t tweeted? I have not made any
list, but I have exciting projects at work, and hope to keep
semi-consistently posting on this blog. In January I’ll also get to
start 2019 by giving two R talks, one at R-Ladies Paris and a remote
one at ConectaR 2019!

Happy 2019, I hope you can meet your own R goals!
