Y’all, a few weeks ago I came across the patchwork package created by Thomas Lin Pedersen. When I saw how easy it was to mix and match multiple R plots into one image, I gave it a quick share with some basic highlights of how to use the package. I knew it was a cool package, but I had no idea how excited people would get! For a relatively niche subject on Twitter, this tweet got a lot of traction. The interaction was not just with R users, either! A lot of Python users were calling for the same functionality with matplotlib.
By the time the hype was over, the tweet had reached 6.9K likes and 1.8K retweets. While this volume might not traditionally be viewed as viral on Twitter, I would certainly say that this tweet went “nerd viral”. It was definitely shocking to me.
To pay homage to the tweet and the package, I’ve decided to conduct a little analysis of the tweet’s performance and its possible impact on package downloads. I’m going to get so meta with this analysis that I’ll then arrange the resulting plots with patchwork.
Bonus: Adding an image to patchwork
As a bonus, I’ll show y’all how I added an image to the patchwork layout by placing it within a ggplot graph and fixing the coordinates to avoid weird scaling issues.
Install and Load the Packages
Thank you to “Dusty” who posted the tip to install and load packages using “easypackages” on my last tutorial.
# install.packages("easypackages")
library(easypackages)
packages("tidyverse", "rtweet", "tidytext", "wordcloud2", "patchwork",
         "cran.stats", "data.table", "gameofthrones", "ggimage", "magick",
         "ggpubr", "jpeg", "png")
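If you would rather not depend on easypackages, the same "install what is missing, then load" idea can be sketched in base R. This is only a minimal sketch: `load_packages()` is a hypothetical helper name of my own, not a function from easypackages or any other package.

```r
# Hypothetical helper (not part of easypackages): install whatever is
# missing, then attach every package in the list.
load_packages <- function(pkgs, install_missing = TRUE) {
  pkgs <- unique(pkgs)  # drop accidental duplicates
  is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  missing_pkgs <- pkgs[!is_installed]
  if (length(missing_pkgs) > 0 && install_missing) {
    install.packages(missing_pkgs)
  }
  # require() needs character.only = TRUE when the name is a string
  loaded <- vapply(pkgs, function(p) {
    suppressPackageStartupMessages(
      require(p, character.only = TRUE, quietly = TRUE)
    )
  }, logical(1))
  invisible(loaded)  # named logical vector: TRUE where loading worked
}

# Demo with packages that ship with R
res <- load_packages(c("stats", "utils", "stats"), install_missing = FALSE)
res
```

The returned vector makes it easy to spot which packages failed to load.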
Set up our colour palette
# Set the palette using the beautiful GOT Arya palette from Alejandro Jiménez
pal <- got(20, option = "Arya")
# cherry pick a few extended
c <- "#889999"
c2 <- "#AAB7AF"
Add your twitter credentials
#create_token(
#  app = "ADD YOUR CREDS",
#  consumer_key = "ADD YOUR CREDS",
#  consumer_secret = "ADD YOUR CREDS")
1st Plot – Create a plot of the tweet stats (favorites, retweets)
Lookup the tweet and view stats
lt <- lookup_tweets('1229176433123168256')
lt
Create a chart with the tweet stats
p1 <- lt %>%
  rename(Faves = favorite_count, RTs = retweet_count) %>%
  select(Faves, RTs) %>%       # select only the desired columns
  gather("stat", "value") %>%  # reformat to make the table long, which is easier for bar charts to consume
  ggplot(aes(stat, value)) +   # plot the bar chart
  geom_bar(stat = "identity", fill = c2) +
  theme_classic() +
  labs(title = "Tweet Stats", x = "Tweet Statistic", y = "Total")
p1
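A quick aside on the reshaping step: gather() still works, but the tidyr documentation marks it as superseded by pivot_longer(). Here is a minimal sketch of the same wide-to-long reshape. The data frame is a stand-in built from the rounded counts quoted at the top of this post, not the actual API result.

```r
library(tidyr)

# Stand-in for the looked-up tweet: rounded counts quoted earlier in the post
stats_wide <- data.frame(Faves = 6900, RTs = 1800)

# One row per statistic: column names go to "stat", cell values to "value"
stats_long <- pivot_longer(
  stats_wide,
  cols = everything(),
  names_to = "stat",
  values_to = "value"
)
stats_long
```

The result has one row per statistic, which is the shape the bar chart code above expects.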
2nd Plot – Create a plot of the most common words in the retweeters’ profile descriptions
Gather approx 1K of the retweet data
The get_retweets() function only allows a maximum of 100 retweets to be pulled via the API at a time. This is a limit imposed by the Twitter API. I had quite a difficult time pulling this data. Not only did a lot of the suggested methods for getting cursors fail, the rate limiting also wasn’t consistent. Sometimes I was able to get close to 1K retweets in batches of 100. Sometimes it blocked me for 15-minute intervals (as expected). Since this is just an example to show patchwork, I decided to grab just 1K of the retweets, which is roughly half of the full set. I should also let you know that I did attempt to put this in a function, but I couldn’t find an appropriate system wait time that would complete in a reasonable time and/or actually return the data. Please reach out if you have a better/proven method! In the meantime, here is my brute force method.
statusid <- '1229176433123168256' # set the first lowest retweet statusid to be the id of the original tweet
rtweets <- get_retweets(statusid, n = 100, parse = TRUE) # get 100 retweets
min_id <- min(rtweets$status_id)
rtweets2 <- get_retweets(statusid, n = 100, max_id = min_id, parse = TRUE) # get 100 retweets
min_id <- min(rtweets2$status_id)
Repeat as needed. The full code is available here.
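For anyone who wants to wrap the "repeat as needed" step in a loop, here is a minimal sketch of the paging logic on its own. Everything in it is an assumption on my part: `collect_batches()` and `fake_fetch()` are hypothetical names, the fetcher is passed in as a function so the loop can be tried offline, and the fixed pause does nothing clever about rate limits. It is not a proven fix for the problems described above.

```r
# Hypothetical helper: page backwards through results in batches.
# `fetch_page(max_id)` must return a data frame with a `status_id` column;
# it is called with max_id = NULL for the first (most recent) page.
collect_batches <- function(fetch_page, max_batches = 10, wait_secs = 0) {
  batches <- list()
  max_id <- NULL
  for (i in seq_len(max_batches)) {
    batch <- fetch_page(max_id)
    if (is.null(batch) || nrow(batch) == 0) break
    # Drop rows we already have, in case max_id is treated as inclusive
    seen <- unlist(lapply(batches, function(b) b$status_id))
    batch <- batch[!batch$status_id %in% seen, , drop = FALSE]
    if (nrow(batch) == 0) break  # nothing new came back, so stop
    batches[[length(batches) + 1]] <- batch
    # Ids are compared as strings, which assumes they all have equal length
    max_id <- min(batch$status_id)
    if (wait_secs > 0) Sys.sleep(wait_secs)  # fixed pause between requests
  }
  do.call(rbind, batches)
}

# Offline demo with a fake fetcher over 25 made-up ids
fake_ids <- sprintf("%03d", 25:1)
fake_fetch <- function(max_id, page_size = 10) {
  ids <- if (is.null(max_id)) fake_ids else fake_ids[fake_ids <= max_id]
  data.frame(status_id = head(ids, page_size), stringsAsFactors = FALSE)
}
demo <- collect_batches(fake_fetch)
nrow(demo)

# Assumed usage with rtweet, mirroring the calls above (not run here):
# fetch_rt <- function(max_id) {
#   if (is.null(max_id)) get_retweets(statusid, n = 100)
#   else get_retweets(statusid, n = 100, max_id = max_id)
# }
# all_rts <- collect_batches(fetch_rt, max_batches = 10, wait_secs = 60)
```

Passing the fetcher in as an argument keeps the loop testable without touching the API, which matters when every failed experiment costs a 15-minute lockout.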
Graph the most common words used in the retweeters profile descriptions
data(stop_words)
# Unnest the words - code via Tidy Text
# Note: rtweet_table is not defined in this excerpt; see the full code linked above
rtweet_table2 <- rtweet_table %>%
  unnest_tokens(word, description) %>%
  anti_join(stop_words) %>%
  count(word, sort = TRUE) %>%
  filter(!word %in% c('t.co', 'https'))
p2 <- rtweet_table2 %>%
  filter(n > 50) %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n)) +
  theme_classic() +
  geom_col(fill = c) +
  labs(title = "RT Profiles", x = "Key Words", y = "Total Occurrences") +
  coord_flip()
p2
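To see what the tokenising and stop word steps do, here is a minimal sketch on two made-up profile descriptions. The text is invented purely for illustration and is not taken from the retweet data.

```r
library(dplyr)
library(tidytext)

# Made-up profile descriptions, for illustration only
profiles <- data.frame(
  description = c(
    "Data scientist and #rstats fan",
    "Statistics PhD student. Data and coffee."
  ),
  stringsAsFactors = FALSE
)

word_counts <- profiles %>%
  unnest_tokens(word, description) %>%    # one lowercase word per row
  anti_join(stop_words, by = "word") %>%  # drop common English words
  count(word, sort = TRUE)

word_counts
```

Naming the join column with `by = "word"` gives the same result as the bare anti_join() above, just without the message about which column was used.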
3rd Plot – Plot the patchwork CRAN download stats
Gather the data
To gather the patchwork download stats, I used the “cran.stats” package. The examples to process the download stats were very easy to follow and I used them as the basis for gathering the data. See examples here.
dt = read_logs(start = as.Date("2020-02-01"), end = as.Date("2020-02-29"), verbose = TRUE)
patchwork <- stats_logs(dt, type="daily", packages=c("patchwork"), dependency=TRUE, duration = 30L)
Plot the CRAN download data
I plotted the download data using ggplot with the geom_line() function, plus just a little extra fanciness to annotate the graph with the annotate() function. Great annotation examples here.
p3 <- ggplot(patchwork, aes(x = key, y = tot_N, group = 1)) +
  geom_line() +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1)) +
  ylim(0, 1500) +
  labs(title = "Downloads of the R Patchwork Package", x = "Date", y = "Total Downloads") +
  annotate("rect", xmin = "2020-02-16", xmax = "2020-02-20", ymin = 400, ymax = 900,
           alpha = .3, fill = c2) +
  annotate(
    geom = "curve", alpha = 0.3, x = "2020-02-14", y = 650, xend = "2020-02-17", yend = 800,
    curvature = .3, arrow = arrow(length = unit(2, "mm"))
  ) +
  annotate(geom = "text", x = "2020-02-07", y = 650, label = "Nerd viral #rstats tweet",
           hjust = "left", alpha = 0.5)
p3
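The code above passes the annotation positions as date strings, which suggests the key column is not a true Date. If your dates are stored as Date values instead, the annotate() positions need to be Dates too. Here is a minimal sketch of that variant. The daily counts are made up for illustration and are not the real download numbers.

```r
library(ggplot2)

# Made-up daily counts, for illustration only (not the real download data)
downloads <- data.frame(
  day = seq(as.Date("2020-02-10"), as.Date("2020-02-23"), by = "day"),
  n   = c(300, 320, 310, 330, 340, 350, 700, 850, 800, 600, 450, 400, 380, 360)
)

p_dates <- ggplot(downloads, aes(x = day, y = n)) +
  geom_line() +
  scale_x_date(date_breaks = "2 days", date_labels = "%b %d") +
  # Shaded window: positions are Date values, matching the x axis type
  annotate("rect",
           xmin = as.Date("2020-02-16"), xmax = as.Date("2020-02-20"),
           ymin = 0, ymax = 900, alpha = 0.3, fill = "#AAB7AF") +
  annotate("text", x = as.Date("2020-02-10"), y = 800,
           label = "Tweet window", hjust = "left", alpha = 0.5) +
  theme_classic() +
  theme(axis.text.x = element_text(angle = 60, hjust = 1))
p_dates
```

With a Date axis, scale_x_date() also takes care of label spacing, so you do not get one tick per row.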
Add the plots to the same graphic using patchwork
As is the focus of this post, people were very excited about this package when it was shared on Twitter. The patchwork package was created by Thomas Lin Pedersen. Not only is it incredibly easy to use, it also comes with great documentation.
Try a few layouts
Using the plots p1, p2 and p3 created above, try a few layouts following the package documentation.
p1 + p2 + p3
p1 / (p2 + p3)
# Final layout
p <- p3 / (p1 + p2)
p
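The operators cover arrangement, and plot_layout() is where you tune relative sizes. Here is a minimal sketch using throwaway plots from the built-in mtcars data, since p1, p2 and p3 depend on the API pulls above. The 2:1 height split is just an example value.

```r
library(ggplot2)
library(patchwork)

# Three throwaway plots from a built-in dataset
a <- ggplot(mtcars, aes(wt, mpg)) + geom_point() + ggtitle("a")
b <- ggplot(mtcars, aes(factor(cyl))) + geom_bar() + ggtitle("b")
d <- ggplot(mtcars, aes(hp)) + geom_histogram(bins = 10) + ggtitle("d")

# Same shape as the final layout above (one plot over two),
# with the top row twice as tall as the bottom row
layout_sketch <- a / (b | d) + plot_layout(heights = c(2, 1))
layout_sketch
```

Here `|` places plots side by side and `/` stacks them, so the parentheses decide which plots share a row.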
Annotate the final layout
We will select the final layout from the above code block and then add an overall title, a caption and some formatting. This example was covered in the excellent patchwork annotation guide.
p + plot_annotation(
  title = 'Patchwork Went Nerd Viral',
  caption = 'Source: @littlemissdata'
) &
  theme(text = element_text(family = 'mono'))
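Two details in that call are easy to miss. The `&` operator applies the theme to every plot in the layout, whereas `+` would only change the last one, and plot_annotation() can also tag each panel automatically. Here is a minimal sketch on two toy plots. The title and theme are placeholders of my own.

```r
library(ggplot2)
library(patchwork)

a <- ggplot(mtcars, aes(wt, mpg)) + geom_point()
b <- ggplot(mtcars, aes(hp, mpg)) + geom_point()

tagged <- (a / b) +
  plot_annotation(
    title = "Two toy plots",
    tag_levels = "A"   # label the panels A, B, ...
  ) &
  theme_minimal()      # `&` applies the theme to every panel
tagged
```

Swap the `&` for a `+` and only the bottom plot picks up the theme, which is a handy way to see the difference.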
Add an image to the patchwork graphic
Bring in the image
Using an empty ggplot and the background_image() function, you can bring an image into a graph object. Further, you can prevent image resizing with the coord_fixed() function. This is important so the actual image doesn’t get resized with the patchwork placement.
twitter <- image_read('https://raw.githubusercontent.com/lgellis/MiscTutorial/master/Patchwork/twitter_post.png')
twitter <- ggplot() +
  background_image(twitter) +
  coord_fixed()
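If an image ever comes out stretched, one option is to give the plot the same dimensions as the image, so that coord_fixed() locks in the image's own aspect ratio. This is a minimal sketch under my own assumptions rather than the approach used above: it swaps background_image() for ggplot2's annotation_raster() and uses a tiny raster built in memory as a stand-in for the screenshot.

```r
library(ggplot2)

# Stand-in image: a raster built in memory, 2 rows high and 4 columns wide
img <- as.raster(matrix(c("grey20", "grey80"), nrow = 2, ncol = 4))
img_h <- nrow(img)
img_w <- ncol(img)

p_img <- ggplot(data.frame(x = c(0, img_w), y = c(0, img_h)), aes(x, y)) +
  geom_blank() +                                  # sets the axis ranges
  annotation_raster(img, xmin = 0, xmax = img_w, ymin = 0, ymax = img_h) +
  coord_fixed(expand = FALSE) +                   # one x unit equals one y unit
  theme_void()                                    # hide axes and background
p_img
```

For an image read with magick, the width and height should be available from image_info(), so the same pattern applies with those values in place of `img_w` and `img_h`.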
Plot the image with patchwork
pF <- twitter + (p3 / (p1 + p2))
pF + plot_annotation(
  title = 'Patchwork Went Nerd Viral',
  caption = 'Source: @littlemissdata'
)
Please comment below if you enjoyed this blog, have questions, or would like to see something different in the future. Note that the full code is available on my github repo.
If you have trouble downloading the files or cloning the repo from GitHub, please go to the main page of the repo and select “Clone or Download” and then “Download Zip”. Alternatively, you can execute the following R commands to download the whole repo through R