Mr. Mastodon Farm: analysing a Mastodon ActivityPub outbox.json file


I migrated my personal Mastodon account from mastodon.social to biologists.social recently. If you’d like to do the same, I found this guide very useful. Note that, once you move, all your previous posts are left behind on the old instance.

Before I migrated, I downloaded all of my data from the old instance. I thought I’d take a look at what I had posted to see if anything was worth reposting on biologists.social. I also took a look at my first 9 months on Mastodon at mastodon.social. To do all of this, I used R.

The previous content is still visible at the old instance, unless you delete that account. However, divining your original content from the boosts is hard. This is where R can help!

Getting started

When you download your data from a Mastodon instance, you get a zip file containing a bunch of files (your avatar, header image and so on) as well as a file called outbox.json – this is the file we’ll use.

I set up an RStudio project with standardised structure (using this) and copied the outbox.json file into the Data folder.
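
If you prefer to do that copy step in R, here is a minimal sketch – the archive filename is an assumption, since Mastodon names the download with your handle and a timestamp:

# extract just outbox.json from the downloaded archive into Data/
# "archive.zip" is a placeholder name - check your actual download
unzip("archive.zip", files = "outbox.json", exdir = "Data")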

The code

I have annotated the code as we go. The first thing is to make an html file so that I can read the contents of my posts.

library(jsonlite)
library(tidyr)
library(dplyr)
library(timetk)
library(lubridate)
library(ggplot2)
# this script can be found here: https://quantixed.org/2022/12/26/twitcher-ii-tweet-frequency-and-top-tweets/
source("Script/calendarHeatmap.R")

# load only the "orderedItems" part of outbox.json
df <- fromJSON("Data/outbox.json")[["orderedItems"]]

# we just need a list of times and text/content of posts
# here we filter for "raw posts" and not other outbox items
posts <- df %>%
  unnest_wider(object, names_sep = "_") %>% 
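  # keep only "Create" activities; boosts appear as "Announce" items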
  filter(type == "Create") %>% 
  filter(!is.na(object_content)) %>% 
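  # replies carry a non-NA inReplyTo, so this keeps top-level posts only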
  filter(is.na(object_inReplyTo)) %>% 
  select(c(published,object_content))

# a quick kludge to make something that will display in html
output <- paste0("<p><b>", posts$published, "</b></p>", posts$object_content, "<hr>")

# write the file
fileConn <- file("Output/Data/output.html")
writeLines(output, fileConn)
close(fileConn)

Now, in the directory Output/Data/ we have a little output.html file that can be opened in a browser.
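
You can open it straight from the R console if you like:

# open the generated file in the default browser
browseURL("Output/Data/output.html")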

It contains the time/date of each toot and the text content. Note the aim here was just for me to be able to read the content easily. There are other projects out there to make a fully functional repository of Mastodon content.

All the links are live and, apart from some borked special characters, the html is very readable.

My first toot was:

2022-10-28T11:18:29Z

So I made an account here and now I’m not sure what to do. I also missed the opportunity to ditch my biogeek handle from the bird site. Anyway, this is my first post.

and my last toot on mastodon.social was:

2023-06-23T13:39:28Z

Therapy dogs are so last year. Free ice cream and llamas to pet today at #WarwickUni

The photos are not displayed since we only extracted two columns (date and post content) from the json file.
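
If you wanted the photos as well, you could keep the attachment column too. A sketch – the object_attachment name and its exact structure are assumptions, since they depend on how fromJSON simplifies your particular file:

# keep the attachment metadata (a list column) alongside each post
media <- df %>%
  unnest_wider(object, names_sep = "_") %>%
  filter(type == "Create", !is.na(object_content), is.na(object_inReplyTo)) %>%
  select(published, object_attachment)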

In my archive, there were 1192 items of content, of which only 206 were original toots. I found a couple of things that might be fun to repost, and a bunch of things I’d forgotten about. So, mission accomplished!
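
The counts come straight from the two data frames built above:

# total outbox items vs original top-level toots
nrow(df)    # 1192 in my case: posts, boosts, replies and more
nrow(posts) # 206 original toots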

Post frequency

Finally, I had a look at posting frequency.

# transform the created date/time to POSIX
posts$created <- as.POSIXct(posts$published, format="%Y-%m-%dT%H:%M:%SZ")
# summarise the posts
df_day <- posts %>%
  summarize_by_time(.date_var = created,
                    .by = "day",
                    n = n())
# add social media icon, see https://nrennie.rbind.io/blog/adding-social-media-icons-ggplot2/
sysfonts::font_add(family = "Font Awesome 6 Brands",
                   regular = "/Users/steve/Library/Fonts/Font Awesome 6 Brands-Regular-400.otf")
showtext::showtext_auto()
# the caption must be defined before the plot call that uses it;
# the first span holds the Font Awesome Mastodon glyph
social_caption <- paste0(
  "<span style='font-family:\"Font Awesome 6 Brands\";'></span>
  <span style='color: #3b528b'>@clathrin</span>"
)
# generate the plot using calendarHeatmap function
p <- calendarHeatmap(as.Date(df_day$created), df_day$n, title = "Toots @mastodon.social", subtitle = social_caption)
p <- p +
  theme(plot.subtitle = ggtext::element_textbox_simple())
# save the plot
ggsave("Output/Plots/all_calendar.png", p, width = 4, height = 4)

which resulted in this graphic:

Note that the code above uses the calendarHeatmap function from here, and the method for adding social media icons to ggplots is described by Nicola Rennie here.

OK, so 9 months is not a lot of data compared to my 12 years on Twitter, but there are a couple of insights. I am using Mastodon every day, but here we are looking at when I post – not when I reply or anything else – just post. From this data, it looks like I was settling into a pattern of posting Monday to Friday and not at the weekend. My maximum number of toots in a day was 8, which coincided with a trip to HHMI Janelia for the Recognizing Preprint Peer Review meeting.
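
A quick way to check that weekday pattern, using the posts data frame from earlier and lubridate’s wday:

# tally original toots per day of the week (Monday first)
posts %>%
  mutate(weekday = wday(created, label = TRUE, week_start = 1)) %>%
  count(weekday)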

It will be interesting to have a look at my data from biologists.social after a few months to see how my Mastodon usage is developing.

The post title comes from “Mr. Mastodon Farm” by Cake from their Motorcade of Generosity album.
