rtoot: Collecting and Analyzing Mastodon Data

[This article was first published on schochastics, and kindly contributed to R-bloggers.]

It has been a wild few days on Twitter after Elon Musk took over. The future of the platform is unclear, and many users are looking for alternatives, a popular one being Mastodon. I also decided to give it a try and signed up. I quickly became interested in its API and realized that the only R package for it is a seemingly unmaintained one on GitHub. So I decided to write a new one. Fast forward a week(!!!!) and the package rtoot was accepted by CRAN. In this post I will introduce some of the functionality of the package and a roadmap for the future. (The name of the package derives from “toot”, the Mastodon equivalent of a “tweet”.)

# either the developer version from GitHub
remotes::install_github("schochastics/rtoot")

# or the CRAN version
install.packages("rtoot")
library(rtoot)

Authenticate

Before doing anything, you should set up credentials. Once they are set up, you will not need to bother with that anymore (hopefully). The package vignette (vignette("auth")) explains the process in detail. In brief, Mastodon has three types of API calls: anonymous, public, and user-based. Anonymous calls do not need any token. A public token can be obtained without an account and unlocks a few more API calls. A user-based token grants access to all endpoints but requires an account.

Running the function auth_setup() will guide you through the process of setting up a token.

auth_setup()

Instances

In contrast to Twitter, Mastodon is not a single server but a federation of independent servers (instances). You sign up at a specific server (say “mastodon.social”) but can still communicate with users on other servers (say “fosstodon.org”). The existence of different instances makes API calls more complex. For example, some calls can only be made within your own instance (e.g. get_timeline_home()), while others can access any instance but require you to specify it as a parameter (e.g. get_timeline_public()).
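Because accounts live on different servers, a full Mastodon handle has the form @username@instance. According to the Mastodon API documentation, the acct field of an account holds just the username for accounts local to the instance you queried and username@domain for remote ones. A minimal base-R sketch for splitting such handles, assuming exactly that format; split_handle() and its home_instance argument are hypothetical names, not part of rtoot, and "@user@fosstodon.org" is a made-up example:

```r
# Hypothetical helper (not part of rtoot): split fediverse handles such as
# "@user@fosstodon.org" into username and instance.
# Handles without a domain part are assumed to be local to `home_instance`.
split_handle <- function(handle, home_instance = NA_character_) {
  handle <- sub("^@", "", handle)               # drop an optional leading "@"
  parts <- strsplit(handle, "@", fixed = TRUE)  # "user@host" -> c("user", "host")
  data.frame(
    username = vapply(parts, function(p) p[1], character(1)),
    instance = vapply(parts, function(p) if (length(p) > 1) p[2] else home_instance,
                      character(1)),
    stringsAsFactors = FALSE
  )
}

split_handle(c("@user@fosstodon.org", "Gargron"),
             home_instance = "mastodon.social")
```

The instance column is exactly what functions such as get_timeline_public() expect as their instance argument.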

A list of active instances can be obtained with get_fedi_instances(). The results are sorted by number of users.

General information about an instance can be obtained with get_instance_general():

str(get_instance_general(instance = "mastodon.social"))
## List of 16
##  $ uri              : chr "mastodon.social"
##  $ title            : chr "Mastodon"
##  $ short_description: chr "The original server operated by the Mastodon gGmbH non-profit"
##  $ description      : chr ""
##  $ email            : chr "[email protected]"
##  $ version          : chr "4.0.0rc1"
##  $ urls             :List of 1
##   ..$ streaming_api: chr "wss://mastodon.social"
##  $ stats            :List of 3
##   ..$ user_count  : int 831723
##   ..$ status_count: int 41091494
##   ..$ domain_count: int 30169
##  $ thumbnail        : chr "https://files.mastodon.social/site_uploads/files/000/000/001/@1x/57c12f441d083cde.png"
##  $ languages        :List of 1
##   ..$ : chr "en"
##  $ registrations    : logi FALSE
##  $ approval_required: logi FALSE
##  $ invites_enabled  : logi TRUE
##  $ configuration    :List of 4
##   ..$ accounts         :List of 1
##   .. ..$ max_featured_tags: int 10
##   ..$ statuses         :List of 3
##   .. ..$ max_characters             : int 500
##   .. ..$ max_media_attachments      : int 4
##   .. ..$ characters_reserved_per_url: int 23
##   ..$ media_attachments:List of 6
##   .. ..$ supported_mime_types  :List of 28
##   .. .. ..$ : chr "image/jpeg"
##   .. .. ..$ : chr "image/png"
##   .. .. ..$ : chr "image/gif"
##   .. .. ..$ : chr "image/heic"
##   .. .. ..$ : chr "image/heif"
##   .. .. ..$ : chr "image/webp"
##   .. .. ..$ : chr "image/avif"
##   .. .. ..$ : chr "video/webm"
##   .. .. ..$ : chr "video/mp4"
##   .. .. ..$ : chr "video/quicktime"
##   .. .. ..$ : chr "video/ogg"
##   .. .. ..$ : chr "audio/wave"
##   .. .. ..$ : chr "audio/wav"
##   .. .. ..$ : chr "audio/x-wav"
##   .. .. ..$ : chr "audio/x-pn-wave"
##   .. .. ..$ : chr "audio/vnd.wave"
##   .. .. ..$ : chr "audio/ogg"
##   .. .. ..$ : chr "audio/vorbis"
##   .. .. ..$ : chr "audio/mpeg"
##   .. .. ..$ : chr "audio/mp3"
##   .. .. ..$ : chr "audio/webm"
##   .. .. ..$ : chr "audio/flac"
##   .. .. ..$ : chr "audio/aac"
##   .. .. ..$ : chr "audio/m4a"
##   .. .. ..$ : chr "audio/x-m4a"
##   .. .. ..$ : chr "audio/mp4"
##   .. .. ..$ : chr "audio/3gpp"
##   .. .. ..$ : chr "video/x-ms-asf"
##   .. ..$ image_size_limit      : int 10485760
##   .. ..$ image_matrix_limit    : int 16777216
##   .. ..$ video_size_limit      : int 41943040
##   .. ..$ video_frame_rate_limit: int 60
##   .. ..$ video_matrix_limit    : int 2304000
##   ..$ polls            :List of 4
##   .. ..$ max_options              : int 4
##   .. ..$ max_characters_per_option: int 50
##   .. ..$ min_expiration           : int 300
##   .. ..$ max_expiration           : int 2629746
##  $ contact_account  :List of 22
##   ..$ id             : chr "1"
##   ..$ username       : chr "Gargron"
##   ..$ acct           : chr "Gargron"
##   ..$ display_name   : chr "Eugen 💀"
##   ..$ locked         : logi FALSE
##   ..$ bot            : logi FALSE
##   ..$ discoverable   : logi TRUE
##   ..$ group          : logi FALSE
##   ..$ created_at     : chr "2016-03-16T00:00:00.000Z"
##   ..$ note           : chr "<p>Founder, CEO and lead developer <span class=\"h-card\"><a href=\"https://mastodon.social/@Mastodon\" class=\"| __truncated__
##   ..$ url            : chr "https://mastodon.social/@Gargron"
##   ..$ avatar         : chr "https://files.mastodon.social/accounts/avatars/000/000/001/original/dc4286ceb8fab734.jpg"
##   ..$ avatar_static  : chr "https://files.mastodon.social/accounts/avatars/000/000/001/original/dc4286ceb8fab734.jpg"
##   ..$ header         : chr "https://files.mastodon.social/accounts/headers/000/000/001/original/3b91c9965d00888b.jpeg"
##   ..$ header_static  : chr "https://files.mastodon.social/accounts/headers/000/000/001/original/3b91c9965d00888b.jpeg"
##   ..$ followers_count: int 195985
##   ..$ following_count: int 317
##   ..$ statuses_count : int 72663
##   ..$ last_status_at : chr "2022-11-11"
##   ..$ noindex        : logi FALSE
##   ..$ emojis         : list()
##   ..$ fields         :List of 1
##   .. ..$ :List of 3
##   .. .. ..$ name       : chr "Patreon"
##   .. .. ..$ value      : chr "<a href=\"https://www.patreon.com/mastodon\" target=\"_blank\" rel=\"nofollow noopener noreferrer me\"><span cl"| __truncated__
##   .. .. ..$ verified_at: NULL
##  $ rules            :List of 6
##   ..$ :List of 2
##   .. ..$ id  : chr "1"
##   .. ..$ text: chr "Sexually explicit or violent media must be marked as sensitive when posting"
##   ..$ :List of 2
##   .. ..$ id  : chr "2"
##   .. ..$ text: chr "No racism, sexism, homophobia, transphobia, xenophobia, or casteism"
##   ..$ :List of 2
##   .. ..$ id  : chr "3"
##   .. ..$ text: chr "No incitement of violence or promotion of violent ideologies"
##   ..$ :List of 2
##   .. ..$ id  : chr "4"
##   .. ..$ text: chr "No harassment, dogpiling or doxxing of other users"
##   ..$ :List of 2
##   .. ..$ id  : chr "5"
##   .. ..$ text: chr "No content illegal in Germany"
##   ..$ :List of 2
##   .. ..$ id  : chr "7"
##   .. ..$ text: chr "Do not share intentionally false or misleading information"
##  - attr(*, "headers")= tibble [1 × 3] (S3: tbl_df/tbl/data.frame)
##   ..$ rate_limit    : chr "300"
##   ..$ rate_remaining: chr "299"
##   ..$ rate_reset    : POSIXlt[1:1], format: "2022-11-11 16:20:00"
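The str() output above ends with a headers attribute carrying the rate-limit information returned by the server (rate_limit, rate_remaining, rate_reset). A small sketch of reading it, assuming the attribute looks as printed above (counts stored as character, reset as a date-time); rate_limit_status() is a hypothetical helper, not an rtoot function, and the example object only mimics a real result offline:

```r
# Hypothetical helper (not part of rtoot): summarise the "headers" attribute
# attached to an rtoot result, as printed at the end of the str() output above.
rate_limit_status <- function(x, now = Sys.time()) {
  h <- attr(x, "headers")
  if (is.null(h)) stop("result has no 'headers' attribute")
  reset <- as.POSIXct(h$rate_reset[1])
  list(
    limit     = as.integer(as.character(h$rate_limit[1])),
    remaining = as.integer(as.character(h$rate_remaining[1])),
    # never report a negative wait if the reset time has already passed
    seconds_to_reset = max(0, as.numeric(difftime(reset, now, units = "secs")))
  )
}

# Offline example mimicking the attribute shown above
res <- list(uri = "mastodon.social")
attr(res, "headers") <- data.frame(
  rate_limit = "300", rate_remaining = "299",
  rate_reset = as.POSIXct("2022-11-11 16:20:00", tz = "UTC"),
  stringsAsFactors = FALSE
)
rate_limit_status(res, now = as.POSIXct("2022-11-11 16:15:00", tz = "UTC"))
```

With a live call this would be rate_limit_status(get_instance_general(instance = "mastodon.social")); whether every rtoot function attaches the same attribute is worth checking in your installed version.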

get_instance_activity() shows weekly activity (statuses, logins, and registrations) for the last three months, and get_instance_trends() shows the trending hashtags of the week.

get_instance_activity(instance = "fosstodon.org")
## # A tibble: 12 × 4
##    week                statuses logins registrations
##    <dttm>                 <int>  <int>         <int>
##  1 2022-11-10 21:47:00    13647   7623           691
##  2 2022-11-03 21:47:00    23227  11913          3401
##  3 2022-10-27 21:47:00        0      0             0
##  4 2022-10-20 21:47:00        0      0             0
##  5 2022-10-13 21:47:00        0      0             0
##  6 2022-10-06 21:47:00        0      0             0
##  7 2022-09-29 21:47:00        0      0             0
##  8 2022-09-22 21:47:00        0      0             0
##  9 2022-09-15 21:47:00        0      0             0
## 10 2022-09-08 21:47:00        0      0             0
## 11 2022-09-01 21:47:00        0      0             0
## 12 2022-08-25 21:47:00        0      0             0
get_instance_trends(instance = "fosstodon.org")
## # A tibble: 70 × 5
##    name             url                                 day        accou…¹  uses
##    <chr>            <chr>                               <date>       <int> <int>
##  1 followbackfriday https://fosstodon.org/tags/followb… 2022-11-11     175   246
##  2 followbackfriday https://fosstodon.org/tags/followb… 2022-11-10       3     3
##  3 followbackfriday https://fosstodon.org/tags/followb… 2022-11-09       2     2
##  4 followbackfriday https://fosstodon.org/tags/followb… 2022-11-08       1     1
##  5 followbackfriday https://fosstodon.org/tags/followb… 2022-11-07       0     0
##  6 followbackfriday https://fosstodon.org/tags/followb… 2022-11-06       0     0
##  7 followbackfriday https://fosstodon.org/tags/followb… 2022-11-05       0     0
##  8 followfriday     https://fosstodon.org/tags/followf… 2022-11-11     246   352
##  9 followfriday     https://fosstodon.org/tags/followf… 2022-11-10      26    30
## 10 followfriday     https://fosstodon.org/tags/followf… 2022-11-09      12    31
## # … with 60 more rows, and abbreviated variable name ¹​accounts
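As the printed tibble shows, get_instance_trends() returns one row per hashtag and day, so ranking hashtags by use over the whole week means summing uses per name. A base-R sketch, assuming the name and uses columns shown above; weekly_tag_uses() is a hypothetical helper and the toy data below is made up:

```r
# Hypothetical helper (not part of rtoot): total the daily `uses` of each
# hashtag in a get_instance_trends()-style table and sort descending.
weekly_tag_uses <- function(trends) {
  totals <- aggregate(uses ~ name, data = trends, FUN = sum)
  totals <- totals[order(totals$uses, decreasing = TRUE), , drop = FALSE]
  rownames(totals) <- NULL
  totals
}

# Made-up toy input with the same column names as the output above
toy_trends <- data.frame(
  name = c("tagA", "tagA", "tagB", "tagB"),
  uses = c(1L, 2L, 10L, 0L),
  stringsAsFactors = FALSE
)
weekly_tag_uses(toy_trends)
# With live data: weekly_tag_uses(get_instance_trends(instance = "fosstodon.org"))
```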

Get toots

To get the most recent toots of a specific instance, use get_timeline_public():

get_timeline_public(instance = "mastodon.social")
##    id        uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴ favou…⁵ repli…⁶
##    <chr>     <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>   <int>   <int>
##  1 10931614… http… 2022-11-09 22:12:13 "<p>Vi… public  FALSE   ""            0       0       0
##  2 10931614… http… 2022-11-09 22:04:24 "<p>I … public  FALSE   ""            0       0       0
##  3 10931614… http… 2022-11-09 21:46:36 "<p>Ha… public  FALSE   ""            0       0       0
##  4 10931614… http… 2022-11-09 22:12:11 "<p>To… public  FALSE   ""            0       0       0
##  5 10931614… http… 2022-11-09 22:12:05 "<p>:s… public  FALSE   ""            0       0       0
##  6 10931614… http… 2022-11-09 22:12:05 "<p>We… public  FALSE   ""            0       0       0
##  7 10931614… http… 2022-11-09 22:12:09 "<p>He… public  FALSE   ""            0       0       0
##  8 10931614… http… 2022-11-09 22:12:09 "<p>Et… public  FALSE   ""            0       0       0
##  9 10931614… http… 2022-11-09 22:12:08 "<p>Af… public  FALSE   ""            0       0       0
## 10 10931614… http… 2022-11-09 22:04:19 "<p>I'… public  FALSE   ""            0       0       0
## 11 10931614… http… 2022-11-09 22:12:05 "<p>\"… public  FALSE   ""            0       0       0
## 12 10931614… http… 2022-11-09 22:12:06 "<p>Wh… public  FALSE   ""            0       0       0
## 13 10931614… http… 2022-11-09 22:12:05 "<p>Ev… public  FALSE   ""            0       0       0
## 14 10931614… http… 2022-11-09 22:12:04 "<p>\"… public  FALSE   ""            0       0       0
## 15 10931614… http… 2022-11-09 22:12:00 "<p>Wh… public  FALSE   ""            0       0       0
## 16 10931614… http… 2022-11-09 22:11:13 "<p>Lo… public  FALSE   ""            0       0       0
## 17 10931614… http… 2022-11-09 22:12:04 "<p>Ne… public  FALSE   ""            0       0       0
## 18 10931614… http… 2022-11-09 22:12:02 "<p>Th… public  FALSE   ""            0       0       0
## 19 10931614… http… 2022-11-09 22:11:50 "<p>So… public  FALSE   ""            0       0       0
## 20 10931614… http… 2022-11-09 22:12:01 "<p>Th… public  FALSE   ""            0       0       0
## # … with 19 more variables: url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>, card <I<list>>,
## #   account <list>, reblog <I<list>>, media_attachments <I<list>>, mentions <I<list>>,
## #   tags <I<list>>, emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names ¹​visibility, ²​sensitive,
## #   ³​spoiler_text, ⁴​reblogs_count, ⁵​favourites_count, ⁶​replies_count
## # ℹ Use `colnames()` to see all variable names
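The content column holds each toot as HTML (note the leading <p> in every row), while text analysis usually needs plain text. A rough base-R sketch using regular expressions; strip_html() is a hypothetical helper that only decodes a handful of common entities, and a real HTML parser (for example the xml2 or rvest packages) is the more robust choice:

```r
# Hypothetical helper (not part of rtoot): regex-based HTML-to-text conversion
# for the `content` column. Good enough for quick looks, not a full parser.
strip_html <- function(x) {
  x <- gsub("<br\\s*/?>", "\n", x, perl = TRUE)     # line breaks
  x <- gsub("</p>\\s*<p>", "\n\n", x, perl = TRUE)  # paragraph breaks
  x <- gsub("<[^>]+>", "", x, perl = TRUE)          # drop remaining tags
  # decode a few common entities; "&amp;" last so "&amp;lt;" becomes "&lt;"
  x <- gsub("&lt;", "<", x, fixed = TRUE)
  x <- gsub("&gt;", ">", x, fixed = TRUE)
  x <- gsub("&quot;", "\"", x, fixed = TRUE)
  x <- gsub("&#39;", "'", x, fixed = TRUE)
  x <- gsub("&amp;", "&", x, fixed = TRUE)
  trimws(x)
}

strip_html("<p>New post on <a href=\"https://fosstodon.org/tags/rstats\">#<span>rstats</span></a> &amp; more</p>")
# With live data (a new column, leaving rtoot's own columns untouched):
# toots <- get_timeline_public(instance = "mastodon.social")
# toots$plain <- strip_html(toots$content)
```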

To get the most recent toots containing a specific hashtag, use get_timeline_hashtag():

get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org")
## # A tibble: 20 × 29
##    id          uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴
##    <chr>       <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>
##  1 1093260576… http… 2022-11-11 16:12:55 "<p>Re… public  FALSE   ""            1
##  2 1093260140… http… 2022-11-11 16:02:20 "<p>Lo… public  FALSE   ""            0
##  3 1093260050… http… 2022-11-11 15:59:56 "<p><a… public  FALSE   ""            0
##  4 1093259862… http… 2022-11-11 15:56:03 "<p>I … public  FALSE   ""            3
##  5 1093259083… http… 2022-11-11 15:35:34 "<p>Pe… public  FALSE   ""            0
##  6 1093259018… http… 2022-11-11 15:34:06 "<p>I'… public  FALSE   ""            1
##  7 1093258952… http… 2022-11-11 15:32:55 "<p>Wh… public  FALSE   ""            0
##  8 1093258902… http… 2022-11-11 15:31:37 "<p>Cu… public  FALSE   ""            4
##  9 1093258386… http… 2022-11-11 15:18:31 "<p>Is… public  FALSE   ""            0
## 10 1093258337… http… 2022-11-11 15:17:16 "<p><a… public  FALSE   ""            0
## 11 1093258243… http… 2022-11-11 15:14:52 "<p>Th… public  FALSE   ""            4
## 12 1093258124… http… 2022-11-11 15:11:51 "<p>It… public  TRUE    ""            0
## 13 1093257660… http… 2022-11-11 15:00:02 "<p>If… public  FALSE   ""            1
## 14 1093257302… http… 2022-11-11 14:50:48 "<p>Cr… public  FALSE   ""            0
## 15 1093257130… http… 2022-11-11 14:46:34 "<p>2/… public  FALSE   ""            4
## 16 1093257094… http… 2022-11-11 14:45:39 "<p>1/… public  FALSE   ""           25
## 17 1093257067… http… 2022-11-11 14:20:41 "<p>Fo… public  TRUE    "Decis…       0
## 18 1093256660… http… 2022-11-11 14:34:34 "<p>Tr… public  FALSE   ""            2
## 19 1093256557… http… 2022-11-11 14:31:59 "<p>He… public  FALSE   ""            1
## 20 1093256340… http… 2022-11-11 14:26:28 "<p>I … public  FALSE   ""            0
## # … with 21 more variables: favourites_count <int>, replies_count <int>,
## #   url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>,
## #   card <I<list>>, account <list>, reblog <I<list>>,
## #   media_attachments <I<list>>, mentions <I<list>>, tags <list>,
## #   emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names …
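Each call returns one page of results (20 toots above). The Mastodon API pages backwards through a timeline with a max_id parameter set to the smallest id you already have. Note that ids are stored as character: at 18 digits they exceed 2^53, the range in which doubles represent integers exactly, so min(as.numeric(id)) can silently pick the wrong toot. A sketch that finds the smallest id without numeric conversion, assuming ids are decimal strings without leading zeros; smallest_id() is a hypothetical helper, the ids in the example are made up (apart from the one used later in this post), and whether your rtoot version exposes a max_id argument is worth checking with ?get_timeline_hashtag:

```r
# Hypothetical helper (not part of rtoot): return the smallest of a set of
# Mastodon ids kept as character strings, without converting to numeric.
# Assumes plain decimal strings without leading zeros, so a shorter string
# is always the smaller number and equal-length strings compare digit by digit.
smallest_id <- function(ids) {
  ids <- ids[!is.na(ids)]
  # method = "radix" sorts strings in C-locale order, independent of locale
  ids[order(nchar(ids), ids, method = "radix")][1]
}

# Why not as.numeric()? Two different ids collapse to the same double:
as.numeric("109302436954721982") == as.numeric("109302436954721983")  # TRUE

smallest_id(c("109326057600000000", "99999999999999999", "109302436954721982"))
# Next page, assuming a max_id argument is available in your rtoot version:
# toots <- get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org")
# get_timeline_hashtag(hashtag = "rstats", instance = "fosstodon.org",
#                      max_id = smallest_id(toots$id))
```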

The function get_timeline_home() returns the most recent toots from your own home timeline (this requires a user token).

get_timeline_home()

Get accounts

rtoot exposes several account-level endpoints. Most of them require the account id rather than the username as input. To our knowledge, there is no straightforward way of obtaining the account id, but the package lets you look it up via search_accounts().

search_accounts("schochastics")
## # A tibble: 2 × 21
##   id        usern…¹ acct  displ…² locked bot   disco…³ group created_at         
##   <chr>     <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>             
## 1 10930243… schoch… scho… David … FALSE  FALSE FALSE   FALSE 2022-11-07 00:00:00
## 2 10926171… schoch… scho… David … FALSE  FALSE FALSE   FALSE 2022-10-30 00:00:00
## # … with 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <list>, emojis <I<list>>, and abbreviated
## #   variable names ¹​username, ²​display_name, ³​discoverable

(Future versions will allow using the username and the account id interchangeably.)
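search_accounts() can return several matches, as above, where two accounts share the same username. To avoid picking the wrong one, you can match the acct column exactly (per the Mastodon API documentation: the username for accounts local to the instance you searched, username@domain otherwise). A sketch with a hypothetical helper, account_id_for(), and made-up toy data:

```r
# Hypothetical helper (not part of rtoot): return the id of the single account
# whose `acct` matches exactly, failing loudly on zero or multiple matches.
account_id_for <- function(results, acct) {
  acct <- sub("^@", "", acct)  # accept "@user@host" as well as "user@host"
  hit <- results$id[which(results$acct == acct)]
  if (length(hit) != 1) {
    stop(sprintf("expected exactly one account matching '%s', found %d",
                 acct, length(hit)))
  }
  hit
}

# Made-up toy search results with the same column names as above
toy_accounts <- data.frame(
  id = c("111", "222"),
  acct = c("someone", "someone@fosstodon.org"),
  stringsAsFactors = FALSE
)
account_id_for(toy_accounts, "@someone@fosstodon.org")
# With live data: account_id_for(search_accounts("schochastics"), "schochastics@<instance>")
```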

Using the id, you can get the accounts following a user and the accounts they follow with get_account_followers() and get_account_following(), and their statuses with get_account_statuses().

id <- "109302436954721982"
get_account_followers(id)
## # A tibble: 40 × 21
##    id       usern…¹ acct  displ…² locked bot   disco…³ group created_at         
##    <chr>    <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>             
##  1 1093231… christ… chri… "Chris… FALSE  FALSE FALSE   FALSE 2022-11-11 00:00:00
##  2 1093024… psanker psan… "Patri… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  3 1093161… JLattm… JLat… "Johan… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  4 1093058… matthi… matt… "Matt … FALSE  FALSE TRUE    FALSE 2022-10-22 00:00:00
##  5 1092438… l_biber l_bi… "Loren… FALSE  FALSE FALSE   FALSE 2022-10-28 00:00:00
##  6 1092560… gianlu… gian… "Gianl… FALSE  FALSE TRUE    FALSE 2022-10-28 00:00:00
##  7 1092876… ReeCee  ReeC… ""      FALSE  FALSE FALSE   FALSE 2022-11-04 00:00:00
##  8 1093136… abitter abit… "André… FALSE  FALSE TRUE    FALSE 2022-11-07 00:00:00
##  9 1092763… Andi    Andi… "Andi … TRUE   FALSE TRUE    FALSE 2022-11-02 00:00:00
## 10 1092657… MattCr… Matt… "Matt … FALSE  FALSE TRUE    FALSE 2022-11-01 00:00:00
## # … with 30 more rows, 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <I<list>>, emojis <I<list>>, and abbreviated
## #   variable names ¹​username, ²​display_name, ³​discoverable
get_account_following(id)
## # A tibble: 40 × 21
##    id       usern…¹ acct  displ…² locked bot   disco…³ group created_at         
##    <chr>    <chr>   <chr> <chr>   <lgl>  <lgl> <lgl>   <lgl> <dttm>             
##  1 1092657… MattCr… Matt… Matt C… FALSE  FALSE TRUE    FALSE 2022-11-01 00:00:00
##  2 1092630… ramikr… rami… Rami K… FALSE  FALSE FALSE   FALSE 2022-10-31 00:00:00
##  3 1093241… Luk_O   Luk_… Lukas … FALSE  FALSE FALSE   FALSE 2022-11-10 00:00:00
##  4 1093238… cosima… cosi… Cosima… FALSE  FALSE FALSE   FALSE 2022-11-11 00:00:00
##  5 1092094… alexpg… alex… alex h… FALSE  FALSE TRUE    FALSE 2022-10-21 00:00:00
##  6 1093183… Johann… Joha… Johann… FALSE  FALSE TRUE    FALSE 2022-11-09 00:00:00
##  7 1092535… ropens… rope… rOpenS… FALSE  FALSE TRUE    FALSE 2022-10-29 00:00:00
##  8 1093134… crimep… crim… Emma B… FALSE  FALSE TRUE    FALSE 2022-11-05 00:00:00
##  9 1093134… gaborc… gabo… Gabor … FALSE  FALSE TRUE    FALSE 2022-11-09 00:00:00
## 10 1093111… sachae… sach… Sacha … FALSE  FALSE TRUE    FALSE 2022-11-08 00:00:00
## # … with 30 more rows, 12 more variables: note <chr>, url <chr>, avatar <chr>,
## #   avatar_static <chr>, header <chr>, header_static <chr>,
## #   followers_count <int>, following_count <int>, statuses_count <int>,
## #   last_status_at <dttm>, fields <I<list>>, emojis <I<list>>, and abbreviated
## #   variable names ¹​username, ²​display_name, ³​discoverable
get_account_statuses(id)
## # A tibble: 8 × 29
##   id           uri   created_at          content visib…¹ sensi…² spoil…³ reblo…⁴
##   <chr>        <chr> <dttm>              <chr>   <chr>   <lgl>   <chr>     <int>
## 1 10932547240… http… 2022-11-11 13:45:22 "<p><s… public  FALSE   ""            1
## 2 10932521809… http… 2022-11-11 12:40:42 "<p><s… public  FALSE   ""            0
## 3 10932424625… http… 2022-11-11 08:33:33 "<p><s… public  FALSE   ""            0
## 4 10931062119… http… 2022-11-08 22:48:31 "<p><s… public  FALSE   ""            0
## 5 10930365326… http… 2022-11-07 17:16:28 "<p><s… public  FALSE   ""            0
## 6 10930261553… http… 2022-11-07 12:52:34 "<p>He… public  FALSE   ""            0
## 7 10930256528… http… 2022-11-07 12:39:47 "<p><s… public  FALSE   ""            0
## 8 10930253167… http… 2022-11-07 12:31:15 "<p>Hi… public  FALSE   ""           14
## # … with 21 more variables: favourites_count <int>, replies_count <int>,
## #   url <chr>, in_reply_to_id <chr>, in_reply_to_account_id <chr>,
## #   language <chr>, text <lgl>, application <I<list>>, poll <I<list>>,
## #   card <I<list>>, account <list>, reblog <I<list>>,
## #   media_attachments <I<list>>, mentions <I<list>>, tags <I<list>>,
## #   emojis <I<list>>, favourited <lgl>, reblogged <lgl>, muted <lgl>,
## #   bookmarked <lgl>, pinned <lgl>, and abbreviated variable names …
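Since followers and followed accounts come back as tibbles with an id column, simple network questions reduce to set operations. A sketch finding mutual follows, assuming two data frames shaped like the output above; mutual_follows() is a hypothetical helper and the toy ids are made up. Note that both calls above returned exactly 40 rows, which looks like a single page of results (40 is the Mastodon API's default page size for these endpoints), so for larger accounts you would need to fetch further pages before drawing conclusions:

```r
# Hypothetical helper (not part of rtoot): accounts that both follow the user
# and are followed by them, i.e. the intersection of the two id columns.
mutual_follows <- function(followers, following) {
  ids <- intersect(followers$id, following$id)
  following[following$id %in% ids, , drop = FALSE]
}

# Made-up toy data with the same id/acct columns as the output above
toy_followers <- data.frame(id = c("1", "2", "3"), acct = c("a", "b", "c"),
                            stringsAsFactors = FALSE)
toy_following <- data.frame(id = c("2", "3", "4"), acct = c("b", "c", "d"),
                            stringsAsFactors = FALSE)
mutual_follows(toy_followers, toy_following)
# With live data: mutual_follows(get_account_followers(id), get_account_following(id))
```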

Posting statuses

You can post toots with:

post_toot(status = "my first rtoot #rstats")

A toot can also include media, together with alt text describing it:

post_toot(status = "my first rtoot #rstats", media = "path/to/media",
          alt_text = "description of media")

You can mark the toot as sensitive by setting sensitive = TRUE and add spoiler text (a content warning) with spoiler_text.
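Before posting, it can help to check a draft against the instance's limits. The get_instance_general() output earlier in this post reports, for mastodon.social, max_characters = 500 and characters_reserved_per_url = 23, i.e. every link counts as 23 characters however long it is. A rough sketch, assuming links start with http:// or https:// and ignoring any other counting rules the server may apply (for instance to mentions); toot_length() and fits_toot() are hypothetical helpers:

```r
# Hypothetical helpers (not part of rtoot): approximate Mastodon's character
# count, where each URL counts as a fixed number of characters.
toot_length <- function(status, chars_per_url = 23) {
  placeholder <- strrep("x", chars_per_url)
  nchar(gsub("https?://\\S+", placeholder, status, perl = TRUE))
}

fits_toot <- function(status, max_characters = 500, chars_per_url = 23) {
  toot_length(status, chars_per_url) <= max_characters
}

draft <- "my first rtoot #rstats https://cran.r-project.org/package=rtoot"
toot_length(draft)
fits_toot(draft)
# If it fits, post it, e.g. with a content warning as described above:
# post_toot(status = draft, sensitive = TRUE, spoiler_text = "...")
```

Pass the values from get_instance_general() for your own instance, since other servers may configure different limits.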

Thanks!

This package wouldn’t have been possible without my coauthor @chainsawriot, who contributed a huge chunk of code, especially all the unit tests! Thanks also to @JBGruber, who contributed to the authentication routines, and to @urswilke for some fixes.

To leave a comment for the author, please follow the link and comment on their blog: schochastics.
