Twitcher: tweet frequency over the years

[This article was first published on Rstats – quantixed, and kindly contributed to R-bloggers.]

At the time of writing, I have essentially left Twitter. It was a fun ride and without going into what’s happening there now, this is a good opportunity to look at my 12 years on the platform.

Early in November, I downloaded my data and locked my Twitter account. This gave me all the data I needed. Using R, a few nifty libraries and the tweets.js file that was part of the download, I could gain quite a lot of insight. You can use this code to analyse your own data.

The code

First we need to make a data frame of all the tweets, and then do a bit of cleaning to look at dates properly.

library(jsonlite)
library(ggplot2)
library(dplyr)
library(timetk)
library(viridis)

# in my wd I have a directory called Data. A copy of tweets.js is placed in there
json_file <- "Data/tweets.js"
# load data frame of tweets
json_data <- fromJSON(txt = json_file, flatten = TRUE)

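# tweet.created_at column contains human readable date/time like
# Thu Nov 03 06:12:14 +0000 2022
# so let's make a date/time column
# (%a and %b expect English day/month names, so this assumes an English LC_TIME locale;
# tz = "UTC" keeps month boundaries independent of the local time zone)
json_data$tweet_created_at <- as.POSIXct(json_data$tweet.created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")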
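
Depending on the archive, tweets.js may not be pure JSON: it can start with a JavaScript assignment (something like window.YTD.tweets.part0 = ) ahead of the array, in which case fromJSON() stops with a parse error. Below is a minimal sketch of one way around that, assuming this layout. The helper name read_tweets_js is hypothetical, and the toy file only stands in for a real download.

```r
library(jsonlite)

# hypothetical helper: drop everything before the first "[" and parse the rest
read_tweets_js <- function(path) {
  txt <- paste(readLines(path, warn = FALSE, encoding = "UTF-8"), collapse = "\n")
  start <- regexpr("[", txt, fixed = TRUE)  # position of the first "["
  fromJSON(substring(txt, start), flatten = TRUE)
}

# toy file standing in for Data/tweets.js (assumed layout, made-up content)
tmp <- tempfile(fileext = ".js")
writeLines('window.YTD.tweets.part0 = [ {"tweet": {"full_text": "hello", "created_at": "Thu Nov 03 06:12:14 +0000 2022"}} ]', tmp)
toy <- read_tweets_js(tmp)
names(toy)
```

If the file is already plain JSON, the helper leaves it untouched, because the text then starts at the first "[" anyway.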

At this point we have a data frame of tweets, with the date and time of posting in the correct format. Now, not all tweets are equal. They can be:

  • original tweets
  • retweets of other people’s content, or
  • replies

These three types of tweet represent very different levels of engagement. I am ignoring quote tweets, because I didn’t use this feature much and I view it as similar to an original tweet.

So let’s classify them:

# make factor for Tweet, Reply, RT
# retweets start with "RT "; anything else with a reply-to user id is a Reply
json_data$tweet_type <- as.factor(ifelse(
  grepl("^RT ", json_data$tweet.full_text), "RT", ifelse(
    is.na(json_data$tweet.in_reply_to_user_id), "Tweet", "Reply")))
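
To check the rule on something small, here is a sketch that applies the same logic to made-up rows, written with dplyr's case_when() instead of nested ifelse(). The toy data are invented; only the two column names come from the code above.

```r
library(dplyr)

# toy data with the two columns the rule relies on (content is made up)
toy <- data.frame(
  tweet.full_text = c("RT @someone: great paper", "@friend thanks!",
                      "New preprint out", "RTFM is good advice"),
  tweet.in_reply_to_user_id = c(NA, "12345", NA, NA)
)

toy <- toy %>%
  mutate(tweet_type = factor(case_when(
    grepl("^RT ", tweet.full_text) ~ "RT",      # retweets start with "RT "
    is.na(tweet.in_reply_to_user_id) ~ "Tweet", # no reply target: original tweet
    TRUE ~ "Reply"                              # everything else is a reply
  )))

table(toy$tweet_type)
```

The last row shows why the pattern includes the trailing space: text that merely begins with the letters RT is not counted as a retweet.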

At this point I explored the data a bit, looking at my most popular day for tweeting and so on. I am skipping this in favour of two more interesting plots.
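
For anyone who does want that detour, a minimal sketch of a most-popular-day count in base R follows. It is not the code used for the exploration mentioned above, and the timestamps are made up to stand in for json_data$tweet_created_at.

```r
# toy timestamps standing in for json_data$tweet_created_at
stamps <- as.POSIXct(
  c("2022-10-31 09:00:00", "2022-11-01 12:30:00",
    "2022-11-03 06:12:14", "2022-11-03 18:45:00"),
  tz = "UTC"
)

# %u gives the weekday as a number, 1 (Monday) to 7 (Sunday), whatever the locale
dow <- factor(format(stamps, "%u"), levels = as.character(1:7),
              labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))

day_counts <- table(dow)
names(which.max(day_counts))  # busiest day in the toy data
```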

Let’s look at my tweets per month and how they break down into the three categories. Looking at the data per day or per week was far too noisy to spot trends. Note that the timetk package has a very useful set of functions like summarize_by_time() that work well for this.

df_month <- json_data %>% 
  group_by(tweet_type) %>%
  summarize_by_time(.date_var = tweet_created_at,
                    .by = "month",
                    n = n())

p2 <- ggplot(df_month, aes(x = tweet_created_at, y = n, group = tweet_type, fill = tweet_type)) +
  geom_col() +
  scale_fill_viridis(discrete = TRUE) +
  theme_bw() +
  labs(x = "Date", y = "Tweets")

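# create the output directory if it does not exist yet, then save
dir.create("Output/Plots", recursive = TRUE, showWarnings = FALSE)
ggsave("Output/Plots/tweets_by_time_months.png", p2)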
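
To see the shape of the monthly summary without a full archive, here is a small sketch that reproduces the binning with dplyr alone on made-up data. It is an illustration rather than a replacement for summarize_by_time(), and it assumes each timestamp is floored to the first day of its month.

```r
library(dplyr)

# made-up tweets: three in October 2022, one in November 2022
toy <- data.frame(
  tweet_created_at = as.POSIXct(c("2022-10-03 10:00:00", "2022-10-20 11:00:00",
                                  "2022-10-21 08:00:00", "2022-11-02 09:00:00"),
                                tz = "UTC"),
  tweet_type = factor(c("Tweet", "Tweet", "RT", "Reply"))
)

toy_month <- toy %>%
  # floor each timestamp to the first day of its month
  mutate(month = as.Date(format(tweet_created_at, "%Y-%m-01"))) %>%
  # one row per type and month, with the number of tweets in n
  count(tweet_type, month, name = "n")

toy_month
```

Note that a type with no tweets in a given month simply has no row, which matters for the stacked plots.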

I was surprised at how my tweeting had declined from a peak in 2015/2016. Before running this analysis, I felt that 2016 was a turning point for me in Twitter usage. Political events that year (EU Referendum, Trump election) seemed to turn Twitter from an enjoyable experience of talking about science and stuff into doomscrolling. Still, in mid-2022 I was managing over 50 tweets per month in total, which is almost 2 tweets per day. So I was far from inactive.

I think tweeting (of any type) is a good way of measuring engagement with the platform. I’m pretty sure I was spending just as much time on Twitter in later years as before; if anything, I was spending far more time on there. I just hadn’t realised how much more passive and less engaged I had become.

Given that the total number of tweets changes over time, let’s have a look at how the three categories break down as a fraction of total tweets.

p3 <- df_month %>%
  group_by(tweet_created_at) %>%
  mutate(pc = n / sum(n)) %>%
  ggplot(aes(x = tweet_created_at, y = pc, group = tweet_type, fill = tweet_type)) +
  geom_area(alpha = 0.5 , size = 0.5, colour = "white") +
  scale_fill_viridis(discrete = TRUE) +
  lims(y = c(0,1)) +
  theme_bw() +
  labs(x = "Date", y = "Fraction of tweets")

ggsave("Output/Plots/tweet_proportion_by_time_months.png", p3)
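
One thing to watch with stacked areas: a month in which a category has no tweets produces no row at all, so geom_area() has to bridge the gap itself, and how it does so can depend on the ggplot2 version. A hedged sketch of making those zeros explicit before computing the fractions is below. It uses tidyr::complete(), which is not part of the original code, and made-up monthly counts.

```r
library(dplyr)
library(tidyr)

# made-up monthly counts; Reply is missing in October, RT and Tweet in November
toy_month <- data.frame(
  tweet_created_at = as.Date(c("2022-10-01", "2022-10-01", "2022-11-01")),
  tweet_type = factor(c("RT", "Tweet", "Reply"), levels = c("Reply", "RT", "Tweet")),
  n = c(1, 2, 1)
)

toy_filled <- toy_month %>%
  ungroup() %>%  # complete() works within groups, so drop any grouping first
  complete(tweet_created_at, tweet_type, fill = list(n = 0)) %>%
  group_by(tweet_created_at) %>%
  mutate(pc = n / sum(n)) %>%  # fraction of that month's tweets
  ungroup()

toy_filled
```

This only fills in months that appear at least once in the data; months with no tweets of any type would still be absent.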

In my first few years on Twitter, over half of my tweets were real content, and this dwindled over time to around 15-20% of my Twitter activity. My RTing really took off around 2015, peaked in mid-2018 and then declined. The remainder, a substantial fraction, were replies. I swear I am not a “reply guy”, and I would need to drill down a bit further to see how these replies break down. Many could be me threading tweets or replying to people who have replied to me, and not necessarily me replying to an original tweet by someone else.
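
A first step in that drill-down could be to separate self-replies (threads) from replies to other accounts. The sketch below assumes the archive exposes a tweet.in_reply_to_screen_name column after flattening, which is not used anywhere in the code above, and my_handle is a hypothetical placeholder for your own screen name. The data are made up.

```r
library(dplyr)

my_handle <- "my_handle"  # hypothetical: replace with your own screen name

# made-up rows; tweet.in_reply_to_screen_name is an assumed column name
toy <- data.frame(
  tweet_type = c("Reply", "Reply", "Reply", "Tweet"),
  tweet.in_reply_to_screen_name = c("my_handle", "friend", "my_handle", NA)
)

reply_breakdown <- toy %>%
  filter(tweet_type == "Reply") %>%
  mutate(reply_to = ifelse(tweet.in_reply_to_screen_name == my_handle,
                           "Self (thread)", "Someone else")) %>%
  count(reply_to)

reply_breakdown
```

Telling apart replies to people who replied first from replies to someone else's original tweet would probably need information about the tweets being replied to, which this sketch does not attempt.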

It would be interesting to look at what could have driven these changes. I suspect it is mainly a change in my tweeting and engagement, but I guess it is possible that changes introduced by Twitter (e.g. the move to 280 characters, UI changes) have influenced my engagement.

The post title comes from “Twitcher” by Scorn, the first track off the 1997 Zander album.
