Please consider this a “supplementary analysis” to my previous post looking at the frequency of tweets from my personal account over the last 12 years.
I was curious about what times I was active on Twitter (measured by when I tweeted). Others might be interested in how to look at this in R.
As in the previous post, we need to get the data into R and then make sure we have a date object to work with. The data comes from the tweets.js file that is part of the Twitter data download, when you request it. The code below assumes it is in a directory called Data in the working directory.
```r
library(jsonlite)
library(lubridate)
library(ggplot2)
library(dplyr)
library(timetk)

json_file <- "Data/tweets.js"
json_data <- fromJSON(txt = json_file, flatten = TRUE)
# make date/time column
json_data$tweet_created_at <- as.POSIXct(json_data$tweet.created_at,
                                         format = "%a %b %d %H:%M:%S %z %Y")

df_hour <- json_data %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "hour",
                    hh = hour(tweet_created_at),
                    dd = weekdays(tweet_created_at),
                    yy = year(tweet_created_at)) %>%
  group_by(yy, dd, hh) %>%
  summarize(nn = n())

p4 <- ggplot(df_hour, aes(x = hh, y = nn)) +
  geom_col() +
  theme_bw() +
  lims(x = c(0, 24)) +
  labs(x = "Hour", y = "Tweets") +
  facet_grid(yy ~ factor(dd, levels = c("Monday", "Tuesday", "Wednesday",
                                        "Thursday", "Friday",
                                        "Saturday", "Sunday")))
ggsave("Output/Plots/tweet_timeOfDay.png", p4)
```
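One gotcha worth flagging: the `%a` and `%b` parts of the format string match English day and month names, so parsing can silently fail (returning NA) if your system time locale is not English. A minimal sketch, using a made-up timestamp in the same format as the Twitter archive, that pins the locale before parsing:

```r
# Twitter archive timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
# "%a %b" matching depends on the LC_TIME locale, so set it explicitly.
x <- "Wed Oct 10 20:19:24 +0000 2018"

old_locale <- Sys.setlocale("LC_TIME", "C")  # "C" uses English names
parsed <- as.POSIXct(x, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
Sys.setlocale("LC_TIME", old_locale)         # restore the user's locale

parsed
```

If parsing returns NA on your machine, the locale is the first thing to check.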
Like last time, the magic comes from the summarize_by_time() function in timetk. This function allows us to collapse the tweets by hour. We then group and summarise them in order to generate the plots, and use facet_grid to lay them out. Note that the days (dd) will appear in alphabetical order unless you tell ggplot how to order the levels of the day factor.
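A minimal sketch of the difference, using a toy vector of day names rather than the tweet data: character vectors sort alphabetically, whereas a factor with explicit levels sorts in the order you give.

```r
# Character days sort alphabetically, which scrambles the week.
dd <- c("Wednesday", "Monday", "Friday")
sort(dd)  # "Friday" "Monday" "Wednesday"

# A factor with explicit levels sorts in calendar order instead;
# ggplot2 uses the same level order for facets and axes.
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
f <- factor(dd, levels = day_levels)
sort(f)   # Monday Wednesday Friday
```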
As in the previous post, it’s clear how my tweeting declined from a high in 2015 and 2016. You can also see that I didn’t tweet as much at the weekend as during the week. But what about the times?
I tweeted pretty much from 6 am through to 10 pm each day. Basically, those are my waking hours. Hmmm, not very healthy.
I had expected to see some trends, e.g. tweeting more first thing in the morning, or maybe more around lunchtime. But the distributions are pretty flat, or at least show no consistent pattern.
We can drill down a little more and look at tweeting times per day of the week for 2015, the year I was tweeting the most. In that year there is a slight trend on Tuesdays to Fridays: tweets ramp up until about 10 am and then tail off. Let's break this year down further by month.
```r
df_2015_hour <- json_data %>%
  summarize_by_time(.date_var = tweet_created_at, .by = "hour",
                    hh = hour(tweet_created_at),
                    mm = month(tweet_created_at, label = TRUE),
                    dd = weekdays(tweet_created_at),
                    yy = year(tweet_created_at)) %>%
  filter(yy == 2015) %>%
  group_by(mm, dd, hh) %>%
  summarize(nn = n())

p5 <- ggplot(df_2015_hour, aes(x = hh, y = nn, group = mm)) +
  geom_col() +
  theme_bw() +
  lims(x = c(0, 24)) +
  labs(x = "Hour", y = "Tweets") +
  facet_grid(mm ~ factor(dd, levels = c("Monday", "Tuesday", "Wednesday",
                                        "Thursday", "Friday",
                                        "Saturday", "Sunday")))
ggsave("Output/Plots/tweet_2015timeOfDay.png", p5)
```
While it’s possible to drill down this far, the data gets noisy: each month-by-weekday cell collates only 4 or 5 days' worth of tweets into its distribution.
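To see why those facets are noisy, we can count how often each weekday falls in a single month. A quick check, using May 2015 as an arbitrary example:

```r
# Each weekday occurs only 4 or 5 times in any given month, so each
# month-by-weekday facet aggregates at most a handful of days.
days <- seq(as.Date("2015-05-01"), as.Date("2015-05-31"), by = "day")
table(weekdays(days))
```

With so few days per cell, a single unusually busy day can dominate a facet's distribution.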
This post was more to show how to interrogate a dataset. The wonderful thing about R and the libraries used here, is how easy it is to quickly spin up some plots to explore a dataset. We didn’t get much insight beyond the fact that I used Twitter far too much!
The post title comes from “Any Time At All” by The Beatles from their “A Hard Day’s Night” album.