Growing old in Twitter

[This article was first published on R on Implicit None, and kindly contributed to R-bloggers.]

I started using Twitter more than 10 years ago (!). I opened an account on this social network in 2008 and, although I did not use it much during the first year, I became a frequent user after that. It has helped me get news and information for both my personal and professional interests. But not only that: Twitter has also been the data source for our research, helping us investigate the relationship between human behavior on the platform and major problems in our society such as information propagation, unemployment, disaster damage, and political opinion. As we keep working on those subjects, we have also recently extended our research to other problems like health or climate change.

Most of the research in human behavior is constrained either by time or by population covered, so we can’t have both. Large longitudinal databases, extending tens of years, are relatively small in the number of participants or users, while data from millions of users is usually obtained for a very short period of time (months or years). One of the good things about growing old in those social networks is that we are starting to see tens of years of data to analyze.

Here I want to analyze my Twitter activity during those last 10 years. The first thing is to download all the account activity, something that is explained in the How to download and view your Twitter archive help page at Twitter. Basically:

  • Connect to your Account Settings at Twitter.
  • On the left sidebar you will see a link to Your Twitter data.
  • At this step you will probably have to confirm your password, but on the bottom of the next page you have a link to Request data of your Twitter account.
  • When the data is ready to download you will receive a notification at your email with a link to download it.
  • The data comes as a series of JSON files. The file tweets.js contains all tweets, retweets and mentions, but it comes with a window.YTD.tweet.part0 = header at the beginning. Remove it to make it a readable JSON file.
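The header removal in the last step can also be done from R instead of by hand. The following is a minimal sketch in base R, assuming the file starts with the window.YTD.tweet.part0 = prefix described above. The helper name strip_ytd_header is hypothetical, and the sample lines are a made-up stand-in for a real archive.

```r
# Hypothetical helper: drop the JavaScript assignment that precedes the JSON array.
strip_ytd_header <- function(lines) {
  # Only the first line carries the "window.YTD.tweet.part0 = " prefix.
  lines[1] <- sub("^window\\.YTD\\.tweet\\.part0[[:space:]]*=[[:space:]]*", "", lines[1])
  paste(lines, collapse = "\n")
}

# Toy stand-in for the contents of tweets.js (not real data).
sample_lines <- c('window.YTD.tweet.part0 = [ {', '  "id" : "1"', '} ]')
json_text <- strip_ytd_header(sample_lines)

# With a real archive one would read the file instead:
# json_text <- strip_ytd_header(readLines("tweets.js", warn = FALSE))
```

The resulting string starts directly with the JSON array, so it can be written back to disk as a clean file or handed to a JSON parser.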

Let’s load the tweets

tweets <- jsonlite::fromJSON("tweets.js")

The table contains many fields, including the tweet id (id), the timestamp when it was created (created_at), and whether it is a reply to a status (in_reply_to_status_id) or to a user (in_reply_to_user_id).

More or less active?

The first thing we can investigate is whether my behavior on Twitter has changed in these 10 years. My feeling is that people spend less time on the platform as they get older. One reason is that, compared to 2009, it is really difficult to keep track of what is happening on the platform. I also have less and less time. But it is true that Twitter has changed its app to engage users more with the conversation, so that might counterbalance it.

To analyze it, let’s add the formatted timestamp to the dataset

tweets$timestamp <- as.POSIXct(tweets$created_at,format="%a %b %d %H:%M:%S %z %Y")
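To see what this conversion does, here is a small sketch on a single made-up timestamp written in the same layout as created_at. Setting the LC_TIME locale to "C" and passing tz = "UTC" are assumptions added here, because the %a and %b abbreviations are parsed according to the session locale and may fail on a non-English system.

```r
# Day and month abbreviations (%a, %b) are parsed using the current locale,
# so switch to the C locale in case the session is not in English.
invisible(Sys.setlocale("LC_TIME", "C"))

# Made-up timestamp in the same layout as the created_at field.
created_at <- "Wed Oct 10 20:19:24 +0000 2018"
ts <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")

# Year-month label, usable for grouping tweets by month.
month_label <- format(ts, "%Y-%m")
print(month_label)
```

If the conversion returns NA for every row of the real data, the locale is the first thing worth checking.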

and plot the number of tweets by month.

library(ggplot2)
library(zoo)
# geom_bar() counts the tweets in each month; it does not take a binwidth argument
ggplot(tweets, aes(x = as.yearmon(timestamp))) + geom_bar() + scale_x_yearmon() + theme_bw() + ylab("Number of tweets per month") + xlab("Time")

As we can see, the most active years were from 2011 to 2015 (around 100 tweets per month). Since then I have been tweeting less, which supports my feeling that I spend less time on the platform (at least tweeting 🙂 )

Tweeting or retweeting more?

Have I changed the way I use Twitter? Our research on Twitter sessions and on social networks using mobile phone data shows that, because of our limited attention and cognitive capacities, people tend to perform simpler tasks with time and age. For example, we found that in long Twitter sessions (two or more hours) users start composing fewer messages, which require more effort, and use more retweets or mentions (replies) within the session, which require less effort.

Let’s see what happened in ten years of data. We classify tweets as composed, mention or retweets using the fields in the dataset.

tweets$class <- "normal"
tweets$class[!is.na(tweets$in_reply_to_status_id)] <- "mention"
tweets$class[!is.na(tweets$retweeted_status_id)] <- "RT"
tweets$class[grep("RT @",tweets$full_text)] <- "RT"
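The same rules can be tried on a toy data frame to check how they interact. This is only a sketch: the three rows are made up, and the retweeted_status_id rule is left out because the toy data has no such column.

```r
# Toy data frame with the columns used above (values are made up).
toy <- data.frame(
  full_text = c("Hello world", "@someone thanks!", "RT @someone: nice paper"),
  in_reply_to_status_id = c(NA, "123", NA),
  stringsAsFactors = FALSE
)

toy$class <- "normal"
toy$class[!is.na(toy$in_reply_to_status_id)] <- "mention"
# Retweets are assigned last, so they override the other labels.
toy$class[grepl("RT @", toy$full_text, fixed = TRUE)] <- "RT"

print(table(toy$class))
```

Because the assignments run in order, a tweet that is both a reply and contains "RT @" ends up labeled as a retweet.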

and show the fraction of tweets per month in each class

library(ggplot2)
library(zoo)
# position = "fill" rescales each monthly bar to 1, so heights show fractions
ggplot(tweets, aes(x = as.yearmon(timestamp), fill = class)) + geom_bar(position = "fill") + scale_x_yearmon() + theme_bw() + ylab("Fraction of tweets of each class") + xlab("Time")

Similar to our research on Twitter sessions, I can see that I compose fewer original tweets over time: while in 2010 almost 50% of my tweets were composed, now only 20% are original and more than 50% of the tweets in my account are retweets.
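Fractions like these can also be read from a table rather than from the plot. A minimal sketch in base R on made-up data follows; with the real dataset the month column could be built as format(tweets$timestamp, "%Y-%m"), which is an assumption about how one would group the tweets and not something taken from the original analysis.

```r
# Made-up months and classes standing in for the real columns.
toy <- data.frame(
  month = c("2010-01", "2010-01", "2019-01", "2019-01", "2019-01", "2019-01"),
  class = c("normal", "RT", "RT", "RT", "mention", "normal"),
  stringsAsFactors = FALSE
)

counts <- table(toy$month, toy$class)        # tweets per month and class
fractions <- prop.table(counts, margin = 1)  # each row (month) sums to 1
print(round(fractions, 2))
```

Each row of the result corresponds to one bar of a filled bar chart.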

Tweeting about what?

Finally, let’s see what I tweeted about. Although we could probably do a much more elaborate analysis, a simple wordcloud will do here. We clean up the text of the tweets (including mentions and retweets) and produce a wordcloud

library(tm)
library(wordcloud)
texts <- tweets$full_text
# clean up: remove the RT marker, mentions, URLs and punctuation
texts <- tolower(texts)
texts <- gsub("\\brt\\b", "", texts, perl = TRUE) # only the standalone word "rt"
texts <- gsub("@\\S+", "", texts)
texts <- gsub("http\\S+\\s*", "", texts)
texts <- gsub("[[:punct:]]", "", texts)
texts <- removeWords(texts, stopwords("english")) # get rid of stopwords in English
texts <- removeWords(texts, stopwords("spanish")) # get rid of stopwords in Spanish
corpus.texts.all <- Corpus(VectorSource(texts))
wordcloud(corpus.texts.all,
          max.words = 100, random.order = FALSE, colors = brewer.pal(8, "Dark2"), scale = c(3, .5))

As you can see, the word I used most is thanks (in Spanish), together with others related to my field of research (networks, data, social, etc.).
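A wordcloud is a picture of a word frequency table, and the table itself can be inspected directly. The sketch below uses base R only, on two made-up texts. Stopword removal is left out to avoid depending on tm, so common words such as "and" remain in the counts.

```r
# Made-up texts standing in for tweets$full_text.
texts <- c("RT @someone: Gracias! Great data http://example.com",
           "Gracias @other, networks and data")

texts <- tolower(texts)
texts <- gsub("@\\S+", "", texts)            # drop mentions
texts <- gsub("http\\S+\\s*", "", texts)     # drop URLs
texts <- gsub("[[:punct:]]", "", texts)      # drop punctuation
words <- unlist(strsplit(texts, "\\s+"))     # split on whitespace
words <- words[words != "" & words != "rt"]  # drop empty strings and the RT marker

freq <- sort(table(words), decreasing = TRUE)
print(head(freq, 3))
```

Sorting the table in decreasing order puts the words that would be drawn largest in the cloud at the top.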

Conclusion

So, yes, I am growing older on Twitter and the way I use the platform is different. I probably spend more time reading than engaging in new conversations, writing new tweets or creating hashtags for events or conferences. This exercise suggests that Twitter is not only a good platform to understand timely events like elections, sports, natural disasters or unemployment, but also to understand how people change over the course of a lifetime (or at least tens of years), on and outside the platform.

And as my wordcloud shows, I am thankful for those 10 years of sharing the platform with friends, colleagues and other familiar strangers!
