Estimated Follower Accession Charts for Twitter

April 5, 2013
By

(This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers)

Just over a year or so ago, Mat Morrison/@mediaczar introduced me to a visualisation he’d been working on (How should Page Admins deal with Flame Wars?) that I started to refer to as an accession chart (Visualising Activity Around a Twitter Hashtag or Search Term Using R). The idea is that we provide each entrant into a conversation or group with an accession number: the first person has accession number 1, the second person accession number 2 and so on. The accession number is plotted in rank order on the vertical y-axis, with ranked/time ordered “events” along the horizontal x-axis: utterances in a conversation for example, or posts to a forum.

A couple of months ago, I wondered whether this approach might also be used to estimate when folk started following an individual on Twitter. My reasoning went something like this:

One of the things I think is true of the Twitter API call for the followers of an account is that it returns lists of followers in reverse accession order. So the person who followed an account most recently will be at the top of the list (the first to be returned) and the person who followed first will be at the end of the list. Unfortunately, we don’t know when followers joined, so it’s hard to spot bursty growth in the number of followers of an account. However, it struck me that we may be able to get a bound on this by looking at the dates at which followers joined Twitter, along with their ‘accession order’ as followers of an account. If we get the list of followers and reverse it, and assume that this gives an ordered list of followers (with the follower that started following the longest time ago first), we can then work through this list and keep track of the oldest ‘created_at’ date seen so far. This gives us an upper bound (most recent date) for when followers that far through the list started following. (You can’t start following until you join twitter…)

So for example, if followers A, B, C, D in that accession order (ie started following target in that order) have user account creation dates 31/12/09, 1/1/09, 15/6/12, 5/5/10 then:
- A started following no earlier than 31/12/09 (because that’s when they joined Twitter and it’s the most recent creation date we’ve seen so far)
- B started following no earlier than 31/12/09 (because they started following after B)
- C started following no earlier than 15/6/12 (because that’s when they joined Twitter and it’s the most recent creation date we’ve seen so far)
- D started following no earlier than 15/6/12 (because they started following after C, which gave use the most recent creation date seen so far)

That’s probably confused you enough, so here’s a chart – accession number is along the bottom (i.e. the x-axis), joining date (in days ago) is on the y-axis:

recencyVacc

NOTE: this diverges from the accession graph described above, where accession number goes on the y-axis and rank ordered event along the x-axis.

What the chart shows is an estimate (the red line) of how many days ago a follower with a particular accession number started to follow a particular Twitter account.

As described in Sketches Around Twitter Followers, we see a clear break at 1500 days ago when Twitter started to get popular. This approach also suggests a technique for creating “follower probes” that we can use to date a follower record: if you know which day a particular user followed a target account, you can use that follower to put a datestamp into the follower record (assuming the Twitter API returned followers in reverse accession order).

Here’s an example of the code I used based on Twitter follower data grabbed for @ChrisPincher (whose follower profile appeared to be out of sorts from the analysis sketched in Visualising Activity Around a Twitter Hashtag or Search Term Using R). I’ve corrected the x/y axis ordering so follower accession number is now the vertical, y-component.

require(ggplot2)

processUserData = function(data) {
    data$tz = as.POSIXct(data$created_at)
    data$days = as.integer(difftime(Sys.time(), data$tz, units = "days"))
    data = data[rev(rownames(data)), ]
    data$acc = 1:length(data$days)
    data$recency = cummin(data$days)

    data
}

mp_cp <- read.csv("~/code/MPs/ChrisPincher_fo_0__2013-02-16-01-29-28.csv", row.names = NULL)

ggplot(processUserData(mp_cp)) +  geom_point(aes(x = -days, y = acc), size = 0.4) + geom_point(aes(x = -recency, y = acc), col = "red", size = 1)+xlim(-2000,0)

Here’s @ChrisPincher’s chart:

cp_demo

The black dots reveal how many days ago a particular follower joined Twitter. The red line is the estimate of when a particular follower started following the account, estimated based on the most recently created account seen to date amongst the previously acceded followers.

We see steady growth in follower numbers to start with, and then the account appears to have been spam followed? (Can you spot when?!;-) The clumping of creation dates of accounts during the attack also suggests they were created programmatically.

[In the "next" in this series of posts [What Happened Then? Using Approximated Twitter Follower Accession to Identify Political Events], I’ll show how spikes in follower acquisition on a particular day can often be used to “detect” historical news events.]

PS after coming up with this recipe, I did a little bit of “scholarly research” and I learned that a similar approach for estimating Twitter follower acquisition times had already been described at least once, at the opening of this paper: We Know Who You Followed Last Summer: Inferring Social Link Creation Times In Twitter – “We estimate the edge creation time for any follower of a celebrity by positing that it is equal to the greatest lower bound that can be deduced from the edge orderings and follower creation times for that celebrity”.


To leave a comment for the author, please follow the link and comment on his blog: OUseful.Info, the blog... » Rstats.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.