# More Thoughts on Potential Audience Metrics for Hashtag Communities

February 10, 2012

(This article was first published on OUseful.Info, the blog... » Rstats, and kindly contributed to R-bloggers)

Following on from the sketched ideas relating to estimating the Potential Audience Size for a Hashtag Community?, here are a few quick doodles around the graph of tag users and their followers. They explore the extent to which we can use quite simple counts and analyses to get a feel for how the followers of a set of hashtag users are distributed, and how many times those followers are likely to see a hashtagged tweet (I’m mulling over calling this potential view count “receipts”…).

require(igraph)

#Read in the graph: the graph contains nodes representing Twitter users, connected by directed, weighted edges that represent 'is followed by' relations. The weights correspond to the number of hashtagged messages published by the from-node over the sample period.

summary(g2)
#The summary provides an overview of the graph. The number of nodes corresponds to the number of folk in the union of the set of hashtaggers and their followers, for example.
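The node-count remark can be checked by hand on a small, hypothetical edge list in base R (the user names and weights are made up, not taken from the original data). Each row is an 'is followed by' edge from a hashtagger to one of their followers, weighted by that hashtagger's tagged-tweet count; the node set is the union of the two columns, so a tagger who also follows another tagger is counted once.

```r
# Hypothetical toy edge list: 'from' is a hashtagger, 'to' is one of their
# followers, 'weight' is the number of tagged tweets 'from' published.
edges <- data.frame(
  from   = c("alice", "alice", "alice", "bob",  "bob",   "carol"),
  to     = c("dave",  "erin",  "bob",   "dave", "frank", "dave"),
  weight = c(3, 3, 3, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Hashtaggers are the edge sources; followers are the edge targets.
taggers   <- unique(edges$from)
followers <- unique(edges$to)

# The graph's nodes are the union of both sets ("bob" is in both,
# but union() keeps a single copy).
nodes <- union(taggers, followers)
length(nodes)
```

With igraph, `graph_from_data_frame(edges)` (named `graph.data.frame()` in older releases) should build the corresponding directed graph, taking the first two columns as edge endpoints and `weight` as an edge attribute; `vcount()` on that graph would report the same node count.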

#We can count how many nodes have a particular in-degree (where in-degree represents the number of hashtaggers the node follows)
g.nodes=as.data.frame(table(degree(g2,mode='in')))
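#Convert the in-degree values from factor levels back to numbers (as.numeric() on the factor alone would return the level codes)
g.nodes$Var1=as.numeric(levels(g.nodes$Var1)[as.integer(g.nodes$Var1)])

#Check: summing the node occurrence frequencies should give the total number of nodes
sum(g.nodes$Freq)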

#We can then chart the result to look at the distribution of how many hashtaggers are followed by how many people
require(ggplot2)
ggplot(g.nodes)+geom_linerange(aes(x=Var1,ymin=0,ymax=Freq)) + scale_y_log10() + xlab('In-degree of followers')

To start with, we can get a view of how the in-degree values of the follower nodes are distributed. This gives us an idea of how many of the hashtag users each member of the follower set actually follows.
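The in-degree counting step can be sketched in base R on a hypothetical toy edge list (made-up names). Tabulating edge targets over a factor whose levels span every node keeps the zero in-degree nodes in the tally, so the resulting frequencies sum to the node count, just as the `sum(g.nodes$Freq)` check expects.

```r
# Hypothetical toy edge list: each row is one tagger -> follower edge.
edges <- data.frame(
  from = c("alice", "alice", "alice", "bob",  "bob",   "carol"),
  to   = c("dave",  "erin",  "bob",   "dave", "frank", "dave"),
  stringsAsFactors = FALSE
)
nodes <- union(edges$from, edges$to)

# In-degree per node = number of taggers it follows. Using a factor whose
# levels are *all* nodes keeps in-degree-zero nodes in the count.
in_deg <- table(factor(edges$to, levels = nodes))

# Distribution: in-degree value -> number of nodes with that value
# (the same shape of table as g.nodes).
in_deg_dist <- table(as.vector(in_deg))
in_deg_dist

# Analogue of the sum(g.nodes$Freq) check: every node is counted once.
sum(in_deg_dist) == length(nodes)
```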

For a tight-knit, coherent community, where tag users know each other, we might expect that folk who are likely to be interested in the tag are following several of the tag users.
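One way to turn that expectation into a single number, offered here as an assumed indicator rather than anything from the original post, is the share of followers who follow two or more tag users. A minimal base-R sketch on a hypothetical toy edge list:

```r
# Hypothetical toy edge list (tagger -> follower).
edges <- data.frame(
  from = c("alice", "alice", "alice", "bob",  "bob",   "carol"),
  to   = c("dave",  "erin",  "bob",   "dave", "frank", "dave"),
  stringsAsFactors = FALSE
)

# Number of distinct taggers each follower follows; unique() drops any
# accidental duplicate edge rows before counting.
tags_followed <- table(unique(edges)$to)

# Share of followers who follow two or more tag users: higher values
# would suggest a more tightly connected audience.
share_multi <- mean(as.vector(tags_followed) >= 2)
share_multi
```

Here only dave follows more than one tag user, so the share is one in four.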

Note the use of a log10 scale for the count… Most followers are following one tag user (most likely a single user of the tag with a large follower count). Folk following none of the tag users are likely to be tag users who don’t follow any of the other tag users captured during the sample period (erm, maybe? They could also be tag users with private settings, so their friend/follower lists aren’t public…)
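Which nodes make up the zero in-degree bar can be checked directly. Assuming the node set is just the union of edge endpoints, as in the graph described here, any node that never appears as an edge target must appear as a source, so the zero in-degree nodes are necessarily tag users; what the graph cannot tell us is *why* they follow no other tag user. A base-R sketch on a hypothetical toy edge list:

```r
# Hypothetical toy edge list (tagger -> follower).
edges <- data.frame(
  from = c("alice", "alice", "alice", "bob",  "bob",   "carol"),
  to   = c("dave",  "erin",  "bob",   "dave", "frank", "dave"),
  stringsAsFactors = FALSE
)
nodes <- union(edges$from, edges$to)

# Nodes that never appear as an edge target have in-degree zero.
zero_in <- setdiff(nodes, edges$to)
zero_in

# Every node comes from one of the two edge columns, so a node that is
# never a target must be a source, i.e. a tagger.
all(zero_in %in% edges$from)
```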

Here’s the code for a second sketch…
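Before the igraph code for this second sketch, here is a hand-checkable base-R toy version of what it calculates: each follower's potential receipts are a weighted in-degree (what `graph.strength(g2, mode='in')` returns), i.e. the sum of the weights on their incoming edges. The edge list is hypothetical; the last line confirms that summing value times frequency over the receipts distribution gives the same total as summing every edge weight once.

```r
# Hypothetical toy edge list: tagger -> follower, weighted by the number of
# tagged tweets the tagger published in the sample period.
edges <- data.frame(
  from   = c("alice", "alice", "alice", "bob",  "bob",   "carol"),
  to     = c("dave",  "erin",  "bob",   "dave", "frank", "dave"),
  weight = c(3, 3, 3, 1, 1, 2),
  stringsAsFactors = FALSE
)

# Potential receipts per follower: sum of incoming edge weights.
receipts <- tapply(edges$weight, edges$to, sum)
receipts

# Distribution of receipts (value -> number of followers), like g.weights.
receipt_dist <- table(receipts)

# Total receipts via sum(value * frequency), as in the igraph sketch...
total_from_dist <- sum(as.numeric(names(receipt_dist)) * as.vector(receipt_dist))
# ...equals the plain sum of all edge weights, since each edge counts once.
total_from_dist == sum(edges$weight)
```

In this toy data dave follows three tag users but has six potential receipts, which is why the receipts axis can span a wider range than the in-degree axis.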

#The incoming edges to follower nodes are weighted according to the number of tagged tweets the corresponding hashtagger published in the sample period.
#What this means is that we can count the total number of tagged tweets potentially seen by each follower by summing the weights of the incoming edges on each node
g.weights=as.data.frame(table(graph.strength(g2,mode='in')))
g.weights$Var1=as.numeric(levels(g.weights$Var1)[as.integer(g.weights$Var1)])

#If we sum the product of message counts and frequencies, we see how many potential "receipts" of a tagged tweet there were
sum(g.weights$Var1*g.weights$Freq)

#We can also plot the distribution of the number of tagged tweets potentially received by each follower
ggplot(g.weights)+geom_linerange(aes(x=Var1,ymin=0,ymax=Freq)) + scale_y_log10() + xlab('Incoming tagged message count')

This time, we get to see the distribution of the number of receipts of a tagged message across the follower set, where a receipt represents a publication of a tagged tweet in the sample period by any one of the tag users an individual follows. Because the graph's edges are weighted by the number of tagged tweets each tag user published, we can easily calculate the number of tagged tweets potentially seen by a user by summing the weights of their incoming edges from tag users.

This chart makes it clear that most folk in the potential hashtag audience had only one potential receipt of a tagged tweet… Which makes me start thinking about ways of estimating “conversion” rates, based in part on the likelihood of a follower joining in a hashtag community given the number of tag users they follow to date and the number of followers each of those tag users has…

Note that the range of the incoming message count is greater than the range of the number of tag users followed, because some tag users use the tag more than once during the sample period.

Finally, we chart as a histogram the distribution of the number of followers of each tag user, simply because we can easily do so…

#It's also easy enough to chart the distribution of the follower counts for each hashtagger:
tagger.nodes=subset(as.data.frame(table(degree(g2,mode='out'))),subset=(Var1!='0'))
tagger.nodes$Var1=as.numeric(levels(tagger.nodes$Var1)[as.integer(tagger.nodes$Var1)])
#Quick check on the number of taggers
sum(tagger.nodes$Freq)

#And the distribution of how many followers they have (weighting by Freq so each bar counts taggers, not distinct follower-count values)
ggplot(tagger.nodes)+geom_histogram(aes(x=Var1,weight=Freq),binwidth=250) + xlab('Follower count')

Note the outliers…
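Because each row of `tagger.nodes` is a distinct follower count paired with how many taggers have it, a histogram of it should count `Freq` taggers per row rather than one per row. A base-R sketch of the same 250-follower binning, using made-up follower counts (including one deliberately large outlier) in place of real out-degrees:

```r
# Hypothetical follower counts for six taggers (made-up numbers); in the
# real graph these are the out-degrees of the tagger nodes.
follower_counts <- c(95, 120, 120, 340, 610, 5200)

# Value -> frequency table, mirroring the structure of tagger.nodes.
tagger_tab <- as.data.frame(table(follower_counts), stringsAsFactors = FALSE)
tagger_tab$follower_counts <- as.numeric(tagger_tab$follower_counts)

# Expand each distinct value Freq times so every tagger is counted.
per_tagger <- rep(tagger_tab$follower_counts, tagger_tab$Freq)

# 250-follower bins, matching binwidth = 250 in the ggplot call.
breaks <- seq(0, ceiling(max(per_tagger) / 250) * 250, by = 250)
h <- hist(per_tagger, breaks = breaks, plot = FALSE)

# Every tagger lands in exactly one bin; the outlier sits far to the right.
sum(h$counts) == length(follower_counts)
```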

For additional charts that can be generated from the graph representation, see: Experimenting With iGraph – and a Hint Towards Ways of Measuring Engagement?

PS Hmmm… pondering this… the focus of a tag user might be the number of their followers who originate a tagged tweet in the sample period (RTs don’t count, and maybe neither do replies…) divided by the total number of their followers…? And maybe salience as the number of tagged tweets published by an individual during the sample period divided by the total number of tweets they published over the same period…?
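The two ratios floated above are simple enough to sketch as R functions. The function names, argument names and example numbers are hypothetical, and the inputs are assumed to have been gathered already (with RTs, and perhaps replies, filtered out before building the list of tag originators).

```r
# focus (hypothetical helper): share of a tag user's followers who
# originated a tagged tweet in the sample period.
focus <- function(followers, tag_originators) {
  if (length(followers) == 0) return(NA_real_)
  mean(unique(followers) %in% tag_originators)
}

# salience (hypothetical helper): share of a user's tweets in the sample
# period that carried the tag; NA when they tweeted nothing.
salience <- function(tagged_tweets, total_tweets) {
  ifelse(total_tweets > 0, tagged_tweets / total_tweets, NA_real_)
}

# Made-up example: 2 of alice's 4 followers originated a tagged tweet,
# and 3 of her 12 tweets in the period were tagged.
focus(c("bob", "dave", "erin", "frank"), c("bob", "carol", "erin"))
salience(3, 12)
```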
