There’s a great Tom Waits song from the album “Mule Variations” called “Big in Japan”. The beauty of saying you’re big in Japan is that no one can ever really verify the statement (or at least that was more true in 1999). You might assert “my work is big on Twitter”, and hey, how would I know? I think we’re all agreed now that if you’re a scientist, being big on Twitter is important. But how much exposure does your work actually get on Twitter? In the research world, people are working on lots of interesting ways of measuring the impact of an article. People like Heather Piwowar, who cofounded total impact, are working to change how we measure the importance of a paper. More and more people want to look at a paper’s impact beyond just how many times other researchers cite it. That’s where projects like altmetrics and article level metrics from PLoS come in. These are all great tools and I don’t doubt they are the future of how we measure impact. But what if you want to look under the hood of Twitter and see what’s going on with a given research article? There are lots of web-based tools (like tweetreach), but none of them offer a concise way to extract and store Twitter data about the impact of scientific articles. Enter impacTwit (a slightly tongue-in-cheek name).
impacTwit is a collection of R functions that output data about who tweeted and retweeted any collection of search terms, in a data frame that you can easily plot. It gives you the time stamp, originating tweeter, and follower count of each tweet matching a vector of search terms. It sorts them all by date and gives you cumulative sums for the entire set of search terms, cumulative sums by originating tweeter, and cumulative sums by search term. It lets you easily dissect who is influential about a paper, or which sources are, and gives you a sense of the total impact on Twitter. Total impact here is defined as the number of potential viewers of a tweet. Before I give a worked example I’ll offer two caveats about “total impact”. First, just because a tweet is retweeted to 10,000 people doesn’t mean they all see it. Second, even if they see it, how many actually click on the article link and go read it? It is an imperfect metric to be sure.
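To make the three kinds of cumulative sums concrete, here is a minimal sketch in base R. It does not call impacTwit or Twitter at all: the toy data frame and its column names (time, tweeter, term, followers) are assumptions made up for illustration, not the package's actual output format.

```r
# Toy data standing in for impacTwit output.
# The column names and values are hypothetical, chosen only for illustration.
tweets <- data.frame(
  time = as.POSIXct(c("2012-06-26 09:00:00", "2012-06-25 14:30:00",
                      "2012-06-26 11:15:00", "2012-06-25 16:45:00"),
                    tz = "UTC"),
  tweeter = c("alice", "bob", "alice", "carol"),
  term = c("url", "title", "title", "url"),
  followers = c(1200, 300, 1200, 50),
  stringsAsFactors = FALSE
)

# Sort by time stamp so the running totals accumulate in chronological order.
tweets <- tweets[order(tweets$time), ]

# Running total of potential viewers across all search terms.
tweets$cum_total <- cumsum(tweets$followers)

# Running totals within each originating tweeter and within each search term.
tweets$cum_by_tweeter <- ave(tweets$followers, tweets$tweeter, FUN = cumsum)
tweets$cum_by_term <- ave(tweets$followers, tweets$term, FUN = cumsum)

print(tweets)
```

Each row keeps its own time stamp, so any of the three running totals can be plotted directly against time.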
The idea behind impacTwit is to measure the impact on Twitter of a given scientific article (it could be used for blog posts too). The code works by searching Twitter, through the Twitter API, for article-specific terms or links. It’s not really designed to handle huge amounts of data, so if you do this with a search for “Justin Bieber” I’m pretty sure you’ll break the code. The input is a vector of search strings, presumably about the same article. As an example I’ll use a paper from PNAS that’s been making the rounds on Twitter, called “Heavy use of equations impedes communication among biologists”. We can search for the actual URL of the article, but people might have linked to it from other sources, so we also want to include the title of the paper as a search string. We need both because one tweet might be: “Love this article I hate math too : http://www.pnas.org/content/early/2012/06/22/1205259109.abstract”, while another might say: “Heavy use of equations impedes communication among biologists: http://some.obscure.source”. Finally, there was an AP story about this called “Scientists think math is hard too”, so let’s include that title in our search. The actual input is here:
test.str <- c("Scientists think math is hard too","http://www.pnas.org/content/early/2012/06/22/1205259109.abstract","Heavy use of equations impedes communication among biologists")
tweet.dat <- impacTwit(test.str)
We can then generate a series of plots from the resulting data frame. The first one is just a cumulative sum of the total impact.
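For readers who want to try this kind of plot without the package, here is a rough sketch using base R graphics. The toy data frame and its column names (time, followers) are assumptions for illustration and are not what impacTwit actually returns.

```r
# Hypothetical stand-in for the impacTwit data frame; values are made up.
tweets <- data.frame(
  time = as.POSIXct(c("2012-06-25 14:30:00", "2012-06-25 16:45:00",
                      "2012-06-26 09:00:00", "2012-06-26 11:15:00"),
                    tz = "UTC"),
  followers = c(300, 50, 1200, 1200)
)

# Order chronologically, then accumulate potential viewers over time.
tweets <- tweets[order(tweets$time), ]
tweets$cum_total <- cumsum(tweets$followers)

# Step plot: the total jumps each time a new tweet reaches more followers.
plot(tweets$time, tweets$cum_total, type = "s",
     xlab = "Time of tweet", ylab = "Cumulative potential viewers",
     main = "Cumulative total impact")
```

A step line is used here because the total only changes at the moment of each tweet; a regular line plot would suggest a gradual rise between tweets.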