Twitter lets people both be heard and hear, easily and in real time, which can yield fascinating and groundbreaking insights for anybody trying to take the public’s pulse on a hot issue. Unsurprisingly, that much benefit comes with challenges, chiefly how to process the humongous amount of data generated every minute. Never one to shy away from some heavy number crunching, I am starting a series of posts on Twitter, and this is the first one.
The chart above represents more than 200,000 mentions of the Twitter account of Mexico’s President Enrique Peña Nieto over the past 20 days. It is intended as a proxy for how popular the President of Mexico is on Twitter. Every tweet was analyzed and scored as either positive or negative. Then the number of positive (and, separately, negative) tweets in each hour was counted and divided by the total number of tweets in that hour. The results show that, on average, Peña Nieto has had a positive sentiment score (0.54), but with severe negative spikes in some hours.
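The hourly calculation described above can be sketched in a few lines of Python. This is a minimal sketch, not the author’s code: it assumes each tweet has already been reduced to a (timestamp, label) pair whose label is "positive" or "negative", and the function names `hourly_positive_share` and `average_score` are hypothetical. It also assumes the reported 0.54 is an unweighted mean of the hourly shares, which the post does not specify.

```python
from collections import defaultdict
from datetime import datetime


def hourly_positive_share(tweets):
    """Return {hour: share of positive tweets} for (timestamp, label) pairs.

    Assumes every tweet is already labelled "positive" or "negative".
    """
    totals = defaultdict(int)
    positives = defaultdict(int)
    for timestamp, label in tweets:
        # Truncate the timestamp to the start of its hour.
        hour = timestamp.replace(minute=0, second=0, microsecond=0)
        totals[hour] += 1
        if label == "positive":
            positives[hour] += 1
    return {hour: positives[hour] / totals[hour] for hour in sorted(totals)}


def average_score(shares):
    """Unweighted mean of the hourly positive shares."""
    return sum(shares.values()) / len(shares)


# Hypothetical sample: two tweets in the 10:00 hour, one in the 11:00 hour.
sample = [
    (datetime(2014, 3, 1, 10, 5), "positive"),
    (datetime(2014, 3, 1, 10, 40), "negative"),
    (datetime(2014, 3, 1, 11, 15), "positive"),
]
shares = hourly_positive_share(sample)  # 10:00 -> 0.5, 11:00 -> 1.0
print(average_score(shares))            # 0.75
```

Averaging hourly shares weights a quiet hour the same as a busy one; dividing the overall number of positive tweets by the overall number of tweets would instead weight every tweet equally. Either reading fits the description above.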
If you would like to know more about how the calculations were done, read on.
The 200,000 tweets in Spanish came from Twitter’s API, harvested over 20 days
I got the data via Twitter’s API using the query “@EPN OR #EPN OR EPN, lang:es”, that is: give me all the tweets in Spanish mentioning the user @EPN. From Tuesday, February 25th to Sunday, March 16th, 200,000 tweets matched this query.
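The query combines three alternatives with OR and adds a language filter. As a rough illustration only, the hypothetical `matches_query` function below mimics that logic on tweets that have already been downloaded. It is not how Twitter’s search evaluates queries; it assumes the query terms are @EPN, #EPN and EPN, and that each tweet is a dict with `text` and `lang` keys.

```python
import re

# Hypothetical local approximation of the query "@EPN OR #EPN OR EPN" with lang:es.
# Twitter's own search matching differs in detail; this only illustrates the OR logic.
QUERY_TERMS = {"@epn", "#epn", "epn"}


def matches_query(tweet):
    """Return True if a tweet dict (assumed keys: 'text', 'lang') matches the query."""
    if tweet.get("lang") != "es":
        return False  # the lang:es filter
    # Tokenize, keeping a leading @ or # so mentions and hashtags stay distinct.
    tokens = re.findall(r"[@#]?\w+", tweet["text"].lower())
    return any(token in QUERY_TERMS for token in tokens)


sample = [
    {"text": "Peña Nieto @EPN se quedó calladito", "lang": "es"},
    {"text": "Hoy habla EPN sobre la reforma", "lang": "es"},
    {"text": "@EPN statement today", "lang": "en"},
    {"text": "Nada que ver", "lang": "es"},
]
print([matches_query(t) for t in sample])  # [True, True, False, False]
```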
Obviously, there will be positive, negative and neutral tweets. How do we classify a tweet as positive or negative? This is not big data, but it is definitely a lot of data, much more than any one person could process in a short time. The image below, for example, shows three of these tweets. Can you classify them as positive or negative? Now imagine reading all 91,000 tweets from the past 7 days!
Review each tweet, count its positive and negative words, and subtract one count from the other to get the tweet’s sentiment score
In this first post I’ll be using a simple technique: count the occurrences of “positive” and “negative” words in a tweet, then subtract the number of negative words from the number of positive ones to obtain the tweet’s sentiment score. Some people call this a naive algorithm because it looks at words in isolation and therefore handles sarcasm poorly. In the third article of this series I’ll explore a more advanced algorithm, similar to the one used to classify email as spam. If you would like to get updates on it, subscribe here.
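A minimal sketch of this word-counting approach in Python follows. The post does not say which word lists were used, so `POSITIVE_WORDS` and `NEGATIVE_WORDS` are tiny hypothetical Spanish lexicons chosen for illustration, and the mapping from score to label (above zero positive, below zero negative, zero neutral) is an assumption.

```python
import re

# Hypothetical, deliberately tiny Spanish lexicons (not the ones used in the post).
POSITIVE_WORDS = {"bien", "bueno", "excelente", "apoyo", "reconozco", "logro"}
NEGATIVE_WORDS = {"mal", "malo", "triste", "violencia", "corrupción", "silencio"}


def sentiment_score(text):
    """Count positive words minus negative words in a tweet."""
    # \w is Unicode-aware in Python 3, so accented words like "corrupción" stay whole.
    words = re.findall(r"\w+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    return positive - negative


def label(score):
    """Assumed mapping from a numeric score to a sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


tweet = ("Peña Nieto @EPN se quedó calladito y no se atrevió a decir nada "
         "sobre violencia en Venezuela. Triste silencio de casi toda América Latina")
print(sentiment_score(tweet), label(sentiment_score(tweet)))  # -3 negative
print(sentiment_score("Excelente trabajo, como siempre"))     # 1
```

The last line shows the sarcasm problem: a sarcastic “Excelente trabajo, como siempre” (“Excellent work, as always”) still scores +1, because only the word “excelente” is counted and the tone is invisible to the algorithm.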
Who tweets or retweets (RTs) also matters
A tweet sent to 300 followers does not carry the same weight as one sent by former President Felipe Calderón to his 2.75 million followers. Let’s try to address this by calculating a simple influence score that multiplies each tweet’s sentiment score by the number of retweets it got. The chart below is a first attempt to capture this effect. In short, a positive tweet that got retweeted 500 times counts for more than one that got retweeted only once.
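The influence score is a single multiplication per tweet. The sketch below applies it to three made-up tweets (the sentiment and retweet numbers, and the dict field names, are invented for illustration) and sums the results, which is one plausible way to aggregate scores over a time window; the post does not say exactly how its chart aggregates them.

```python
def influence_score(sentiment, retweets):
    """Weight a tweet's sentiment score by how many times it was retweeted."""
    return sentiment * retweets


# Made-up example tweets; field names are hypothetical.
tweets = [
    {"sentiment": 2, "retweets": 500},   # positive, widely shared
    {"sentiment": 2, "retweets": 1},     # equally positive, barely shared
    {"sentiment": -3, "retweets": 120},  # negative, moderately shared
]

scores = [influence_score(t["sentiment"], t["retweets"]) for t in tweets]
print(scores)       # [1000, 2, -360]
print(sum(scores))  # 642
```

Taken literally, this formula gives a tweet with no retweets an influence of zero, whatever its sentiment. A variant such as `sentiment * (retweets + 1)` would keep the original tweet’s own contribution; that is a design choice to consider, not something the post describes.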
As you can see from the chart, there are some periods with striking variations in the influence score. Below are two examples, one positive and one negative, of mentions that were shared (retweeted) many times. Such mentions are especially relevant because retweets extend a tweet’s lifespan, which on average is only about 18 minutes. That is why, although both tweets below were posted days ago, they still push the account’s influence score up and down, respectively.
I acknowledge the work of the Mexican State’s security institutions in achieving the capture of Joaquín Guzmán Loera in Mazatlán. (Translated from Spanish)
— Enrique Peña Nieto (@EPN) February 22, 2014
Peña Nieto @EPN kept very quiet and did not dare say anything about the violence in Venezuela. A sad silence from almost all of Latin America. (Translated from Spanish)
— JORGE RAMOS (@jorgeramosnews) February 20, 2014
Twitter is no substitute for public opinion polls, but it is as good as it gets for real-time information
With more than 11.7 million Twitter users in Mexico (2012 estimate), Twitter’s value for interpreting the country’s public opinion is on the rise. But, we get it: pollsters spend a lot of time learning their trade, and polling is a painstaking process with many nuances. We are not trying to take their jobs. Twitter is no substitute for public opinion polls, and it can be biased, but it is as good as it gets when it comes to real-time information. Check out this Pew Research Center article about Twitter and public opinion in the U.S.
Twitter is as good as it gets for real-time information, but it is hard to mine and should be handled with care
Although Twitter is an awesome source of information, offering useful short-term insight into how the public is behaving and reacting on Twitter to certain trends, those insights should be taken with several pinches of salt.
Stay tuned for the next post in this series.
The post “How popular is the President of Mexico on Twitter?” appeared first on Jose Gonzalez.