When Trump visits… tweets from his trip to Mexico

[This article was first published on En El Margen - R-English, and kindly contributed to R-bloggers.]

I’m sure many of my fellow Mexicans will remember the historically ill-advised (to say the least) decision by our President to invite Donald Trump for a meeting.

Talking with some colleagues, we couldn’t help but notice that in another era this decision might have been good policy. The problem, some concluded, was the influence of social media today. In fact, the Trump debacle did cause an outcry among leading political voices online.

I wanted to investigate this further, and thankfully for me, I’ve been using R to collect tweets from a catalog of leading political personalities in Mexico for a personal business project.

Here is a short descriptive look at what the 65 Twitter accounts I’m following tweeted between August 27th and September 5th (the Donald announced his visit on August 30th). I’m sorry I can’t share the dataset, but you get the idea with the code…

library(dplyr)
library(stringr)
library(lubridate) # provides month(), day() and hour(), used below

# 42 of the 65 accounts tweeted between those dates.
d %>% 
  summarise("n" = n_distinct(NOMBRE))
#   n
#  42

We can see how mentions of Trump spike right around the time the visit was announced…

byhour <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>% 
  group_by(MONTH, DAY, HOUR) %>% 
  summarise("N" = n(), 
            "TRUMP_MENTIONS" = sum(TRUMP_MENTION)) %>%
  mutate("PCT_MENTIONS" = TRUMP_MENTIONS/N*100) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))
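
A side note on the aggregation above: summing str_count() counts every occurrence of the word, so a tweet that says "Trump" twice adds two mentions, and PCT_MENTIONS is strictly mentions per 100 tweets. Here is a minimal sketch of an alternative that flags each tweet once and buckets by hour with lubridate's floor_date() instead of rebuilding the date with paste0(). The toy data frame and the names toy, HAS_TRUMP, TRUMP_TWEETS and PCT_TWEETS are made up for illustration; only T_CREATED and TXT come from my dataset.

```r
library(dplyr)
library(stringr)
library(lubridate)

# Hypothetical stand-in for `d`; only the column names match the real data.
toy <- data.frame(
  T_CREATED = as.POSIXct(c("2016-08-30 21:05:00", "2016-08-30 21:40:00",
                           "2016-08-30 22:10:00", "2016-08-31 09:15:00"),
                         tz = "UTC"),
  TXT = c("Trump is coming. TRUMP!", "Good night",
          "Why invite trump?", "Private meeting today"),
  stringsAsFactors = FALSE
)

byhour_alt <- toy %>%
  mutate(
    # round each timestamp down to the start of its hour
    CHART_DATE = floor_date(T_CREATED, unit = "hour"),
    # TRUE/FALSE per tweet, matching any capitalisation of "trump"
    HAS_TRUMP = str_detect(TXT, regex("trump", ignore_case = TRUE))
  ) %>%
  group_by(CHART_DATE) %>%
  summarise(N = n(),
            TRUMP_TWEETS = sum(HAS_TRUMP),
            PCT_TWEETS = 100 * mean(HAS_TRUMP),
            .groups = "drop")

print(byhour_alt)
```

In the first hour of the toy data, one of two tweets mentions Trump (twice), so this version reports 50% where the str_count() sum would report 100%. Which measure is more useful depends on whether you care about tweets or about raw mentions.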

library(ggplot2)  
library(eem)
ggplot(byhour, 
       aes(x = CHART_DATE, 
           y = PCT_MENTIONS)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Trump mentions \n (% of Tweets)")

Trump tweets by Mexican officials, percent

The peak of mentions (as a percentage of tweets) was September 1st at 6 am (75%). But in terms of the number of tweets, it is much more obvious that the outcry followed the announcement and later visit of the candidate:

ggplot(byhour, 
       aes(x = CHART_DATE, 
           y = TRUMP_MENTIONS)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Trump mentions \n (# of Tweets)")

Trump tweets by Mexican officials, total

We can also (sort-of) identify the effect of these influencers tweeting. I’m going to add the followers, which are potential viewers, of each tweet mentioning Trump, by hour.

byaudience <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         "TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>% 
  filter(TRUMP_MENTION > 0) %>%
  group_by(MONTH, DAY, HOUR) %>% 
  summarise("TWEETS" = n(), 
            "AUDIENCE" = sum(U_FOLLOWERS)) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))


ggplot(byaudience, 
       aes(x = CHART_DATE, 
           y = AUDIENCE)) + 
        geom_line(colour=eem_colors[1]) + 
        theme_eem()+
        labs(x = "Time", 
             y = "Potential audience \n (# of followers)")
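
One caveat with this measure: sum(U_FOLLOWERS) counts an account's followers once per tweet, so somebody who tweets about Trump five times in an hour is counted five times. A rough sketch of a stricter version counts each account once per hour. The toy data and the names toy and UNIQUE_AUDIENCE are invented for illustration, and the rows are assumed to be already filtered to tweets that mention Trump.

```r
library(dplyr)

# Hypothetical Trump tweets, already bucketed by hour.
toy <- data.frame(
  NOMBRE = c("A", "A", "B", "C"),
  CHART_DATE = as.POSIXct(c("2016-08-31 10:00:00", "2016-08-31 10:00:00",
                            "2016-08-31 10:00:00", "2016-08-31 11:00:00"),
                          tz = "UTC"),
  U_FOLLOWERS = c(1000, 1000, 500, 200),
  stringsAsFactors = FALSE
)

reach <- toy %>%
  group_by(CHART_DATE) %>%
  summarise(
    # measure used in the post: followers added once per tweet
    AUDIENCE = sum(U_FOLLOWERS),
    # stricter measure: followers added once per account in that hour
    UNIQUE_AUDIENCE = sum(U_FOLLOWERS[!duplicated(NOMBRE)]),
    .groups = "drop"
  )

print(reach)
```

Even the stricter number is an upper bound, since followers of different accounts overlap and not every follower sees every tweet.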

Total audience of Trump tweets

So clearly, I’m stating the obvious. People were talking. But how did the conversation develop? Let’s first see the type of tweets (RTs vs. tweets drafted individually):

bytype <- d %>% 
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # only the tweets that mention trump
  filter(TRUMP_MENTION>0) %>%
  group_by(T_ISRT) %>% 
  summarise("count" = n())
knitr::kable(bytype) # kable() lives in the knitr package
|T_ISRT | count|
|:------|-----:|
|FALSE  |   313|
|TRUE   |   164|
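
The counts in this table can be set against the overall figures quoted in the next paragraph (1389 RTs out of 3833 tweets) with a little arithmetic. The sketch below assumes the 3833 total includes the 477 Trump tweets, so it subtracts them to get a comparison group that does not overlap, and then runs base R's prop.test() as a rough check. The variable names are made up for this example.

```r
# Counts from the table above
trump_rt    <- 164
trump_total <- 313 + 164            # 477 tweets mentioning Trump

# Overall counts quoted in the post
all_rt    <- 1389
all_total <- 3833

# Assumption: the overall counts include the Trump tweets,
# so the remaining tweets are obtained by subtraction.
other_rt    <- all_rt - trump_rt
other_total <- all_total - trump_total

shares <- c(trump = trump_rt / trump_total,
            other = other_rt / other_total)
print(round(100 * shares, 1))       # RT share in percent for each group

# Two-sample test of equal proportions (base R, stats package)
rt_test <- prop.test(x = c(trump_rt, other_rt),
                     n = c(trump_total, other_total))
print(rt_test$p.value)
```

With these numbers the two shares are about 34% and 36%, and the test gives no evidence that they differ, which is in line with the reading that retweets were not unusually important in the Trump conversation.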

About 1 in 3 was an RT. Compared to the overall tweets (1389 RTs out of 3833), this is not much of a difference, so it wasn’t necessarily an influencer pushing the discourse. In terms of who was mentioned most in these tweets, it was our President in the spotlight:

bymentionchain <- d %>% 
  mutate("TRUMP_MENTION" = str_count(TXT, pattern = "Trump|TRUMP|trump")) %>%
  # note: no filter is applied here; counts are kept for every TRUMP_MENTION value
  group_by(TRUMP_MENTION, MENTION_CHAIN) %>% 
  summarise("count" = n()) %>% 
  ungroup() %>% 
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "EPN", 
                                        x = MENTION_CHAIN), 
                                  "EPN", MENTION_CHAIN)) %>% 
  mutate("GROUPED_CHAIN" = ifelse(grepl(pattern = "realDonaldTrump", 
                                        x = MENTION_CHAIN), 
                                  "realDonaldTrump", GROUPED_CHAIN))
                                  
ggplot(order_axis(bymentionchain %>% 
                    filter(count>10 & GROUPED_CHAIN!="ND"), 
                  axis = GROUPED_CHAIN, 
                  column = count), 
       aes(x = GROUPED_CHAIN_o, 
           y = count)) + 
  geom_bar(stat = "identity") + 
  theme_eem() + 
  labs(x = "Mention chain \n (separated by _|.|_ )", y = "Tweets")

Mentions

How about the actual people who tweeted? It seems news anchor Joaquin Lopez-Doriga and security analyst Alejandro Hope were the most vocal about the visit (out of the influencers I’m following).

bytweetstar <- d %>% 
  mutate("TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump")<1,0,1)) %>%
  group_by(TRUMP_MENTION, NOMBRE) %>% 
  summarise("count" = n_distinct(TXT))
## plot with ggplot2

Mentions

I also grouped each person by their political affiliation, and the result confirms the notion that the conversation on the eve of the visit, at least among this very small subset of Twitter accounts, was driven by those with no party affiliation or in the “PAN” (opposition party).

byafiliation <- d %>% 
  mutate("MONTH" = as.numeric(month(T_CREATED)), 
         "DAY" = as.numeric(day(T_CREATED)), 
         "HOUR" = as.numeric(hour(T_CREATED)), 
         "TRUMP_MENTION" = ifelse(str_count(TXT, pattern = "Trump|TRUMP|trump")>0,1,0)) %>% 
  group_by(MONTH, DAY, HOUR, TRUMP_MENTION, AFILIACION) %>% 
  summarise("TWEETS" = n()) %>%
  arrange(desc(MONTH), desc(DAY), HOUR) %>%
  mutate("CHART_DATE" = as.POSIXct(paste0("2016-",MONTH,"-",DAY," ", HOUR, ":00")))
  
 ggplot(byafiliation, 
       aes(x = CHART_DATE, 
           y = TWEETS, 
           group = AFILIACION, 
           fill = AFILIACION)) + 
  geom_bar(stat = "identity") + 
  theme_eem() + 
  scale_fill_eem(20) + 
  facet_grid(TRUMP_MENTION ~.) +
  labs(x = "Time", y = "Tweets \n (By mention of Trump)")

Mentions

However, it’s interesting to note that there is a small spike from the accounts affiliated with the PRI (party in power) on the day after his visit (Sept. 1st). Maybe they were trying to drive the conversation to another place?
