
The echo of a tragedy in Social Media – “The Making of”


In this post I’m going to describe how I pulled together the visualization of The echo of a tragedy in Social Media.

I used three different technologies: PHP for data gathering, R for data analysis and output generation, and D3.js for data visualization.

The day after the attacks on Charlie Hebdo, I set up a Twitter monitor using the Search API, tracking the hashtag #jesuischarlie in 5 different languages: German, Spanish, English, Italian and, of course, French. I made sure I had gathered at least 10K tweets for each language, and I’ve made the data available as a CSV (jesuischarlie.csv). The PHP part is easy, so I’m not going to focus on it.
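If you’d rather reproduce the gathering step in R instead of PHP, a minimal sketch along these lines should work with the twitteR package (the credential placeholders are hypothetical, and in practice you’d need to paginate or retry to actually reach 10K tweets per language):

library(twitteR)
# hypothetical app credentials -- replace with your own
setup_twitter_oauth("API_KEY", "API_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")

languages <- c("es", "de", "en", "fr", "it")
all.tweets <- NULL
for (l in languages) {
  # the Search API returns results in pages; n is an upper bound per call
  res <- searchTwitter("#jesuischarlie", n = 10000, lang = l)
  df <- twListToDF(res)
  df$lang <- l
  all.tweets <- rbind(all.tweets, df)
}
write.csv(all.tweets, "~/jesuischarlie/jesuischarlie.csv", row.names = FALSE)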

The analysis part is no rocket science; it’s accomplished in the following steps:

  • Counting the occurrences of all available hashtags per language
  • Taking the top 50 hashtags per language
  • Computing the occurrences of hashtags pairs within the selected hashtags
  • Generating the JSON expected by the force-directed D3.js layout (a sample of the target structure follows this list)
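For reference, the target structure is the one used by Mike Bostock’s force-directed graph example (linked in the code below), extended with a few extra attributes. A minimal sketch with made-up values:

{ "nodes": [
    {"name": "fr", "group": 4, "image": "france-flag.png", "count": 10000, "number": 0},
    {"name": "somehashtag", "group": 4, "image": "pen-icon.png", "count": 42, "number": 1}
  ],
  "links": [
    {"source": 0, "target": 1, "value": 0}
  ]
}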

The code taking care of the visualization is written in D3.js, embedded and readable in the visualization page.
A few comments on that:

  • There are two functions that make both the text and the opacity dependent on the number of occurrences
  • Clicking a node highlights its connections to other nodes
  • The color of the highlighted connections changes according to a round-robin approach, allowing multiple simultaneous highlights
  • The values for gravity, distance and charge have been manually adjusted… it’s worth spending some time playing around with them
  • The size of the node images is determined by a function that allows bigger sizes for the flags

Below you can find the R code. Again, it might not be 100% optimized, but it does the job. Give it a go!

library(RMySQL)
require(data.table)
library(stringr)
 
con <- dbConnect(MySQL(), user="XXXXXX", password="XXXXXX",dbname="XXXXXXX", host="XXXXXXXX")
res=dbSendQuery(conn=con,"SELECT * FROM domaintweets where tag='#JeSuisCharlie'")
assets<-fetch(res,-1)
dbDisconnect(con)
# I've done it with a database... you can do it with the data set I provided above via a read.csv statement
# assets<-read.csv("~/jesuischarlie/jesuischarlie.csv")
 
# languages we extracted tweets for
languages<-c("es","de","en","fr","it")
 
countPerLanguage<-function (assets, max)
{
  assets$ntags<-str_count(assets$tags, ",")
  #filtering out tweets with way too many hashtags
  assets<-assets[assets$ntags<5,]
  assets<-head(assets,max)
  tags<-assets$tags
 
  alltags<-paste(tags, collapse = ',')
  alltags<-gsub(x=alltags, pattern = ',,',replacement = ',')
  all.tags<-strsplit(x = alltags,split = ',')
  df.all.tags<- as.data.frame(all.tags)
  colnames(df.all.tags)<-c("tag")
  df.all.tags$tag<-as.character(df.all.tags$tag)
  df.all.tags$tag<-tolower(df.all.tags$tag)
  df.all.tags$count<-1
  agg.df.all.tags<-aggregate(count~tag, df.all.tags, length)
  agg.df.all.tags<-agg.df.all.tags[order(agg.df.all.tags$count, decreasing = T),]
  return (agg.df.all.tags)
}
 
# taking the top 50 hashtags per language
agg.languages<-NULL
for (i in 1:length(languages))
{
  df.lang<-subset(assets, lang==languages[i])
  top50 <- countPerLanguage(df.lang, 10000)
  lang<-rep(x = languages[i],times = nrow(top50))
  top50$lang<-lang
  top50<-head(top50,50)
  agg.languages<-rbind(agg.languages,top50)
}
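# at this point agg.languages holds one row per (hashtag, language),
# with columns: tag, count, lang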
 
# hashtag 2 hashtag connection
unique.tags<-unique(agg.languages$tag)
pairs.df<-NULL
# iterate over all hashtag pairs (i < j)
for (i in 1:(length(unique.tags)-1))
{
  tag1 <- unique.tags[i]
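  # NB: the pattern below assumes every tag in the tags field is followed
  # by a comma (i.e. the field carries a trailing comma)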
  assets.tag <- subset(assets, grepl(pattern = paste0("^",tag1,",", "|",",", tag1,",") ,x = assets$tags) )
 
  for (j in i:length(unique.tags))
  {
    if (i!=j)
    {
      tag2<-unique.tags[j]
      print(paste(tag1, "-", tag2))
      pairs <- subset(assets.tag, grepl(pattern = paste0("^",tag2,",", "|",",", tag2,",") ,x = assets.tag$tags) )
      count.pairs<-nrow(pairs)
      if (count.pairs!= 0) {
        l<-list(unique.tags[i],unique.tags[j],count.pairs)
        df<-as.data.frame(l)
        colnames(df)<-c('tag1','tag2','Freq')
        pairs.df<-rbind(pairs.df,df)       
      }
    }
  }
}
pairs.df$tag1<-as.character(pairs.df$tag1)
pairs.df$tag2<-as.character(pairs.df$tag2)
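# (optional) quick sanity check: eyeball the most frequent co-occurring pairs
head(pairs.df[order(-pairs.df$Freq), ])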
 
# Generating JSON compatible with the force-directed graph http://bl.ocks.org/mbostock/4062045
 
languages<-c("de","en","it","fr","es")
languages.img<-c("https://cdn3.iconfinder.com/data/icons/finalflags/32/Germany-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/United-Kingdom-flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Italy-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/France-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Spain-Flag.png")
 
img.url<-"https://cdn4.iconfinder.com/data/icons/miu/22/editor_pencil_pen_edit_write_-16.png"
listNodes<-NULL
nodeNr<-0
# Generating the Nodes
strJson<-'{ "nodes":['
 
for(j in 1:length(languages))
{
  lang<-agg.languages[agg.languages$lang==languages[j],]
  df<-data.frame(tag=languages[j],noderNr=nodeNr)
  listNodes<-rbind(df,listNodes)
  # entry for the language
  dt<-paste0('{"name":"', languages[j],'","group":',j,', "image":"',languages.img[j]  ,'", "count":', sum(lang$count) ,', "number":', nodeNr ,' },' )
  nodeNr<-nodeNr+1
 
  strJson<-paste0(strJson,dt)
  for(i in 1:nrow(lang))
  {   
    if (nrow(listNodes[listNodes$tag==lang[i,]$tag,])==0 &&  !(lang[i,]$tag %in% languages))
    {
      dt<-paste0('{"name":"', lang[i,]$tag,'","group":',j,', "image":"',img.url  ,'", "count":', lang[i,]$count ,', "number":', nodeNr ,' },' )
      strJson<-paste0(strJson,dt)
      df<-data.frame(tag=lang[i,]$tag,noderNr=nodeNr)
      listNodes<-rbind(df,listNodes)
      nodeNr<-nodeNr+1
    }
 
  }
}
strJson<-substr(strJson,1,nchar(strJson)-1)
 
# Generating the Links
strJson<-paste0(strJson,'], "links": [')
for (l in 1:nrow(agg.languages))
{
  matches1<-subset(listNodes, tag==agg.languages[l,]$lang)
  matches2<-subset(listNodes, tag==agg.languages[l,]$tag)
  for (k in 1:nrow(matches1))
  {
    for (m in 1:nrow(matches2))
    {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt) 
    }
  }
}
for (l in 1:nrow(pairs.df))
{
  matches1<-subset(listNodes, tag==pairs.df[l,]$tag1)
  matches2<-subset(listNodes, tag==pairs.df[l,]$tag2)
  for (k in 1:nrow(matches1))
  {
    for (m in 1:nrow(matches2))
    {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt) 
    }
  }
}
 
strJson<-substr(strJson,1,nchar(strJson)-1)
strJson<-paste0(strJson,']}')
# Writing it to a file
write(x = strJson, file = '~/jesuischarlie/jesuischarlie2.json')
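If you want to double-check the result before feeding it to D3, a quick optional validation with the jsonlite package (not used in the original script) looks like this:

library(jsonlite)
# returns TRUE if the generated string is well-formed JSON
validate(strJson)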
