The echo of a tragedy in Social Media – “The Making of”

January 14, 2015
By

(This article was first published on Big Data Doctor » R, and kindly contributed to R-bloggers)

In this post I’m going to describe how I pull together the visualization of The echo of a tragedy in Social Media.

I used 3 different technologies to pull it together: PHP for data gathering, R for data analysis and data output generation and D3.js for data visualization.

The day after the incidents in the Charlie Hebdo took place, I set up a Twitter monitor using the Search API based on the hashtag #jesuischarlie for 5 different languages: German, Spanish, English, Italian and of course French. I made sure for each language I had at least 10K Tweets gathered. I made available the data as csv (jesuischarlie.csv). The PHP part is easy so I’m not going to focus on it :)

The analysis part is no rocket science and is accomplished in following steps:

  • Counting the occurrences of all available hashtags per language
  • Taking the top 50 hashtags per language
  • Computing the occurrences of hashtags pairs within the selected hashtags
  • Generating the JSON according to the directed forced D3.js layout

The code taking care of the visualization is written in D3.js, embedded and readable in the visualization page.
A few comments on that:

  • There are 2 functions to make both text and opacity dependent on the number of occurrences
  • The nodes accept a click to highlight the connections to other nodes
  • The color of the highlighted connections change according to a round robin approach to allow for multiple highlighting
  • The values for gravity, distance and charge have been manually adjusted… worth spending some time playing around with them
  • The size of the images for the nodes is determined by a function to allow bigger sizes for the flags

Below you can find the R code… Again, it might not be 100% optimized, but does the job… Give it a go!

library(RMySQL)
require(data.table)
library(stringr)
 
con <- dbConnect(MySQL(), user="XXXXXX", password="XXXXXX",dbname="XXXXXXX", host="XXXXXXXX")
res=dbSendQuery(conn=con,"SELECT * FROM domaintweets where tag='#JeSuisCharlie'")
assets<-fetch(res,-1)
dbDisconnect(con)
# I've done it with a data based... you can do it with the data set I provided above with a read.csv statement
# assets<-read.csv("~/jesuischarlie/jesuischarlie.csv")
 
# languages we extracted tweets for
languages<-c("es","de","en","fr","it")
 
countPerLanguage<-function (assets, max)
{
  assets$ntags<-str_count(assets$tags, ",")
  #filtering out tweets with way too many hashtags
  assets<-assets[assets$ntags<5,]
  assets<-head(assets,max)
  tags<-assets$tags
 
  alltags<-paste(tags, collapse = ',')
  alltags<-gsub(x=alltags, pattern = ',,',replacement = ',')
  all.tags<-strsplit(x = alltags,split = ',')
  df.all.tags<- as.data.frame(all.tags)
  colnames(df.all.tags)<-c("tag")
  df.all.tags$tag<-as.character(df.all.tags$tag)
  df.all.tags$tag<-tolower(df.all.tags$tag)
  df.all.tags$count<-1
  agg.df.all.tags<-aggregate(count~tag, df.all.tags, length)
  agg.df.all.tags<-agg.df.all.tags[order(agg.df.all.tags$count, decreasing = T),]
  return (agg.df.all.tags)
}
 
# taking the top50 hashtags per languages
agg.languages<-NULL
for (i in 1:length(languages))
{
  df.lang<-subset(assets, lang==languages[i])
  top50 <- countPerLanguage(df.lang, 10000)
  lang<-rep(x = languages[i],times = nrow(top50))
  top50$lang<-lang
  top50<-head(top50,50)
  agg.languages<-rbind(agg.languages,top50)
}
 
# hashtag 2 hashtag connection
unique.tags<-unique(agg.languages$tag)
pairs.df<-NULL
for (i in 2:length(unique.tags))
{
  tag1 <- unique.tags[i]
  assets.tag <- subset(assets, grepl(pattern = paste0("^",tag1,",", "|",",", tag1,",") ,x = assets$tags) )
 
  for (j in i:length(unique.tags))
  {
    if (i!=j)
    {
      tag2<-unique.tags[j]
      print(paste(tag1, "-", tag2))
      pairs <- subset(assets.tag, grepl(pattern = paste0("^",tag2,",", "|",",", tag2,",") ,x = assets.tag$tags) )
      count.pairs<-nrow(pairs)
      if (count.pairs!= 0) {
        l<-list(unique.tags[i],unique.tags[j],count.pairs)
        df<-as.data.frame(l)
        colnames(df)<-c('tag1','tag2','Freq')
        pairs.df<-rbind(pairs.df,df)       
      }
    }
  }
}
pairs.df$tag1<-as.character(pairs.df$tag1)
pairs.df$tag2<-as.character(pairs.df$tag2)
 
# Generating JSON compatible with the force-directed graph http://bl.ocks.org/mbostock/4062045
 
languages<-c("de","en","it","fr","es")
languages.img<-c("https://cdn3.iconfinder.com/data/icons/finalflags/32/Germany-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/United-Kingdom-flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Italy-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/France-Flag.png",
                 "https://cdn3.iconfinder.com/data/icons/finalflags/32/Spain-Flag.png")
 
img.url<-"https://cdn4.iconfinder.com/data/icons/miu/22/editor_pencil_pen_edit_write_-16.png"
listNodes<-NULL
nodeNr<-0
# Genereting the Nodes
strJson<-'{ "nodes":['
 
for(j in 1:length(languages))
{
  lang<-agg.languages[agg.languages$lang==languages[j],]
  df<-data.frame(tag=languages[j],noderNr=nodeNr)
  listNodes<-rbind(df,listNodes)
  # entry for the language
  dt<-paste0('{"name":"', languages[j],'","group":',j,', "image":"',languages.img[j]  ,'", "count":', sum(lang$count) ,', "number":', nodeNr ,' },' )
  nodeNr<-nodeNr+1
 
  strJson<-paste0(strJson,dt)
  for(i in 1:nrow(lang))
  {   
    if (nrow(listNodes[listNodes$tag==lang[i,]$tag,])==0 &&  !(lang[i,]$tag %in% languages))
    {
      dt<-paste0('{"name":"', lang[i,]$tag,'","group":',j,', "image":"',img.url  ,'", "count":', lang[i,]$count ,', "number":', nodeNr ,' },' )
      strJson<-paste0(strJson,dt)
      df<-data.frame(tag=lang[i,]$tag,noderNr=nodeNr)
      listNodes<-rbind(df,listNodes)
      nodeNr<-nodeNr+1
    }
 
  }
}
strJson<-substr(strJson,1,nchar(strJson)-1)
 
# Genereting the Links
strJson<-paste0(strJson,'], "links": [')
for (l in 1:nrow(agg.languages))
{
  matches1<-subset(listNodes, tag==agg.languages[l,]$lang)
  matches2<-subset(listNodes, tag==agg.languages[l,]$tag)
  for (k in 1:nrow(matches1))
  {
    for (m in 1:nrow(matches2))
    {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt) 
    }
  }
}
for (l in 1:nrow(pairs.df))
{
  matches1<-subset(listNodes, tag==pairs.df[l,]$tag1)
  matches2<-subset(listNodes, tag==pairs.df[l,]$tag2)
  for (k in 1:nrow(matches1))
  {
    for (m in 1:nrow(matches2))
    {
      dt<-paste0('{"source":',matches1[k,]$noderNr,',"target":',matches2[m,]$noderNr,',"value": 0},')
      strJson<-paste0(strJson,dt) 
    }
  }
}
 
strJson<-substr(strJson,1,nchar(strJson)-1)
strJson<-paste0(strJson,']}')
# Writing it to a file
write(x = strJson, file = '~/jesuischarlie/jesuischarlie2.json')

To leave a comment for the author, please follow the link and comment on their blog: Big Data Doctor » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)