The echo of a tragedy in Social Media – “The Making of”
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

In this post I’m going to describe how I put together the visualization The echo of a tragedy in Social Media.
I used three different technologies: PHP for data gathering, R for data analysis and data output generation, and D3.js for data visualization.
The day after the incidents at Charlie Hebdo took place, I set up a Twitter monitor using the Search API, based on the hashtag #jesuischarlie, for 5 different languages: German, Spanish, English, Italian and of course French. For each language I made sure I had gathered at least 10K tweets. I made the data available as CSV (jesuischarlie.csv). The PHP part is easy, so I’m not going to focus on it.
The analysis part is no rocket science and is accomplished in the following steps:
- Counting the occurrences of all available hashtags per language
- Taking the top 50 hashtags per language
- Computing the occurrences of hashtag pairs within the selected hashtags
- Generating the JSON according to the force-directed D3.js layout
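The first three steps can be sketched on toy data in base R. This is only a minimal illustration, not the script used for the visualization: the data frame `toy`, the helpers `count_tags` and `count_pair`, and the value `top.n` are made up for the example, and the tag strings are assumed to be comma-separated.

```r
# Toy data: one row per tweet, hashtags as a comma-separated string (assumed format)
toy <- data.frame(
  lang = c("en", "en", "en", "fr", "fr"),
  tags = c("jesuischarlie,paris", "JeSuisCharlie,freedom", "jesuischarlie,paris",
           "jesuischarlie,paris", "jesuischarlie"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: count lower-cased hashtags, most frequent first
count_tags <- function(tags) {
  all.tags <- tolower(unlist(strsplit(tags, ",")))
  all.tags <- all.tags[all.tags != ""]
  counts <- sort(table(all.tags), decreasing = TRUE)
  data.frame(tag = names(counts), count = as.vector(counts), stringsAsFactors = FALSE)
}

# Steps 1 and 2: count per language and keep the top.n tags of each language
top.n <- 2
top <- do.call(rbind, lapply(unique(toy$lang), function(l) {
  res <- head(count_tags(toy$tags[toy$lang == l]), top.n)
  res$lang <- l
  res
}))

# Step 3: count the tweets in which two selected tags occur together
tag.sets <- lapply(strsplit(tolower(toy$tags), ","), unique)
count_pair <- function(a, b) {
  sum(vapply(tag.sets, function(s) all(c(a, b) %in% s), logical(1)))
}
```

With this toy data, `count_pair("jesuischarlie", "paris")` counts the three tweets that carry both hashtags. Splitting each tweet into a set of tags sidesteps pattern matching on the raw string, at the cost of holding one vector per tweet in memory.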
The code taking care of the visualization is written in D3.js, embedded and readable in the visualization page.
A few comments on that:
- There are 2 functions to make both text and opacity dependent on the number of occurrences
- The nodes accept a click to highlight the connections to other nodes
- The color of the highlighted connections changes according to a round-robin approach to allow for multiple highlighting
- The values for gravity, distance and charge have been adjusted manually; it is worth spending some time playing around with them
- The size of the images for the nodes is determined by a function to allow bigger sizes for the flags
Below you can find the R code. Again, it might not be 100% optimized, but it does the job. Give it a go!
library(RMySQL)
require(data.table)
library(stringr)
con <- dbConnect(MySQL(), user="XXXXXX", password="XXXXXX",dbname="XXXXXXX", host="XXXXXXXX")
res=dbSendQuery(conn=con,"SELECT * FROM domaintweets where tag='#JeSuisCharlie'")
assets<-fetch(res,-1)
dbDisconnect(con)
# I've done it with a database... you can do it with the data set I provided above with a read.csv statement
# assets<-read.csv("~/jesuischarlie/jesuischarlie.csv")
# languages we extracted tweets for
languages<-c("es","de","en","fr","it")
# count the hashtag occurrences in the first `max` tweets, most frequent first
countPerLanguage <- function(assets, max)
{
  # number of commas in the tag string, used to estimate the number of hashtags
  assets$ntags <- str_count(assets$tags, ",")
  # filtering out tweets with way too many hashtags
  assets <- assets[assets$ntags < 5, ]
  assets <- head(assets, max)
  # collapse all tag strings into one and split it into single hashtags
  alltags <- paste(assets$tags, collapse = ',')
  alltags <- gsub(x = alltags, pattern = ',,', replacement = ',')
  all.tags <- strsplit(x = alltags, split = ',')
  df.all.tags <- as.data.frame(all.tags)
  colnames(df.all.tags) <- c("tag")
  df.all.tags$tag <- tolower(as.character(df.all.tags$tag))
  df.all.tags$count <- 1
  # one row per distinct hashtag with its number of occurrences
  agg.df.all.tags <- aggregate(count ~ tag, df.all.tags, length)
  agg.df.all.tags <- agg.df.all.tags[order(agg.df.all.tags$count, decreasing = TRUE), ]
  return(agg.df.all.tags)
}
# taking the top 50 hashtags per language
agg.languages <- NULL
for (i in seq_along(languages))
{
  df.lang <- subset(assets, lang == languages[i])
  # count hashtags over at most 10000 tweets of this language
  tag.counts <- countPerLanguage(df.lang, 10000)
  tag.counts$lang <- rep(x = languages[i], times = nrow(tag.counts))
  top50 <- head(tag.counts, 50)
  agg.languages <- rbind(agg.languages, top50)
}
# hashtag 2 hashtag connection
unique.tags <- unique(agg.languages$tag)
pairs.df <- NULL
# note: the outer loop starts at 2, so the first entry of unique.tags is never paired
for (i in 2:length(unique.tags))
{
  tag1 <- unique.tags[i]
  # tweets whose tag string contains tag1 as a whole tag followed by a comma;
  # ignore.case is used because unique.tags was lower-cased while assets$tags was not
  assets.tag <- subset(assets, grepl(pattern = paste0("^", tag1, ",", "|", ",", tag1, ","), x = assets$tags, ignore.case = TRUE))
  for (j in i:length(unique.tags))
  {
    if (i != j)
    {
      tag2 <- unique.tags[j]
      print(paste(tag1, "-", tag2))
      pairs <- subset(assets.tag, grepl(pattern = paste0("^", tag2, ",", "|", ",", tag2, ","), x = assets.tag$tags, ignore.case = TRUE))
      count.pairs <- nrow(pairs)
      if (count.pairs != 0) {
        df <- data.frame(tag1 = tag1, tag2 = tag2, Freq = count.pairs)
        pairs.df <- rbind(pairs.df, df)
      }
    }
  }
}
pairs.df$tag1 <- as.character(pairs.df$tag1)
pairs.df$tag2 <- as.character(pairs.df$tag2)
# Generating JSON compatible with the force-directed graph http://bl.ocks.org/mbostock/4062045
languages<-c("de","en","it","fr","es")
languages.img<-c("https://cdn3.iconfinder.com/data/icons/finalflags/32/Germany-Flag.png",
"https://cdn3.iconfinder.com/data/icons/finalflags/32/United-Kingdom-flag.png",
"https://cdn3.iconfinder.com/data/icons/finalflags/32/Italy-Flag.png",
"https://cdn3.iconfinder.com/data/icons/finalflags/32/France-Flag.png",
"https://cdn3.iconfinder.com/data/icons/finalflags/32/Spain-Flag.png")
img.url<-"https://cdn4.iconfinder.com/data/icons/miu/22/editor_pencil_pen_edit_write_-16.png"
listNodes <- NULL
nodeNr <- 0
# Generating the nodes
strJson <- '{ "nodes":['
for (j in seq_along(languages))
{
  lang <- agg.languages[agg.languages$lang == languages[j], ]
  df <- data.frame(tag = languages[j], noderNr = nodeNr)
  listNodes <- rbind(df, listNodes)
  # entry for the language
  dt <- paste0('{"name":"', languages[j], '","group":', j, ', "image":"', languages.img[j], '", "count":', sum(lang$count), ', "number":', nodeNr, ' },')
  nodeNr <- nodeNr + 1
  strJson <- paste0(strJson, dt)
  for (i in seq_len(nrow(lang)))
  {
    # add a hashtag node only once, and never for a tag that equals a language code
    if (nrow(listNodes[listNodes$tag == lang[i, ]$tag, ]) == 0 && !(lang[i, ]$tag %in% languages))
    {
      dt <- paste0('{"name":"', lang[i, ]$tag, '","group":', j, ', "image":"', img.url, '", "count":', lang[i, ]$count, ', "number":', nodeNr, ' },')
      strJson <- paste0(strJson, dt)
      df <- data.frame(tag = lang[i, ]$tag, noderNr = nodeNr)
      listNodes <- rbind(df, listNodes)
      nodeNr <- nodeNr + 1
    }
  }
}
# drop the trailing comma
strJson <- substr(strJson, 1, nchar(strJson) - 1)
# Generating the links
strJson <- paste0(strJson, '], "links": [')
# links between each language node and its top hashtags
for (l in seq_len(nrow(agg.languages)))
{
  matches1 <- subset(listNodes, tag == agg.languages[l, ]$lang)
  matches2 <- subset(listNodes, tag == agg.languages[l, ]$tag)
  for (k in seq_len(nrow(matches1)))
  {
    for (m in seq_len(nrow(matches2)))
    {
      dt <- paste0('{"source":', matches1[k, ]$noderNr, ',"target":', matches2[m, ]$noderNr, ',"value": 0},')
      strJson <- paste0(strJson, dt)
    }
  }
}
# links between hashtags that occur together (every link is written with value 0)
for (l in seq_len(nrow(pairs.df)))
{
  matches1 <- subset(listNodes, tag == pairs.df[l, ]$tag1)
  matches2 <- subset(listNodes, tag == pairs.df[l, ]$tag2)
  for (k in seq_len(nrow(matches1)))
  {
    for (m in seq_len(nrow(matches2)))
    {
      dt <- paste0('{"source":', matches1[k, ]$noderNr, ',"target":', matches2[m, ]$noderNr, ',"value": 0},')
      strJson <- paste0(strJson, dt)
    }
  }
}
# drop the trailing comma and close the JSON document
strJson <- substr(strJson, 1, nchar(strJson) - 1)
strJson <- paste0(strJson, ']}')
# Writing it to a file
write(x = strJson, file = '~/jesuischarlie/jesuischarlie2.json')
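A side note on the string handling: instead of appending a comma after every entry and trimming the last one with `substr`, the entries can be built as a vector and joined with `paste0(..., collapse = ",")`. The following is a minimal sketch of that alternative, not the code behind the visualization; the data frames `nodes` and `links` and the variable `toy.json` are hypothetical and carry only a subset of the fields.

```r
# Hypothetical toy graph; node indices are zero-based, as in the script
nodes <- data.frame(name = c("en", "jesuischarlie", "paris"),
                    group = c(1, 1, 1),
                    stringsAsFactors = FALSE)
links <- data.frame(source = c(0, 0, 1),
                    target = c(1, 2, 2),
                    value = c(0, 0, 0))

# One JSON object per row; paste0 is vectorised over the columns
node.json <- paste0('{"name":"', nodes$name, '","group":', nodes$group, '}')
link.json <- paste0('{"source":', links$source, ',"target":', links$target,
                    ',"value":', links$value, '}')

# collapse joins the entries with commas, so there is no trailing comma to remove
toy.json <- paste0('{"nodes":[', paste0(node.json, collapse = ","),
                   '],"links":[', paste0(link.json, collapse = ","), ']}')
```

The names are pasted in as-is, which is fine for plain hashtags; names containing quotes or backslashes would need escaping first.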