Visualizing 15k Instagram Posts with TrelliscopeJS

[This article was first published on Ryan Hafen, and kindly contributed to R-bloggers].

This post shows a simple example of creating an interactive display that lets you navigate thousands of Instagram posts with just a few lines of code using TrelliscopeJS. The example comes from a hackathon in the DARPA XDATA program earlier this year.

If you missed the announcement about TrelliscopeJS, see here for more background.

Background

For the past few years I’ve been involved in the DARPA XDATA program, which has been sponsoring the development of TrelliscopeJS and other projects in the DeltaRho organization. XDATA has had several hackathons this year. At one of them, the goal was to use several data sources, including social media, to try to find evidence of violations of a ceasefire agreement in Yemen. This provided a great use case for TrelliscopeJS as a utility for quickly creating an interactive display for exploring social media posts.

Data

One of the datasets provided was metadata for public Instagram posts made in Yemen. The initial set of posts was large, but they were whittled down to posts where the caption or comments contained some key words, including (the Arabic equivalents of) ‘advance’, ‘al-Qaeda’, ‘artillery’, ‘attack’, ‘battalion’, ‘bomb’, ‘brigade’, ‘Brigadier General’, ‘camp’, ‘car bomb’, ‘clashes’, ‘colonel’, ‘company’, ‘Houthis’, ‘ISIS’, ‘Major General’, ‘massacre’, ‘mortar’, ‘operation’, ‘plane’, ‘regiment’, ‘Saudi’, ‘Saudi Arabia’, ‘soldier’, ‘violent’, ‘warplane’, and ‘Yemen’. After filtering to the time range of interest and keeping only public posts that contain these words, we ended up with a data frame that looks like this:

dplyr::glimpse(instadf)

Observations: 14,909
Variables: 15
$ image_id       <chr> "1175063814800865121_839721980", "11349722586818...
$ caption        <chr> "#صباح_الورد #صباح_الخير #صباح_النور #صباحكم_سعا...
$ created_time   <dttm> 2016-01-31 19:49:04, 2015-12-07 12:14:18, 2015-...
$ username       <chr> "my_blbl", "ma7fouz7akimi", "tareqmusaw", "tareq...
$ userid         <chr> "839721980", "1380532560", "333728029", "3337280...
$ lat            <dbl> 13.83330, 15.30576, 15.35472, 15.31457, 17.49170...
$ lon            <dbl> 44.68330, 44.19210, 44.20667, 44.18173, 44.13220...
$ location_name  <chr> "Sanaa, Yemen", "مدينة عدن / ツ ADEN City", "ACA...
$ likes_count    <int> 23, 23, 196, 200, 45, 300, 5, 6, 68, 546, 36, 11...
$ comments_count <int> 4, 1, 10, 9, 3, 2, 1, 1, 5, 55, 1, 0, 32, 15, 6,...
$ all_comments   <chr> "يسعد صبااااحك بكل خير | @hh_a18 وصباحك اسعد اخت...
$ image_link     <chr> "https://scontent.cdninstagram.com/t51.2885-15/s...
$ post_link      <chr> "https://www.instagram.com/p/BBOqlBVnvdh/", "htt...
$ keywords       <chr> "Yemen", "Yemen", "Yemen", "Yemen", "company", "...
$ n_keywords     <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, ...

The variable names are pretty self-explanatory. We have data for nearly 15k posts.

Making a display of posts

If you have familiarized yourself with TrelliscopeJS, you might realize that this data is already in good shape to be used to create a Trelliscope display of Instagram posts, where each row represents one post and each variable can be used in different ways to navigate the space of posts. All we need is a variable denoting what to plot for each post. In this case, we have a variable image_link, which provides a URL to the media shared with the Instagram post, the perfect and logical thing to “visualize” in this case.

TrelliscopeJS has a function img_panel(), which allows us to cast image_link as a panel variable, and will cause the viewer to display the contents of the URL.

We can create a simple display with the following:

library(trelliscopejs)
library(dplyr)

instadf %>%
  # treat each URL in image_link as the panel to display
  mutate(image_link = img_panel(image_link)) %>%
  # most-liked posts first; this becomes the display's default sort order
  arrange(-likes_count) %>%
  trelliscope(name = "posts", width = 320, height = 320, nrow = 3, ncol = 6,
    state = list(labels = c("caption", "post_link", "likes_count")))

Here, in the mutate() call, we update the image_link variable to be an image panel, then we sort the data in decreasing order of number of “likes” (which will set the default sort order of the display), and then pass the resulting data frame into trelliscope(). It’s as simple as that. The state argument to trelliscope() tells the viewer what labels to show under the panels by default.

Note that this runs very quickly, in under 3 seconds on my machine. This is because we do not need to generate the actual panels in this case – we are simply pointing to where the panel images reside elsewhere on the web.

The resulting display can be interacted with below, and a dedicated link to the application is available here.

Go ahead and try interacting with the display and see if you can find anything noteworthy.

Some things to try / investigate:

  • Filter on keyword “bomb” and look around: do you see any posts that appear to legitimately be talking about a bombing that seems to have happened at or around the time of posting?
  • What users post the most and what do their posts look like?
  • Are there commonalities across posts at specific location_names?
  • Use the post_link label to open the original Instagram post in a new window. There you can more easily see the full content of comments, etc., and can even use Google Translate on the page (although in my experience the translations are terrible).

Filtering based on the caption or comments is also potentially very interesting if you happen to know Arabic (I don’t).
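Some of these questions can also be approached directly on the data frame before opening the viewer. The following is a minimal sketch using dplyr on a small made-up stand-in for instadf. The toy rows, the object names, and the assumption that multiple keywords are stored together in one comma-separated string are all hypothetical and not taken from the real data.

```r
library(dplyr)

# Hypothetical stand-in for instadf, with only the columns used here
toy <- tibble(
  username    = c("user_a", "user_b", "user_a", "user_c", "user_a"),
  keywords    = c("Yemen", "bomb", "bomb, Yemen", "Saudi", "car bomb"),
  likes_count = c(23L, 196L, 45L, 5L, 68L)
)

# Posts whose keyword string mentions "bomb"
# (a plain substring match, so it also picks up "car bomb")
bomb_posts <- toy %>%
  filter(grepl("bomb", keywords, fixed = TRUE))

# Number of posts per user, most active users first
posts_per_user <- toy %>%
  count(username, sort = TRUE, name = "n_posts")
```

With the real data, the same two pipelines would presumably be applied to instadf instead of toy, and the results could guide which users or keywords to look at first in the display.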

Notes

First of all, please note that all the data and social media used in this example is entirely public. The goal for the analyses done with these data was pursuing the peace process – nothing nefarious or creepy. Also note that while Instagram is supposed to root out offensive content, they don’t catch it all and I haven’t looked at all the posts so I don’t know if there will be any surprises.

This is an interesting display in that it is larger than the other examples I’ve shown of TrelliscopeJS so far. The application handles this many panels smoothly. Some of the variables (particularly caption and comments) can be very long, causing the size of this 15k-row dataset to be larger than a more traditional one.

This is just a single plot. It is not intended to illustrate any major finding or result with respect to the hackathon goal, but like all exploratory visualizations, it provides a nice basis for further exploration or a reference when studying other aspects of the data. There are a lot of potentially interesting analyses that could be done using the original data frame that could result in new variables or modes for new uses of the display. For example, one might run object or context detection on the images using something like Google Vision, and then use the results in a display to visually verify the accuracy of the machine learning outputs.
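As a much simpler illustration of deriving new variables from the original data frame, here is a rough sketch that adds a caption length and a hashtag count for each post. The toy captions and the new column names (caption_nchar, n_hashtags) are made up for this example. Columns like these would presumably travel with the data frame into the display in the same way the existing variables do.

```r
library(dplyr)

# Hypothetical stand-in for instadf, with only a caption column
toy <- tibble(
  caption = c("#good #morning", "no tags here", "#yemen")
)

toy_aug <- toy %>%
  mutate(
    # number of characters in the caption
    caption_nchar = nchar(caption),
    # number of "#" characters, used here as a rough hashtag count
    n_hashtags = lengths(regmatches(caption, gregexpr("#", caption, fixed = TRUE)))
  )
```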

Things to do

This example highlights the need for some obvious additional filter tools in TrelliscopeJS, namely date / time filters and map filters for geographic coordinates. All in due time…
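In the meantime, one possible workaround for the missing date / time filter might be to derive ordinary variables from created_time before building the display, so that posts can be narrowed down by month or hour of day like any other variable. This is only a sketch on made-up rows. The column names post_month and post_hour are hypothetical, and I am assuming the timestamps are in UTC, which the data does not confirm.

```r
library(dplyr)

# Hypothetical stand-in for instadf, with only the timestamp column
toy <- tibble(
  created_time = as.POSIXct(
    c("2016-01-31 19:49:04", "2015-12-07 12:14:18"),
    tz = "UTC"  # assumed time zone
  )
)

toy_time <- toy %>%
  mutate(
    # year and month as a string, e.g. "2016-01"
    post_month = format(created_time, "%Y-%m"),
    # hour of day as an integer from 0 to 23
    post_hour = as.integer(format(created_time, "%H"))
  )
```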
