A link that can tell more than dozens of lines of R code – what’s new in archivist?

July 28, 2016

(This article was first published on SmarterPoland.pl » English, and kindly contributed to R-bloggers)

Can you spot the difference between this plot:

And this one:

You are right! The latter has an embedded piece of R code.
What for?

It’s a call to the function aread() from archivist, a package that manages external copies of R objects. This piece of code was added by the function addHooksToPrint(), which enriches knitr reports with links to all objects of a given class, e.g. ggplot.

You can copy this hook to your R session and you will automagically recreate this plot in your local session.

archivist::aread('pbiecek/SmarterPoland_blog/arepo/e44de65f1e56ea42d2df2598c083d1ce')
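For readers who want their own reports to carry such links, here is a minimal sketch of the setup call, meant for an early chunk of a knitr / R Markdown document. The values for user, repo and subdir are inferred from the link above and are assumptions, not the author's actual configuration; the argument names follow the archivist documentation for addHooksToPrint() and should be checked against the installed version.

```r
# Setup chunk of a knitr / R Markdown report (a sketch, not the author's actual setup).
library(archivist)
library(ggplot2)

# Archive every printed ggplot object in the local directory "arepo" and
# append an archivist::aread() link that points to its copy on GitHub.
# user, repo and subdir are assumptions inferred from the link shown above.
addHooksToPrint(class   = "ggplot",
                repoDir = "arepo",
                user    = "pbiecek",
                repo    = "SmarterPoland_blog",
                subdir  = "arepo")
```

For the links to resolve for other people, the local "arepo" directory would also have to be pushed to the GitHub repository named in the call.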

But that’s not all.
Actually here the story is just beginning.

Don’t you think that this plot is badly annotated? It is not clear what is being presented. Something about terrorism, but for which year? Are these results for all countries, or is there some filtering? What is on the axes? Why did the author skip all this important information? Why does he not include the full R code that explains how this plot was created?

Actually, with this single link you can get answers to all these questions.

First, let’s download the plot and extract the data out of it.

pl <- archivist::aread('pbiecek/SmarterPoland_blog/arepo/e44de65f1e56ea42d2df2598c083d1ce')
digest::digest(pl$data)
## ceed21e997efd00940cdbcba497559c7

This data object is also in the repository so I can download it with the aread function.

dat <- archivist::aread('pbiecek/SmarterPoland_blog/arepo/ceed21e997efd00940cdbcba497559c7')
head(dat)
#          country_txt sum_kills sum_wounds    n
# 1        Afghanistan      6208       6958 1926
# 2            Algeria        21         19   16
# 3            Bahrain         5         22   18
# 4         Bangladesh        76        695  465
# 5 Bosnia-Herzegovina         4          6    6
# 6       Burkina Faso         6          9    5
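The same lookup-by-hash idea can be tried offline. The sketch below is not part of the original post: it uses a throwaway local repository and the built-in iris data, both chosen only for illustration, and it switches off session-info archiving so that no extra packages are needed.

```r
library(archivist)

# Throwaway local repository; the directory name is an assumption for this sketch.
repo_dir <- file.path(tempdir(), "arepo_lookup_demo")
createLocalRepo(repoDir = repo_dir)

# Archive a small data frame; saveToLocalRepo() returns the hash it was stored under.
# archiveSessionInfo = FALSE skips the session snapshot to keep the example light.
dat_local <- head(iris)
hash <- saveToLocalRepo(dat_local, repoDir = repo_dir, archiveSessionInfo = FALSE)

# Retrieve the copy by hash alone, the local counterpart of the aread() call above.
restored <- loadFromLocalRepo(as.character(hash), repoDir = repo_dir, value = TRUE)

print(hash)
print(identical(restored, dat_local))
```

Comparing the returned hash with digest::digest(dat_local) is a quick way to check, on your own installation, that the repository key is the digest of the object, as the example with pl$data suggests.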

But here is the coolest part.
Having an object, one can (in some cases) examine its history, i.e. check how it was created. Here is how to do this:

archivist::ahistory(md5hash = 'pbiecek/SmarterPoland_blog/arepo/ceed21e997efd00940cdbcba497559c7')

#   small_data                           [d2ad05ac3e93aeaca02f57aa4f9f58bf]
#-> dplyr::filter(iyear == "2015")       [01205474e0515ad29d3bae33ad4ba821]
#-> group_by(country_txt)                [e0d9c060107803889fbc7ffdea7a23f7]
#-> dplyr::summarise(sum_kills = sum(nkill, na.rm = TRUE), 
#                     sum_wounds = sum(nwound, na.rm = TRUE), 
#                     n = n())           [a78cf8a8e9cf10bdb1158af38422723d]
#-> dplyr::filter(sum_kills > 2, 
#                 sum_wounds > 2)        [ceed21e997efd00940cdbcba497559c7]

Now you can see which operations were used to create the data shown in this plot. It’s clear how the aggregation was done, what the filtering condition is, and so on.
You also have the hashes of all objects created along the way, so you can download the partial results. This history is recorded with the operator %a%, which works in a similar fashion to %>%.
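To make the recording step concrete, here is a minimal local sketch of a pipeline built with %a%. It is not the code behind the plot: the iris data, the temporary repository path and the column names are assumptions picked for illustration. Depending on the archivist version, archiving each step may also need the packages used for session snapshots.

```r
library(archivist)
library(dplyr)

# Temporary local repository; the path is an assumption for this sketch.
# default = TRUE makes it the repository that %a% writes to.
repo_dir <- file.path(tempdir(), "arepo_history_demo")
createLocalRepo(repoDir = repo_dir, default = TRUE)

# Each %a% step stores its result in the repository together with
# the call that produced it.
small_iris <- iris %a%
  dplyr::filter(Sepal.Length > 5) %a%
  group_by(Species) %a%
  dplyr::summarise(mean_petal = mean(Petal.Length), n = n())

# Walk back through the recorded calls, as in the remote example above.
history <- ahistory(small_iris)
print(history)
```

The printed history should list one line per step with the hash of each intermediate result, and any of those hashes can be passed to loadFromLocalRepo() to inspect that stage.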

We have the plot and now we know what is being presented, so let’s change some annotations.

library(ggplot2)  # provides ggtitle() and theme_bw()
pl + ggtitle("Victims of terrorism in 2015\nCountries with > 2 Fatalities") + theme_bw()

The ahistory() function for remote repositories was introduced to archivist in version 2.1 (on CRAN since yesterday). Another new feature is support for repositories in shiny applications. Now you can enrich your app with links to copies of R objects generated by shiny.
You can find more information about these and other features in the useR2016 presentation about archivist (html, video).
Or look for Marcin Kosiński’s talk at the European R users meeting in Poznań.

The data presented here is just a small fraction of the data from the National Consortium for the Study of Terrorism and Responses to Terrorism (START) (2016), retrieved from http://www.start.umd.edu/gtd.
