Tweeting wikidata info

November 18, 2018
By

(This article was first published on Category R on Roel's R-tefacts, and kindly contributed to R-bloggers)

In this explainer I walk you through the steps I took to create a twitter bot
that tweets daily about people who died on that date.

I created a script that queries wikidata, takes that information and creates
a sentence. That sentence is then tweeted.

For example:

A tweet I literally just send out from the docker container

A tweet I literally just send out from the docker container

I hope you are has excited as I am about this project. Here it comes!

There are 3 parts:

  1. Talk to wikidata and retrieve information about 10 people that died today
  2. Grab one of the deaths and create a sentence
  3. Post that sentence to twitter in the account wikidatabot
  4. Throw it all into a docker container so it can run on the computer of someone else (AKA: THA CLOUD)

You might wonder, why people who died? To which I answer, emphatically but not
really helpfully: ‘valar morghulis’.

1. Talk to wikidata and retrieve information

I think wikidata is one of the coolest knowledge bases in the world, it contains
facts about people, other animals, places, and the world. It powers many boxes
you see in Wikipedia pages. For instance this random page about Charles the first has a box on the
right that says something about his ancestors, successors and coronation.
The same information can be displayed in Dutch.
This is very cool and saves Wikipedia a lot of work. However, we can also use it!

You can create your own query about the world in the query editor. But it is quite hard to figure out how to do that. These queries need to made in
a specific way. I just used an example from wikidata: ‘who’s birthday is it today?’
and modified it to search for people’s death (that’s how I learn, modify
something and see if I broke it). It looks a lot like SQL, but is slightly different.

Of course this editor is nice for us humans, but we want the computer to do it
so we can send a query to wikidata. I was extremely lazy and used the
WikidataQueryServiceR created by wiki-guru Mikhail Popov [@bearlogo](https://twitter.com/bearloga).

This is the query I ended up using (It looks very much like the birthdays one
but with added information):

querystring <- 
'SELECT # what variables do you want to return (defined later on)
  ?entityLabel (YEAR(?date) AS ?year) 
  ?cause_of_deathLabel 
  ?place_of_deathLabel 
  ?manner_of_deathLabel  
  ?country_of_citizenshipLabel 
  ?country_of_birth
  ?date_of_birth
WHERE {
  BIND(MONTH(NOW()) AS ?nowMonth) # this is a very cool trick
  BIND(DAY(NOW()) AS ?nowDay)
  ?entity wdt:P570 ?date.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
  ?entity wdt:P509 ?cause_of_death.
  OPTIONAL { ?entity wdt:P20 ?place_of_death. }
  OPTIONAL { ?entity wdt:P1196 ?manner_of_death. }
  FILTER(((MONTH(?date)) = ?nowMonth) && ((DAY(?date)) = ?nowDay))
  OPTIONAL { ?entity wdt:P27 ?country_of_citizenship. }
  OPTIONAL { ?entity wdt:p19 ?country_of_birth}
  OPTIONAL { ?entity wdt:P569 ?date_of_birth.}
}
LIMIT 10'

Try this in the query editor

When I created this blog post (every day the result will be different)
the result looked like this:

library(WikidataQueryServiceR)
result <- query_wikidata(querystring)
## 10 rows were returned by WDQS
result[1:3,1:3]# first 3 rows, first 3 columsn
##                  entityLabel year cause_of_deathLabel
## 1 Rafael García-Plata y Osma 1918           influenza
## 2        Dobroslava Menclová 1978       traffic crash
## 3             Alan J. Pakula 1998       traffic crash

The query returns name, year, cause of death, manner of death
(didn’t know which one to use), place of death, country of citizenship, country
of birth and date of birth.
I can now glue all these parts together to create a sentence of sorts

2. grab one of the deaths and create a sentence

I will use glue to make text, but the paste functions from base R is also fine.

These are the first lines for instance:

library(glue)
glue_data(result[1:2,], "Today in {year} in the place {place_of_deathLabel} died {entityLabel} with cause: {cause_of_deathLabel}. {entityLabel} was born on {as.Date(date_of_birth, '%Y-%m-%d')}. Find more info on this cause of death on www.en.wikipedia.org/wiki/{cause_of_deathLabel}.  #wikidata")
## Today in 1918 in the place Cáceres died Rafael García-Plata y Osma with cause: influenza. Rafael García-Plata y Osma was born on 1870-03-04. Find more info on this cause of death on www.en.wikipedia.org/wiki/influenza.  #wikidata
## Today in 1978 in the place Plzeň died Dobroslava Menclová with cause: traffic crash. Dobroslava Menclová was born on 1904-01-02. Find more info on this cause of death on www.en.wikipedia.org/wiki/traffic crash.  #wikidata

Post that sentence to twitter in the account wikidatabot

I created the twitter account wikidatabot and
added pictures 2fa and some bio information. I wanted to make it clear that it
was a bot. To post something on your behalf on twitter requires a developers
account. Go to https://developer.twitter.com and create that account. In my case
I had to manually verify twice because apparently everything I did screamed bot activity
to twitter (they were not entirely wrong). You have to sign some boxes,
acknowledge the code of conduct and understand twitter’s terms.

The next step is to create a twitter app but I will leave that explanation to
rtweet, because that vignette is very
very helpful.

When you’re done, you can post to twitter on your account with the help of
a consumer key, access key, consumer token and access token. You will need them
all and you will have to keep them a secret (or other people can post on your
account, and that is something you really don’t want).

With those secrets and the rtweet package you can create a token that enables
you to post to twitter.

And it is seriously as easy as:

rtweet::post_tweet(status = tweettext, token = token )
Again the same tweet

Again the same tweet

4 Throw it all into a docker container

I want to post this every day but to make it run in the cloud it would be nice
if R and the code would be nicely packed together. That is where docker comes in,
you can define what packages you want and a mini operating system is created
that will run for everyone on ever computer (if they have docker).
The whole example script and docker file can be found here on github.

And that’s it. If you have suggestions on how to run it every day in the cloud
for cheap, let me know by twitter or by opening an issue on github.

Things that could be done better:

  • I can run the container, but I don’t know how to make it run in the cloud
  • I ask for 10 deaths and pick one randomly, I don’t know if there is a random function in sparql
  • I put the (twitter) keys into the script, it would be better to use environment variables for that
  • rtweet and WikidataQueryServiceR have lots of dependencies that make the docker container difficult to build (mostly time consuming)
  • I guess I could just build the query and post to wikidata, but using WikidataQueryServiceR was much faster
  • I wish I knew how to use the rocker:tidyverse container to run a script, but I haven’t figured that out yet

State of the machine

At the moment of creation (when I knitted this document ) this was the state of my machine: click here to expand

sessioninfo::session_info()
## ─ Session info ──────────────────────────────────────────────────────────
##  setting  value                       
##  version  R version 3.5.1 (2018-07-02)
##  os       Ubuntu 16.04.5 LTS          
##  system   x86_64, linux-gnu           
##  ui       X11                         
##  language en_US                       
##  collate  en_US.UTF-8                 
##  tz       Europe/Amsterdam            
##  date     2018-11-19                  
## 
## ─ Packages ──────────────────────────────────────────────────────────────
##  package               * version date       source         
##  backports               1.1.2   2017-12-13 CRAN (R 3.5.0) 
##  blogdown                0.8     2018-07-15 CRAN (R 3.5.1) 
##  bookdown                0.7     2018-02-18 CRAN (R 3.5.0) 
##  clisymbols              1.2.0   2017-05-21 CRAN (R 3.5.0) 
##  crayon                  1.3.4   2017-09-16 CRAN (R 3.5.0) 
##  curl                    3.2     2018-03-28 CRAN (R 3.5.0) 
##  digest                  0.6.15  2018-01-28 CRAN (R 3.5.0) 
##  evaluate                0.11    2018-07-17 CRAN (R 3.5.1) 
##  glue                  * 1.3.0   2018-07-17 CRAN (R 3.5.1) 
##  htmltools               0.3.6   2017-04-28 CRAN (R 3.5.0) 
##  httr                    1.3.1   2017-08-20 CRAN (R 3.5.0) 
##  knitr                   1.20    2018-02-20 CRAN (R 3.5.0) 
##  magrittr                1.5     2014-11-22 CRAN (R 3.5.0) 
##  R6                      2.2.2   2017-06-17 CRAN (R 3.5.0) 
##  Rcpp                    0.12.18 2018-07-23 cran (@0.12.18)
##  rmarkdown               1.10    2018-06-11 CRAN (R 3.5.0) 
##  rprojroot               1.3-2   2018-01-03 CRAN (R 3.5.0) 
##  sessioninfo             1.0.0   2017-06-21 CRAN (R 3.5.1) 
##  stringi                 1.2.4   2018-07-20 cran (@1.2.4)  
##  stringr                 1.3.1   2018-05-10 CRAN (R 3.5.0) 
##  WikidataQueryServiceR * 0.1.1   2017-04-28 CRAN (R 3.5.1) 
##  withr                   2.1.2   2018-03-15 CRAN (R 3.5.0) 
##  xfun                    0.3     2018-07-06 CRAN (R 3.5.1) 
##  yaml                    2.2.0   2018-07-25 CRAN (R 3.5.1)

To leave a comment for the author, please follow the link and comment on their blog: Category R on Roel's R-tefacts.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.

Search R-bloggers


Sponsors

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)