Quick hit: ‘dig’-ging Into r-project.org DNS Records with {processx}

[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

The r-project.org domain had some temporary technical difficulties this week (2019-29) that made reaching R-related resources problematic for a bunch of folks for a period of time. Incidents like this underscore the need for regional and network diversity when it comes to ensuring the availability of DNS services. That is, it does no good if you have two DNS servers if they’re both connected to the same power source and/or network connection since if power goes out or the network gets wonky no client will be able to translate r-project.org to an IP address that it can then connect to.

I’m not at-keyboard much this week so only had time to take an external poke at the (new) r-project.org DNS configuration late yesterday and today before the sleepyhead vacationers emerged from slumber. To my surprise, the r-project.org current DNS setup allows full zone transfers, which means you can get the full “database” of r-project.org DNS records if you know the right incantations.

So, I wrote a small R function wrapper for the dig command using {processx}. Folks on the legacy Windows operating system are on your own for getting a copy of dig installed but users of proper, modern operating systems like Linux or macOS should have it installed by-default (or will be an easy package manager grab away).

Wrapping dig

The R-wrapper for the dig command is pretty straightforward:

library(stringi) # string processing
library(processx) # system processes orchestration
library(tidyverse) # good data wrangling idioms

dig <- function(..., cat = TRUE) {

  processx::run(
    command = unname(Sys.which("dig")), 
    args = unlist(list(...)),
  ) -> out

  if (cat) message(out$stdout)

  invisible(out)

}

We expand the ellipses into command arguments, run the command, return the output and optionally display the output via message().

Let’s see if it works by getting the dig help:

dig("-h")
## Usage:  dig [@global-server] [domain] [q-type] [q-class] {q-opt}
##             {global-d-opt} host [@local-server] {local-d-opt}
##             [ host [@local-server] {local-d-opt} [...]]
## Where:  domain    is in the Domain Name System
##         q-class  is one of (in,hs,ch,...) [default: in]
##         q-type   is one of (a,any,mx,ns,soa,hinfo,axfr,txt,...) [default:a]
##                  (Use ixfr=version for type ixfr)
##         q-opt    is one of:
##                  -4                  (use IPv4 query transport only)
##                  -6                  (use IPv6 query transport only)
##                  -b address[#port]   (bind to source address/port)
##                  -c class            (specify query class)
##                  -f filename         (batch mode)
##                  -i                  (use IP6.INT for IPv6 reverse lookups)
##                  -k keyfile          (specify tsig key file)
##                  -m                  (enable memory usage debugging)
##                  -p port             (specify port number)
##                  -q name             (specify query name)
##                  -t type             (specify query type)
##                  -u                  (display times in usec instead of msec)
##                  -x dot-notation     (shortcut for reverse lookups)
##                  -y [hmac:]name:key  (specify named base64 tsig key)
##         d-opt    is of the form +keyword[=value], where keyword is:
##                  +[no]aaonly         (Set AA flag in query (+[no]aaflag))
##                  +[no]additional     (Control display of additional section)
##                  +[no]adflag         (Set AD flag in query (default on))
##                  +[no]all            (Set or clear all display flags)
##                  +[no]answer         (Control display of answer section)
##                  +[no]authority      (Control display of authority section)
##                  +[no]besteffort     (Try to parse even illegal messages)
##                  +bufsize=###        (Set EDNS0 Max UDP packet size)
##                  +[no]cdflag         (Set checking disabled flag in query)
##                  +[no]cl             (Control display of class in records)
##                  +[no]cmd            (Control display of command line)
##                  +[no]comments       (Control display of comment lines)
##                  +[no]crypto         (Control display of cryptographic fields in records)
##                  +[no]defname        (Use search list (+[no]search))
##                  +[no]dnssec         (Request DNSSEC records)
##                  +domain=###         (Set default domainname)
##                  +[no]edns[=###]     (Set EDNS version) [0]
##                  +ednsflags=###      (Set EDNS flag bits)
##                  +[no]ednsnegotiation (Set EDNS version negotiation)
##                  +ednsopt=###[:value] (Send specified EDNS option)
##                  +noednsopt          (Clear list of +ednsopt options)
##                  +[no]expire         (Request time to expire)
##                  +[no]fail           (Don't try next server on SERVFAIL)
##                  +[no]identify       (ID responders in short answers)
##                  +[no]idnout         (convert IDN response)
##                  +[no]ignore         (Don't revert to TCP for TC responses.)
##                  +[no]keepopen       (Keep the TCP socket open between queries)
##                  +[no]multiline      (Print records in an expanded format)
##                  +ndots=###          (Set search NDOTS value)
##                  +[no]nsid           (Request Name Server ID)
##                  +[no]nssearch       (Search all authoritative nameservers)
##                  +[no]onesoa         (AXFR prints only one soa record)
##                  +[no]opcode=###     (Set the opcode of the request)
##                  +[no]qr             (Print question before sending)
##                  +[no]question       (Control display of question section)
##                  +[no]recurse        (Recursive mode)
##                  +retry=###          (Set number of UDP retries) [2]
##                  +[no]rrcomments     (Control display of per-record comments)
##                  +[no]search         (Set whether to use searchlist)
##                  +[no]short          (Display nothing except short
##                                       form of answer)
##                  +[no]showsearch     (Search with intermediate results)
##                  +[no]split=##       (Split hex/base64 fields into chunks)
##                  +[no]stats          (Control display of statistics)
##                  +subnet=addr        (Set edns-client-subnet option)
##                  +[no]tcp            (TCP mode (+[no]vc))
##                  +time=###           (Set query timeout) [5]
##                  +[no]trace          (Trace delegation down from root [+dnssec])
##                  +tries=###          (Set number of UDP attempts) [3]
##                  +[no]ttlid          (Control display of ttls in records)
##                  +[no]vc             (TCP mode (+[no]tcp))
##         global d-opts and servers (before host name) affect all queries.
##         local d-opts and servers (after host name) affect only that lookup.
##         -h                           (print help and exit)
##         -v                           (print version and exit)

To get the DNS records of r-project.org DNS we need to find the nameservers, which we can do via:

ns <- dig("+short", "NS", "@9.9.9.9", "r-project.org")
## ns1.wu-wien.ac.at.
## ns2.urbanek.info.
## ns1.urbanek.info.
## ns3.urbanek.info.
## ns4.urbanek.info.
## ns2.wu-wien.ac.at.

There are six of them (which IIRC is a few more than they had earlier this week). I wanted to see if any supported zone transfers. Here’s one way to do that:

stri_split_lines(ns$stdout, omit_empty = TRUE) %>% # split the response in stdout into lines
  flatten_chr() %>% # turn the list into a character vector
  map_df(~{ # make a data frame out of the following
    tibble(
      ns = .x, # the nameserver we are probing
      res = dig("+noall", "+answer", "AXFR", glue::glue("@{.x}"), "r-project.org", cat = FALSE) %>%  # the dig zone transfer request
        pluck("stdout") # we only want the `stdout` element of the {processx} return value
    )
  }) -> xdf

xdf
## # A tibble: 6 x 2
##   ns              res                                                  
##   <chr>           <chr>                                                
## 1 ns1.wu-wien.ac… "; Transfer failed.\n"                               
## 2 ns2.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 3 ns1.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 4 ns3.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 5 ns4.urbanek.in… "R-project.org.\t\t7200\tIN\tSOA\tns0.wu-wien.ac.at.…
## 6 ns2.wu-wien.ac… "; Transfer failed.\n" 

(NOTE: You may not get things in the same order if you try this at home due to the way DNS queries and responses work.)

So, two servers did not accept our request but four did. Let’s see what a set of zone transfer records looks like:

cat(xdf[["res"]][[2]]) 
## R-project.org.    7200  IN  SOA ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 604800 3600
## R-project.org.    7200  IN  NS  ns1.urbanek.info.
## R-project.org.    7200  IN  NS  ns1.wu-wien.ac.at.
## R-project.org.    7200  IN  NS  ns2.urbanek.info.
## R-project.org.    7200  IN  NS  ns2.wu-wien.ac.at.
## R-project.org.    7200  IN  NS  ns3.urbanek.info.
## R-project.org.    7200  IN  NS  ns4.urbanek.info.
## R-project.org.    7200  IN  A 137.208.57.37
## R-project.org.    7200  IN  MX  5 mc1.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc2.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc3.ethz.ch.
## R-project.org.    7200  IN  MX  5 mc4.ethz.ch.
## R-project.org.    7200  IN  TXT "v=spf1 ip4:129.132.119.208/32 ~all"
## cran.at.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## beta.R-project.org. 7200  IN  A 137.208.57.37
## bugs.R-project.org. 7200  IN  CNAME rbugs.urbanek.info.
## cran.ch.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## cloud.R-project.org.  7200  IN  CNAME d3caqzu56oq2n9.cloudfront.net.
## cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## ftp.cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## www.cran.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## cran-archive.R-project.org. 7200 IN CNAME cran.wu-wien.ac.at.
## developer.R-project.org. 7200 IN  CNAME rdevel.urbanek.info.
## cran.es.R-project.org.  7200  IN  A 137.208.57.37
## ess.R-project.org.  7200  IN  CNAME ess.math.ethz.ch.
## journal.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## mac.R-project.org.  7200  IN  CNAME r.research.att.com.
## portal.R-project.org. 7200  IN  CNAME r-project.org.
## r-forge.R-project.org.  7200  IN  CNAME r-forge.wu-wien.ac.at.
## *.r-forge.R-project.org. 7200 IN  CNAME r-forge.wu-wien.ac.at.
## search.R-project.org. 7200  IN  CNAME finzi.psych.upenn.edu.
## svn.R-project.org.  7200  IN  CNAME svn-stat.math.ethz.ch.
## translation.R-project.org. 7200 IN  CNAME translation.r-project.kr.
## cran.uk.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## cran.us.R-project.org.  7200  IN  A 137.208.57.37
## user2004.R-project.org. 7200  IN  CNAME r-project.org.
## useR2006.R-project.org. 7200  IN  CNAME r-project.org.
## user2007.R-project.org. 7200  IN  CNAME r-project.org.
## useR2008.R-project.org. 7200  IN  CNAME r-project.org.
## useR2009.R-project.org. 7200  IN  CNAME r-project.org.
## user2010.R-project.org. 7200  IN  CNAME r-project.org.
## useR2011.R-project.org. 7200  IN  CNAME r-project.org.
## useR2012.R-project.org. 7200  IN  CNAME r-project.org.
## useR2013.R-project.org. 7200  IN  CNAME r-project.org.
## user2014.R-project.org. 7200  IN  CNAME user2014.github.io.
## useR2015.R-project.org. 7200  IN  CNAME r-project.org.
## useR2016.R-project.org. 7200  IN  CNAME user2016.github.io.
## useR2017.R-project.org. 7200  IN  CNAME r-project.org.
## useR2018.R-project.org. 7200  IN  CNAME user-2018.netlify.com.
## useR2019.R-project.org. 7200  IN  A 5.135.185.16
## wiki.R-project.org. 7200  IN  CNAME cran.wu-wien.ac.at.
## win-builder.R-project.org. 7200 IN  A 129.217.207.166
## win-builder.R-project.org. 7200 IN  MX  0 rdevel.urbanek.info.
## www.R-project.org.  7200  IN  CNAME cran.wu-wien.ac.at.
## R-project.org.    7200  IN  SOA ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 604800 3600

That’s not pretty, but it’s wrangle-able. Let’s turn it into a data frame:

xdf[["res"]][[2]] %>% # get the response text
  stri_split_lines(omit_empty = TRUE) %>%  # split it into lines
  flatten_chr() %>%  # turn it into a character vector
  stri_split_regex("[[:space:]]+", n = 5, simplify = TRUE) %>% # split at whitespace, limiting to five fields
  as_tibble(.name_repair = "unique") %>% # make it a tibble
  set_names(c("host", "ttl", "class", "record_type", "value")) %>% # better colnames
  mutate(host = stri_trans_tolower(host)) %>% # case matters not in DNS names
  print(n=nrow(.)) # see our results
## # A tibble: 55 x 5
##    host                     ttl   class record_type value                                                              
##    <chr>                    <chr> <chr> <chr>       <chr>                                                              
##  1 r-project.org.           7200  IN    SOA         ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800 …
##  2 r-project.org.           7200  IN    NS          ns1.urbanek.info.                                                  
##  3 r-project.org.           7200  IN    NS          ns1.wu-wien.ac.at.                                                 
##  4 r-project.org.           7200  IN    NS          ns2.urbanek.info.                                                  
##  5 r-project.org.           7200  IN    NS          ns2.wu-wien.ac.at.                                                 
##  6 r-project.org.           7200  IN    NS          ns3.urbanek.info.                                                  
##  7 r-project.org.           7200  IN    NS          ns4.urbanek.info.                                                  
##  8 r-project.org.           7200  IN    A           137.208.57.37                                                      
##  9 r-project.org.           7200  IN    MX          5 mc1.ethz.ch.                                                     
## 10 r-project.org.           7200  IN    MX          5 mc2.ethz.ch.                                                     
## 11 r-project.org.           7200  IN    MX          5 mc3.ethz.ch.                                                     
## 12 r-project.org.           7200  IN    MX          5 mc4.ethz.ch.                                                     
## 13 r-project.org.           7200  IN    TXT         "\"v=spf1 ip4:129.132.119.208/32 ~all\""                           
## 14 cran.at.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 15 beta.r-project.org.      7200  IN    A           137.208.57.37                                                      
## 16 bugs.r-project.org.      7200  IN    CNAME       rbugs.urbanek.info.                                                
## 17 cran.ch.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 18 cloud.r-project.org.     7200  IN    CNAME       d3caqzu56oq2n9.cloudfront.net.                                     
## 19 cran.r-project.org.      7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 20 ftp.cran.r-project.org.  7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 21 www.cran.r-project.org.  7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 22 cran-archive.r-project.… 7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 23 developer.r-project.org. 7200  IN    CNAME       rdevel.urbanek.info.                                               
## 24 cran.es.r-project.org.   7200  IN    A           137.208.57.37                                                      
## 25 ess.r-project.org.       7200  IN    CNAME       ess.math.ethz.ch.                                                  
## 26 journal.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 27 mac.r-project.org.       7200  IN    CNAME       r.research.att.com.                                                
## 28 portal.r-project.org.    7200  IN    CNAME       r-project.org.                                                     
## 29 r-forge.r-project.org.   7200  IN    CNAME       r-forge.wu-wien.ac.at.                                             
## 30 *.r-forge.r-project.org. 7200  IN    CNAME       r-forge.wu-wien.ac.at.                                             
## 31 search.r-project.org.    7200  IN    CNAME       finzi.psych.upenn.edu.                                             
## 32 svn.r-project.org.       7200  IN    CNAME       svn-stat.math.ethz.ch.                                             
## 33 translation.r-project.o… 7200  IN    CNAME       translation.r-project.kr.                                          
## 34 cran.uk.r-project.org.   7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 35 cran.us.r-project.org.   7200  IN    A           137.208.57.37                                                      
## 36 user2004.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 37 user2006.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 38 user2007.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 39 user2008.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 40 user2009.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 41 user2010.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 42 user2011.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 43 user2012.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 44 user2013.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 45 user2014.r-project.org.  7200  IN    CNAME       user2014.github.io.                                                
## 46 user2015.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 47 user2016.r-project.org.  7200  IN    CNAME       user2016.github.io.                                                
## 48 user2017.r-project.org.  7200  IN    CNAME       r-project.org.                                                     
## 49 user2018.r-project.org.  7200  IN    CNAME       user-2018.netlify.com.                                             
## 50 user2019.r-project.org.  7200  IN    A           5.135.185.16                                                       
## 51 wiki.r-project.org.      7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 52 win-builder.r-project.o… 7200  IN    A           129.217.207.166                                                    
## 53 win-builder.r-project.o… 7200  IN    MX          0 rdevel.urbanek.info.                                             
## 54 www.r-project.org.       7200  IN    CNAME       cran.wu-wien.ac.at.                                                
## 55 r-project.org.           7200  IN    SOA         ns0.wu-wien.ac.at. postmaster.wu-wien.ac.at. 2019040400 3600 1800

FIN

Zone transfers are a quick way to get all the DNS information for a site. As such, it isn’t generally recommended to allow zone transfers from just anyone (though trying to keep anything secret in public DNS is a path generally fraught with peril given how easy it is to brute-force record lookups). However, if r-project.org zone transfers stay generally open, then you can use this method to keep a local copy of r-project.org host info and make local /etc/hosts (or the Windows equivalent) entries when issues like the one this past week arise.

To leave a comment for the author, please follow the link and comment on their blog: R – rud.is.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)