Retrieve IP ASN & BGP Peer Info With R

February 7, 2013
By

(This article was first published on rud.is » R, and kindly contributed to R-bloggers)

This is part of a larger project I’m working on, but it’s useful enough to share (github version coming soon).

The fine folks at @TeamCymru have a great service to map IP addresses to ASN/BGP information en masse.

There are libraries for Python, Perl and other languages but none for R (that I could find). So, I threw together a quick set of functions to interface to @TeamCymru’s service. Unlike many other modern services, this one isn’t XML or JSON over a RESTful interface, so the code uses a socketConnection() over the standard WHOIS TCP port to post and retrieve simple text lists.

#
# bulkorigin.R - perform bulk IP to ASN mapping via Team Cymru whois service
#
# Author: @hrbrmstr
# Version: 0.1
# Date: 2013-02-07
#
# Copyright 2013 Bob Rudis
# 
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#   
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.#
 
library(plyr)
 
# short function to trim leading/trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
 
BulkOrigin <- function(ip.list,host="v4.whois.cymru.com",port=43) {
 
  # Retrieves BGP Origin ASN info for a list of IP addresses
  #
  # NOTE: IPv4 version
  #
  # NOTE: The Team Cymru's service is NOT a GeoIP service!
  # Do not use this function for that as your results will not
  # be accurate.
  #
  # Args:
  #   ip.list : character vector of IP addresses
  #   host: which server to hit for lookup (defaults to Team Cymru's server)
  #   post: TCP port to use (defaults to 43)
  #
  # Returns:
  #   data frame of BGP Origin ASN lookup results
 
 
  # setup query
  cmd = "begin\nverbose\n" 
  ips = paste(unlist(ip.list), collapse="\n")
  cmd = sprintf("%s%s\nend\n",cmd,ips)
 
  # setup connection and post query
  con = socketConnection(host=host,port=port,blocking=TRUE,open="r+")  
  cat(cmd,file=con)
  response = readLines(con)
  close(con)
 
  # trim header, split fields and convert results
  response = response[2:length(response)]
  response = lapply(response,function(n) {
    sapply(strsplit(n,"|",fixed=TRUE),trim)
  })  
  response = adply(response,c(1))
  response = response[,2:length(response)]
  names(response) = c("AS","IP","BGP.Prefix","CC","Registry","Allocated","AS.Name")
 
  return(response)
 
}
 
BulkPeer <- function(ip.list,host="v4-peer.whois.cymru.com",port=43) {
 
  # Retrieves BGP Peer ASN info for a list of IP addresses
  #
  # NOTE: IPv4 version
  #
  # NOTE: The Team Cymru's service is NOT a GeoIP service!
  # Do not use this function for that as your results will not
  # be accurate.
  #
  # Args:
  #   ip.list : character vector of IP addresses
  #   host: which server to hit for lookup (defaults to Team Cymru's server)
  #   post: TCP port to use (defaults to 43)
  #
  # Returns:
  #   data frame of BGP Peer ASN lookup results
 
 
  # setup query
  cmd = "begin\nverbose\n" 
  ips = paste(unlist(ip.list), collapse="\n")
  cmd = sprintf("%s%s\nend\n",cmd,ips)
 
  # setup connection and post query
  con = socketConnection(host=host,port=port,blocking=TRUE,open="r+")  
  cat(cmd,file=con)
  response = readLines(con)
  close(con)
 
  # trim header, split fields and convert results
  response = response[2:length(response)]
  response = lapply(response,function(n) {
    sapply(strsplit(n,"|",fixed=TRUE),trim)
  })  
  response = adply(response,c(1))
  response = response[,2:length(response)]
  names(response) = c("Peer.AS","IP","BGP.Prefix","CC","Registry","Allocated","Peer.AS.Name")
  return(response)
 
}

Take a list of IPs, make an IP connection, formulate a bulk query and convert the results. Here’s a small script to test it:

ips = c("100.43.81.11","100.43.81.7")
origin = BulkOrigin(ips)
str(origin)
peers = BulkPeer(ips)
str(peers)

That code outputs:

'data.frame':	2 obs. of  7 variables:
 $ AS        : chr  "13238" "13238"
 $ IP        : chr  "100.43.81.11" "100.43.81.7"
 $ BGP.Prefix: chr  "100.43.64.0/19" "100.43.64.0/19"
 $ CC        : chr  "US" "US"
 $ Registry  : chr  "arin" "arin"
 $ Allocated : chr  "2011-12-06" "2011-12-06"
 $ AS.Name   : chr  "YANDEX Yandex LLC" "YANDEX Yandex LLC"

and

'data.frame':	8 obs. of  7 variables:
 $ Peer.AS     : chr  "174" "3257" "9002" "10310" ...
 $ IP          : chr  "100.43.81.11" "100.43.81.11" "100.43.81.11" "100.43.81.11" ...
 $ BGP.Prefix  : chr  "100.43.64.0/19" "100.43.64.0/19" "100.43.64.0/19" "100.43.64.0/19" ...
 $ CC          : chr  "US" "US" "US" "US" ...
 $ Registry    : chr  "arin" "arin" "arin" "arin" ...
 $ Allocated   : chr  "2011-12-06" "2011-12-06" "2011-12-06" "2011-12-06" ...
 $ Peer.AS.Name: chr  "COGENT Cogent/PSI" "TINET-BACKBONE Tinet SpA" "RETN-AS ReTN.net Autonomous System" "YAHOO-1 - Yahoo!" ...

respectively for each str().

Nothing super-sexy, but it’s part of a mission I’m on to make IP addresses “first class citizens” in R. I’m starting with building some smaller functions that accumulate IP metadata and will ultimately collect them all into a compact R library.

In the interim, I thought these two routines might be useful to some folks.

With just these two functions, you can use various graphing libraries to get a picture of the network connectivity. Here’s a small sample to get you started:

library(igraph)
 
ips = c("100.43.81.11")
origin = BulkOrigin(ips)
peers = BulkPeer(ips)
 
g = graph.empty() + vertices(c(ips, peers$Peer.AS, origin$AS),size=30)
V(g)$label = c(ips, peers$Peer.AS, origin$AS)
e = lapply(peers$Peer.AS,function(x) {
  c(origin$AS,x)
})
g = g + edges(unlist(e))
g = g + edge(ips, origin$AS)
g$layout = layout.circle
plot(g)

asn

If you know of any other R libraries or code that provide functions that operate on IP addresses or interface to services that provide IP address metadata, please drop a note in the comments or ping me on Twitter.

To leave a comment for the author, please follow the link and comment on his blog: rud.is » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...



If you got this far, why not subscribe for updates from the site? Choose your flavor: e-mail, twitter, RSS, or facebook...

Comments are closed.