Retrieve IP ASN & BGP Peer Info With R

[This article was first published on rud.is » R, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

This is part of a larger project I’m working on, but it’s useful enough to share (github version coming soon).

The fine folks at @TeamCymru have a great service to map IP addresses to ASN/BGP information en masse.

There are libraries for Python, Perl and other languages but none for R (that I could find). So, I threw together a quick set of functions to interface to @TeamCymru’s service. Unlike many other modern services, this one isn’t XML or JSON over a RESTful interface, so the code uses a socketConnection() over the standard WHOIS TCP port to post and retrieve simple text lists.

#
# bulkorigin.R - perform bulk IP to ASN mapping via Team Cymru whois service
#
# Author: @hrbrmstr
# Version: 0.1
# Date: 2013-02-07
#
# Copyright 2013 Bob Rudis
# 
# Permission is hereby granted, free of charge, to any person obtaining
# a copy of this software and associated documentation files (the
# "Software"), to deal in the Software without restriction, including
# without limitation the rights to use, copy, modify, merge, publish,
# distribute, sublicense, and/or sell copies of the Software, and to
# permit persons to whom the Software is furnished to do so, subject to
# the following conditions:
#   
# The above copyright notice and this permission notice shall be
# included in all copies or substantial portions of the Software.
# 
# THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
# EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
# MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND
# NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE
# LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
# WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.#
 
library(plyr)
 
# short function to trim leading/trailing whitespace
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
 
BulkOrigin <- function(ip.list,host="v4.whois.cymru.com",port=43) {
 
  # Retrieves BGP Origin ASN info for a list of IP addresses
  #
  # NOTE: IPv4 version
  #
  # NOTE: The Team Cymru's service is NOT a GeoIP service!
  # Do not use this function for that as your results will not
  # be accurate.
  #
  # Args:
  #   ip.list : character vector of IP addresses
  #   host: which server to hit for lookup (defaults to Team Cymru's server)
  #   post: TCP port to use (defaults to 43)
  #
  # Returns:
  #   data frame of BGP Origin ASN lookup results
 
 
  # setup query
  cmd = "begin\nverbose\n" 
  ips = paste(unlist(ip.list), collapse="\n")
  cmd = sprintf("%s%s\nend\n",cmd,ips)
 
  # setup connection and post query
  con = socketConnection(host=host,port=port,blocking=TRUE,open="r+")  
  cat(cmd,file=con)
  response = readLines(con)
  close(con)
 
  # trim header, split fields and convert results
  response = response[2:length(response)]
  response = lapply(response,function(n) {
    sapply(strsplit(n,"|",fixed=TRUE),trim)
  })  
  response = adply(response,c(1))
  response = response[,2:length(response)]
  names(response) = c("AS","IP","BGP.Prefix","CC","Registry","Allocated","AS.Name")
 
  return(response)
 
}
 
BulkPeer <- function(ip.list,host="v4-peer.whois.cymru.com",port=43) {
 
  # Retrieves BGP Peer ASN info for a list of IP addresses
  #
  # NOTE: IPv4 version
  #
  # NOTE: The Team Cymru's service is NOT a GeoIP service!
  # Do not use this function for that as your results will not
  # be accurate.
  #
  # Args:
  #   ip.list : character vector of IP addresses
  #   host: which server to hit for lookup (defaults to Team Cymru's server)
  #   post: TCP port to use (defaults to 43)
  #
  # Returns:
  #   data frame of BGP Peer ASN lookup results
 
 
  # setup query
  cmd = "begin\nverbose\n" 
  ips = paste(unlist(ip.list), collapse="\n")
  cmd = sprintf("%s%s\nend\n",cmd,ips)
 
  # setup connection and post query
  con = socketConnection(host=host,port=port,blocking=TRUE,open="r+")  
  cat(cmd,file=con)
  response = readLines(con)
  close(con)
 
  # trim header, split fields and convert results
  response = response[2:length(response)]
  response = lapply(response,function(n) {
    sapply(strsplit(n,"|",fixed=TRUE),trim)
  })  
  response = adply(response,c(1))
  response = response[,2:length(response)]
  names(response) = c("Peer.AS","IP","BGP.Prefix","CC","Registry","Allocated","Peer.AS.Name")
  return(response)
 
}

Take a list of IPs, make an IP connection, formulate a bulk query and convert the results. Here’s a small script to test it:

ips = c("100.43.81.11","100.43.81.7")
origin = BulkOrigin(ips)
str(origin)
peers = BulkPeer(ips)
str(peers)

That code outputs:

'data.frame':	2 obs. of  7 variables:
 $ AS        : chr  "13238" "13238"
 $ IP        : chr  "100.43.81.11" "100.43.81.7"
 $ BGP.Prefix: chr  "100.43.64.0/19" "100.43.64.0/19"
 $ CC        : chr  "US" "US"
 $ Registry  : chr  "arin" "arin"
 $ Allocated : chr  "2011-12-06" "2011-12-06"
 $ AS.Name   : chr  "YANDEX Yandex LLC" "YANDEX Yandex LLC"

and

'data.frame':	8 obs. of  7 variables:
 $ Peer.AS     : chr  "174" "3257" "9002" "10310" ...
 $ IP          : chr  "100.43.81.11" "100.43.81.11" "100.43.81.11" "100.43.81.11" ...
 $ BGP.Prefix  : chr  "100.43.64.0/19" "100.43.64.0/19" "100.43.64.0/19" "100.43.64.0/19" ...
 $ CC          : chr  "US" "US" "US" "US" ...
 $ Registry    : chr  "arin" "arin" "arin" "arin" ...
 $ Allocated   : chr  "2011-12-06" "2011-12-06" "2011-12-06" "2011-12-06" ...
 $ Peer.AS.Name: chr  "COGENT Cogent/PSI" "TINET-BACKBONE Tinet SpA" "RETN-AS ReTN.net Autonomous System" "YAHOO-1 - Yahoo!" ...

respectively for each str().

Nothing super-sexy, but it’s part of a mission I’m on to make IP addresses “first class citizens” in R. I’m starting with building some smaller functions that accumulate IP metadata and will ultimately collect them all into a compact R library.

In the interim, I thought these two routines might be useful to some folks.

With just these two functions, you can use various graphing libraries to get a picture of the network connectivity. Here’s a small sample to get you started:

library(igraph)
 
ips = c("100.43.81.11")
origin = BulkOrigin(ips)
peers = BulkPeer(ips)
 
g = graph.empty() + vertices(c(ips, peers$Peer.AS, origin$AS),size=30)
V(g)$label = c(ips, peers$Peer.AS, origin$AS)
e = lapply(peers$Peer.AS,function(x) {
  c(origin$AS,x)
})
g = g + edges(unlist(e))
g = g + edge(ips, origin$AS)
g$layout = layout.circle
plot(g)

asn

If you know of any other R libraries or code that provide functions that operate on IP addresses or interface to services that provide IP address metadata, please drop a note in the comments or ping me on Twitter.

To leave a comment for the author, please follow the link and comment on their blog: rud.is » R.

R-bloggers.com offers daily e-mail updates about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.
Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Never miss an update!
Subscribe to R-bloggers to receive
e-mails with the latest R posts.
(You will not see this message again.)

Click here to close (This popup will not appear again)