The New and Improved R Shodan Package

August 7, 2015

(This article was first published on Data Driven Security, and kindly contributed to R-bloggers)

For those not involved with all things “cyber”, let me start with a description of what Shodan is (though visiting the site is probably the best introduction to what secrets it holds).

Shodan is, at its core, a search engine. Unlike Google, Shodan indexes what I’ll call “cyber” metadata and content about everything accessible via a public IP address. This means things like:

  • routers, switches and cable/DSL/FiOS modems (which are the underpinnings of our internet access)
  • internet web, ftp, mail, etc. servers
  • public (protected or otherwise) CCTV & home surveillance & web cameras
  • desktops, printers and other things that may end up in public IP space
  • gas station pumps and industrial control systems
  • VoIP phones & more

Shodan contacts the IP addresses associated with all the devices, sees what ports and protocols might be in use and then tries to retrieve content from those ports and protocols (which could be anything from webcam snapshots to web server HTML to actual header responses from internet servers to banners from routers and switches). It indexes all that metadata and content and makes it available in a search engine and API for security researchers (I was so tempted to put that word in quotes).

To give you an idea what it can do, take a look at this query for webcams and/or read this full explanation of what you can do with that data.

While you can have fun with Shodan, it does have real value to security folk, and R needed a real API interface to it (I did a half-hearted one a couple of years ago). Hence the rebirth of the shodan package.

The package is brand-new, but it has basic coverage of the entire Shodan API except for the streaming functions. But a line of code is worth a thousand blatherings, so let’s find all the IIS servers in Maine.

# devtools::install_github("hrbrmstr/shodan")
library(shodan)

# perform the query for IIS servers in Maine
maine_iis <- shodan_search("iis state:me")

# get the total number of IIS servers in Maine that Shodan found
print(maine_iis$total) 
## [1] 2948

# how many did it return in this page of the query?
print(nrow(maine_iis$matches))
## [1] 100

# what else does it know about these servers?
print(colnames(maine_iis$matches))

##  [1] "product"   "hostnames" "version"   "title"     "ip"        "org"      
##  [7] "isp"       "cpe"       "data"      "asn"       "port"      "transport"
## [13] "timestamp" "domains"   "ip_str"    "os"        "_shodan"   "location" 
## [19] "ssl"       "link"

Now, the data frame in maine_iis$matches is somewhat ugly for the moment. Some columns have lists and data frames since the Shodan REST API returns (like many APIs do) nested JSON. I’m actually looking for collaboration on what would be the most useful format for the returned data structures so hit me up if you have ideas that would benefit your use of it.
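
To make the nesting concrete, here is a minimal sketch of one way such a result could be flattened. It does not call Shodan: the JSON below is a made-up stand-in with a few of the column names shown above, and it assumes the jsonlite package is installed. jsonlite::flatten() spreads nested data frames into prefixed columns, and the list columns are collapsed by hand.

```r
library(jsonlite)

# Toy stand-in for two Shodan matches (made-up values, not real API output)
toy_json <- '[
  {"ip_str": "192.0.2.10", "port": 80,
   "hostnames": ["www.example.com", "example.com"],
   "location": {"city": "Portland", "longitude": -70.26, "latitude": 43.66}},
  {"ip_str": "192.0.2.20", "port": 443,
   "hostnames": [],
   "location": {"city": "Bangor", "longitude": -68.77, "latitude": 44.80}}
]'

# fromJSON() returns a data frame whose "location" column is itself a
# data frame and whose "hostnames" column is a list
matches <- fromJSON(toy_json)

# Spread nested data frames into columns such as location.city
flat <- jsonlite::flatten(matches)

# Collapse any remaining list columns into semicolon-separated strings
list_cols <- vapply(flat, is.list, logical(1))
flat[list_cols] <- lapply(flat[list_cols], function(col) {
  vapply(col, function(x) paste(unlist(x), collapse = ";"), character(1))
})

str(flat)
```

Collapsing list columns into strings loses structure, so it suits quick inspection or CSV export more than further analysis.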

I’ll violate my own rule about mapping IP addresses just to show you Shodan also does geolocation for you (and, hey, y’all seem to like maps). We’ll make it a bit more useful and add some metadata about what it found to the location popups:

library(leaflet)
library(htmltools)

for_map <- cbind.data.frame(maine_iis$matches$location, 
                            ip=maine_iis$matches$ip,
                            isp=maine_iis$matches$isp,
                            title=maine_iis$matches$title,
                            org=maine_iis$matches$org,
                            data=maine_iis$matches$data,
                            stringsAsFactors=FALSE)

leaflet(for_map, width="600", height="600") %>% 
  addTiles() %>% 
  setView(-69.233328, 45.250556, 7) %>% 
  addCircles(data=for_map, lng=~longitude , lat=~latitude, 
             popup=~sprintf("%s<br/>%s, Maine<br/>ISP: %s<br/><br/>%s<br/><br/>%s", 
                            htmlEscape(org), htmlEscape(city), htmlEscape(isp), 
                            htmlEscape(title), htmlEscape(data)))


IIS Servers in Maine

Remember that’s only 100 of ~3,000 servers, but it should give you an idea of the types of data Shodan can return.
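
For a sense of what fetching everything would involve, the page count follows directly from the totals printed earlier. The commented-out loop is only a sketch: it assumes shodan_search() accepts a page argument, which is not shown in this post, so check the package documentation before relying on it.

```r
total    <- 2948  # maine_iis$total from the query above
per_page <- 100   # nrow(maine_iis$matches) for one page

# Number of pages needed to retrieve every match
n_pages <- ceiling(total / per_page)
print(n_pages)
## [1] 30

# Hypothetical paging loop (the `page` argument is an assumption):
# all_pages <- lapply(seq_len(n_pages), function(p) {
#   shodan_search("iis state:me", page = p)$matches
# })
```

Each page is a separate API request, so a full pull may count against whatever query limits apply to your Shodan plan.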

The package is up on GitHub for now, and here’s a list of functions it makes available:

  • account_profile: Account Profile
  • api_info: API Plan Information
  • host_count: Search Shodan without Results
  • host_info: Host Information
  • my_ip: My IP Address
  • query_tags: List the most popular tags
  • resolve: DNS Lookup
  • reverse: Reverse DNS Lookup
  • shodan_api_key: Get or set SHODAN_API_KEY value
  • shodan_exploit_search: Search for Exploits
  • shodan_exploit_search_count: Search for Exploits without Results
  • shodan_ports: List all ports that Shodan is crawling on the Internet.
  • shodan_protocols: List all protocols that can be used when performing on-demand Internet scans via Shodan.
  • shodan_query_list: List the saved search queries
  • shodan_query_search: Search the directory of saved search queries.
  • shodan_scan: Request Shodan to crawl an IP/netblock
  • shodan_scan_internet: Crawl the Internet for a specific port and protocol using Shodan
  • shodan_search: Search Shodan
  • shodan_search_tokens: Break the search query into tokens
  • shodan_services: List all services that Shodan crawls

Each of those maps to the API endpoints described on the official Shodan site.
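
As a rough illustration of that mapping, the sketch below builds the kind of request URL a search function might send. The helper build_search_url() is hypothetical and not part of the package. The host and path follow the public Shodan REST API as best I recall it, so treat both as assumptions, and "YOUR_API_KEY" is a placeholder used when the SHODAN_API_KEY environment variable is not set.

```r
# Read the key from the environment, falling back to a placeholder
api_key <- Sys.getenv("SHODAN_API_KEY", unset = "YOUR_API_KEY")

# Hypothetical helper: assemble a host search URL (endpoint path assumed)
build_search_url <- function(query, key) {
  sprintf("https://api.shodan.io/shodan/host/search?key=%s&query=%s",
          URLencode(key, reserved = TRUE),
          URLencode(query, reserved = TRUE))
}

search_url <- build_search_url("iis state:me", api_key)

# The query is percent-encoded, so the URL contains no raw spaces
grepl(" ", search_url, fixed = TRUE)
## [1] FALSE
```

Keeping the key in an environment variable rather than in a script avoids committing it to version control by accident.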

You are invited to tag along on this package as much or as little as you like. Drop a note in the comments if you find it useful or have suggestions! Please file all feature requests or problems on GitHub. Have fun exploring the API in R!
