Enabling Concerned Visitors & Ethical Security Researchers with security.txt Web Security Policies (plus analyze them at-scale with R)

[This article was first published on R – rud.is, and kindly contributed to R-bloggers.]

I’ve blogged a bit about robots.txt, the rules file that documents a site’s “robots exclusion” standard and instructs web crawlers what they can and cannot do (and how frequently they should do things when they are allowed to). This is a well-known and well-defined standard, but it’s not mandatory and is often ignored by crawlers and content owners alike.

There’s an emerging IETF draft for a different type of site metadata that content owners should absolutely consider adopting. This one defines “web security policies” for a given site and has much in common with the robots exclusion standard, including the name (security.txt) and format (policy directives are defined with a simple syntax; see Chapter 5 of the Debian Policy Manual).

One core difference is that this file is intended for humans. If you are a general user who visits a site and notices something “off” (security-wise), or if you are an honest, honorable security researcher who found a vulnerability or weakness on a site, this security.txt file should make it easier to contact the appropriate folks at the site to help them identify and resolve security issues. The IETF abstract summarizes the intent well:

When security risks in web services are discovered by independent security researchers who understand the severity of the risk, they often lack the channels to properly disclose them. As a result, security issues may be left unreported. Security.txt defines a standard to help organizations define the process for security researchers to securely disclose security vulnerabilities.

A big change from robots.txt is where the security.txt file goes. The IETF standard is still in draft state so the location may change, but the current thinking is to have it go into /.well-known/security.txt vs being placed in the top level root (i.e. it’s not supposed to be in /security.txt). If you aren’t familiar with the .well-known directory, give RFC 5785 a read.
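
To make the location rule concrete, here is a minimal base-R sketch of how the target URL could be derived from any page on a site. The helper name well_known_security_txt() is hypothetical (it is not from any package), and the approach assumes the draft's current /.well-known/security.txt location, which may still change:

```r
# Hypothetical helper (not part of any package): build the draft's
# /.well-known/security.txt location from any absolute URL on a site.
well_known_security_txt <- function(site_url) {
  # Keep only the scheme and host (plus optional port); drop path, query, fragment
  origin <- regmatches(
    site_url,
    regexpr("^[A-Za-z][A-Za-z0-9+.-]*://[^/?#]+", site_url)
  )
  if (length(origin) == 0) {
    stop("expected an absolute URL such as https://example.com/")
  }
  # The file lives under .well-known, not in the top-level root
  paste0(origin, "/.well-known/security.txt")
}

well_known_security_txt("https://rud.is/b")
## [1] "https://rud.is/.well-known/security.txt"
```

The path of the page you start from is discarded on purpose: the policy applies to the site as a whole, so only the scheme and host matter.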

You can visit the general information site to find out more and install a development version of a Chrome extension that will make it easier to pull up this info in your browser if you find an issue.

Here’s the security.txt for my site:

Contact: [email protected]
Encryption: https://keybase.io/hrbrmstr/pgp_keys.asc?fingerprint=e5388172b81c210906f5e5605879179645de9399
Disclosure: Full

With that info, you know where to contact me, have the ability to encrypt your message and know that I’ll give you credit and will disclose the bugs openly.
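
Since the format is just "Key: value" lines, a file like the one above can be picked apart with a few lines of base R. The following is only a rough sketch under stated assumptions: parse_policy_lines() is a hypothetical helper, the treatment of "#" lines as comments is an assumption about the draft, and the contact address is a placeholder rather than a real one:

```r
# Hypothetical base-R parser for "Key: value" policy lines.
# Assumption: lines starting with "#" are comments and can be skipped.
parse_policy_lines <- function(lines) {
  lines <- trimws(lines)
  # Drop blank lines and comment lines
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  # Ignore anything that does not look like a directive
  lines <- lines[grepl(":", lines, fixed = TRUE)]
  # Split on the FIRST colon only, because values such as URLs contain colons
  pos <- regexpr(":", lines, fixed = TRUE)
  data.frame(
    key = tolower(trimws(substr(lines, 1, pos - 1))),
    value = trimws(substr(lines, pos + 1, nchar(lines))),
    stringsAsFactors = FALSE
  )
}

policy <- parse_policy_lines(c(
  "# placeholder contact address, for illustration only",
  "Contact: security@example.com",
  "Encryption: https://keybase.io/hrbrmstr/pgp_keys.asc?fingerprint=e5388172b81c210906f5e5605879179645de9399",
  "Disclosure: Full"
))
policy$key
## [1] "contact"    "encryption" "disclosure"
```

Splitting on the first colon rather than every colon is the one detail worth getting right, otherwise the https:// in the Encryption value would be cut in two.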

So, Why the [R] tag?

Ah, yes. This post is in the R RSS category feed for a reason. I do at-scale analysis of the web for a living and will be tracking the adoption of security.txt across the internet (initially with the Umbrella Top 1m and a choice list of sites with more categorical data associated with them) over time. My esteemed colleague @jhartftw is handling the crawling part, but I needed a way to speedily read in these files for a broader analysis. So, I made an R package: securitytxt.

It’s pretty easy to use. Here’s how to install it and use one of the functions to generate a security.txt target URL for a site:

devtools::install_github("hrbrmstr/securitytxt")

library(securitytxt)

(xurl <- sectxt_url("https://rud.is/b"))
## [1] "https://rud.is/.well-known/security.txt"

This is how you read in and parse a security.txt file:

(x <- sectxt(url(xurl)))
## <Web Security Policies Object>
## Contact: [email protected]
## Encryption: https://keybase.io/hrbrmstr/pgp_keys.asc?fingerprint=e5388172b81c210906f5e5605879179645de9399
## Disclosure: Full

And, this is how you turn that into a usable data frame:

sectxt_info(x)
##          key                                                                                         value
## 1    contact                                                                                    [email protected]
## 2 encryption https://keybase.io/hrbrmstr/pgp_keys.asc?fingerprint=e5388172b81c210906f5e5605879179645de9399
## 3 disclosure                                                                                          Full

There’s also a function to validate that the keys are within the current IETF standard. That will become more useful once the standard moves out of draft status.
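
To give a feel for what key validation involves, here is a small base-R sketch. It is not the package's validation function: check_policy_keys() is a hypothetical helper, and the allow-list is an assumption that contains only the three keys seen in this post's example, so the draft itself remains the authority on which keys are valid:

```r
# Assumed allow-list: only the keys that appear in this post's example.
# The IETF draft is the authority and may define more or different keys.
assumed_keys <- c("contact", "encryption", "disclosure")

# Hypothetical helper: flag keys that fall outside the allow-list
check_policy_keys <- function(keys, allowed = assumed_keys) {
  # Compare case-insensitively and ignore stray whitespace
  keys <- tolower(trimws(keys))
  data.frame(key = keys, known = keys %in% allowed, stringsAsFactors = FALSE)
}

# "Made-Up-Key" is deliberately invalid, to show what a failure looks like
checked <- check_policy_keys(c("Contact", "Encryption", "Disclosure", "Made-Up-Key"))
checked$known
## [1]  TRUE  TRUE  TRUE FALSE
```

Keeping the allow-list as a plain argument means it can be swapped out as the draft evolves without touching the check itself.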

FIN

So, definitely adopt the standard and feel invited to kick the tyres on the package. Don’t hesitate to jump on board if you have ideas for how you’d like to extend the package, and drop a note in the comments if you have questions on it or on adopting the standard for your site.
