New in openssl 0.3: hash functions

[This article was first published on OpenCPU, and kindly contributed to R-bloggers.]

This week version 0.3 of the openssl package appeared on CRAN. New in this release are bindings to the cryptographic hashing functions in OpenSSL. Not exactly groundbreaking (hashing functions have long been available from the digest package), but nice to have anyway. An overview from the new vignette:

Hashing functions

The functions sha1, sha256, sha512, md4, md5 and ripemd160 bind to the respective digest functions in OpenSSL’s libcrypto. Both binary and string inputs are supported and the output type will match the input type.

library(openssl)
md5("foo")
# [1] "acbd18db4cc2f85cedef654fccc4a4d8"
md5(charToRaw("foo"))
# [1] ac bd 18 db 4c c2 f8 5c ed ef 65 4f cc c4 a4 d8
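
The same pairing of input and output types should hold for the other functions listed above. A minimal sketch using sha256, assuming the openssl package is installed (the variable names `hex` and `bin` are illustrative only):

```r
library(openssl)

# Character input gives a hex string; SHA-256 is 256 bits = 64 hex characters
hex <- sha256("foo")
is.character(hex)
nchar(hex)

# Raw input gives a raw vector; 256 bits = 32 bytes
bin <- sha256(charToRaw("foo"))
is.raw(bin)
length(bin)
```

Checking the length of the result is a quick way to confirm which algorithm and which output form you are looking at.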

Functions are fully vectorized for the case of character vectors: a vector with n strings will return n hashes.

# Vectorized for strings
md5(c("foo", "bar", "baz"))
# [1] "acbd18db4cc2f85cedef654fccc4a4d8" "37b51d194a7513e45b56f6524f2d51f2"
# [3] "73feffa4b7f6bb68e44cf984c85f6e88"
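
To illustrate what vectorization means here, the following sketch checks that hashing a whole character vector in one call gives the same result as hashing each string separately. The names `x`, `all_at_once` and `one_by_one` are made up for this example:

```r
library(openssl)

x <- c("foo", "bar", "baz")

# One call on the whole vector
all_at_once <- as.character(md5(x))

# One call per element, collected into a plain character vector
one_by_one <- vapply(x, function(s) as.character(md5(s)), character(1),
                     USE.NAMES = FALSE)

# The two approaches should agree element by element
identical(all_at_once, one_by_one)
```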

Besides character and raw vectors, we can pass a connection object (e.g. a file, socket or URL). In this case the function will stream-hash the binary contents of the connection.

# Stream-hash a file
myfile <- system.file("CITATION")
md5(file(myfile))
# Hashing....
# [1] e4 4f 1b 99 e3 2f 27 e0 a7 e6 a0 0a 36 07 0e 1b
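
One way to sanity-check a streamed file hash is to compare it against base R's `tools::md5sum()`. This is only a sketch: the temporary file and its contents are invented for the example, and the raw result is converted to a hex string by hand so that the two values can be compared.

```r
library(openssl)

# Write a few known bytes to a temporary file (hypothetical test data)
tmp <- tempfile()
writeBin(charToRaw("hello world\n"), tmp)

# Stream-hash the file through a connection; the result is a raw vector
streamed <- md5(file(tmp))

# Render the raw bytes as a single hex string for comparison
streamed_hex <- paste(sprintf("%02x", as.integer(streamed)), collapse = "")

# Base R's own MD5 checksum of the same file (returned as a named string)
reference_hex <- unname(tools::md5sum(tmp))

identical(streamed_hex, reference_hex)
unlink(tmp)
```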

The same works for URLs. The hash of R-3.1.1-win.exe below should match the one in md5sum.txt.

# Stream-hash from a network connection
md5(url("http://cran.us.r-project.org/bin/windows/base/old/3.1.1/R-3.1.1-win.exe"))
# Hashing................................................................................................................
# [1] 0b 48 29 e8 92 10 eb 6d 13 71 24 8c d0 97 d1 fc

Compare to digest

Similar functionality is also available in the digest package, but with a slightly different interface:

# Compare to digest
library(digest)
digest("foo", "md5", serialize = FALSE)
# [1] "acbd18db4cc2f85cedef654fccc4a4d8"

# Other way around
digest(cars, skip = 0)
# [1] "81919836edd7b5a422700ac32bbccd7d"
md5(serialize(cars, NULL))
# [1] 81 91 98 36 ed d7 b5 a4 22 70 0a c3 2b bc cd 7d
