Cryptographic Hashing in R

The functions sha1, sha256, sha512, md4, md5 and ripemd160 bind to the respective digest functions in OpenSSL’s libcrypto. Both raw (binary) and character inputs are supported, and the output type matches the input type: a character vector yields hexadecimal strings, while a raw vector yields a raw digest.

md5("foo")
[1] "acbd18db4cc2f85cedef654fccc4a4d8"
md5(charToRaw("foo"))
md5 ac:bd:18:db:4c:c2:f8:5c:ed:ef:65:4f:cc:c4:a4:d8 
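
The same input/output pairing can be checked for the other digest functions. The following is a minimal sketch using sha256; the variable names are illustrative, and the final comparison assumes that as.character() on a raw hash returns the plain hexadecimal string, as it is used further below.

```r
library(openssl)

# Character input returns a character hash (hexadecimal string)
h_chr <- sha256("foo")
is.character(h_chr)

# Raw input returns a raw vector, printed as colon-separated bytes
h_raw <- sha256(charToRaw("foo"))
is.raw(h_raw)
length(h_raw)  # a SHA-256 digest is 32 bytes long

# Both forms encode the same digest
as.character(h_raw) == h_chr
```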

The functions are vectorized over character vectors: a vector of n strings returns n hashes, one per element.

# Vectorized for strings
md5(c("foo", "bar", "baz"))
[1] "acbd18db4cc2f85cedef654fccc4a4d8" "37b51d194a7513e45b56f6524f2d51f2"
[3] "73feffa4b7f6bb68e44cf984c85f6e88"
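
Because each element is hashed independently, equal strings produce equal hashes, which can be used to spot repeated values. A small sketch, with an illustrative input vector:

```r
library(openssl)

# Input with a repeated element
x <- c("foo", "bar", "foo")
hashes <- md5(x)

# One hash per input string
length(hashes) == length(x)

# Identical inputs hash identically, different inputs do not
hashes[1] == hashes[3]
hashes[1] == hashes[2]

# Each element equals the result of hashing that string on its own
hashes[2] == md5("bar")
```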

Besides character and raw vectors, the functions also accept a connection object (e.g. a file, socket or URL). In this case the function stream-hashes the binary contents of the connection.

# Stream-hash a file
myfile <- system.file("CITATION")
md5(file(myfile))
md5 48:9e:35:00:38:d0:47:ad:99:03:b8:c5:35:d3:ec:e7 
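
One way to convince yourself that streaming gives the same result as hashing the bytes in memory is to try both on a small temporary file. This is a sketch under the assumption that the file fits in memory; the file contents and variable names are made up for illustration.

```r
library(openssl)

# Write the three bytes "foo" to a temporary file (no trailing newline)
tmp <- tempfile()
writeBin(charToRaw("foo"), tmp)

# Hash the file by streaming it through a connection
h_stream <- md5(file(tmp))

# Hash the same bytes after reading them fully into memory
h_memory <- md5(readBin(tmp, what = "raw", n = file.size(tmp)))

# Compare the two raw digests byte by byte
same <- all(unclass(h_stream) == unclass(h_memory))
same

unlink(tmp)
```

Streaming is mainly useful for inputs that are too large to read into memory at once.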

The same works for URLs. The hash of the R installer (R-4.0.0-win.exe) below should match the one listed in md5sum.txt.

# Stream-hash from a network connection
as.character(md5(url("https://cran.r-project.org/bin/windows/base/old/4.0.0/R-4.0.0-win.exe")))

# Compare
readLines('https://cran.r-project.org/bin/windows/base/old/4.0.0/md5sum.txt')
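
The comparison can also be done in code rather than by eye. The helper below, md5_matches, is hypothetical and not part of the openssl package; it assumes the expected hash has already been extracted from the checksum file as a hexadecimal string. To keep the sketch runnable offline it is applied to a local temporary file, but a url() connection could be passed in the same way.

```r
library(openssl)

# Hypothetical helper: TRUE if the MD5 of the connection's contents
# equals the published hexadecimal checksum (case-insensitive)
md5_matches <- function(con, expected) {
  actual <- as.character(md5(con))
  isTRUE(unname(actual == tolower(trimws(expected))))
}

# Local stand-in for a downloaded file: the three bytes "foo"
tmp <- tempfile()
writeBin(charToRaw("foo"), tmp)

ok <- md5_matches(file(tmp), "ACBD18DB4CC2F85CEDEF654FCCC4A4D8")
bad <- md5_matches(file(tmp), "00000000000000000000000000000000")

unlink(tmp)
```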

Compare to digest

Similar functionality is also available in the digest package, but with a slightly different interface:

# Compare to digest
library(digest)

Attaching package: 'digest'
The following object is masked from 'package:openssl':

    sha1
digest("foo", "md5", serialize = FALSE)
[1] "acbd18db4cc2f85cedef654fccc4a4d8"
# The other way around: hash a serialized R object
digest(cars, skip = 0)
[1] "a74ce6c10413c19dc0ce4c131afec221"
md5(serialize(cars, NULL))
md5 3d:bb:47:09:04:79:07:c6:71:76:16:8f:3e:d0:11:e1
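
Since attaching digest masks sha1 from openssl, one option is to call the functions with an explicit namespace prefix. The sketch below assumes both packages are installed and checks that, for a plain string hashed without serialization, the two packages agree for other algorithms as well.

```r
# Namespace prefixes avoid the masking of sha1 by the digest package
x <- "foo"

sha1_same <- openssl::sha1(x) ==
  digest::digest(x, algo = "sha1", serialize = FALSE)

sha256_same <- openssl::sha256(x) ==
  digest::digest(x, algo = "sha256", serialize = FALSE)

sha1_same
sha256_same
```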