MD5 Implementation #8272
Conversation
We need to discuss the I/O traits (Reader and Writer) at some point. Maybe not now; I'm just making you aware. The SipHash in libstd does implement Writer, and I think it would make sense for all digest objects to do so too, eventually. Right now, since the move to the new I/O traits is not completed, it's not really a convenient time. I was thinking of this because the I/O module already defines reading/writing things like …
I'm having issues with building your branch, and I don't have much free time to deal with this anymore. My MD5 implementation is based on the MD4 implementation that is currently in Rust, so it should be pretty straightforward to compare it to this PR if anyone is interested.
@blake2-ppc I'm not sure I have a fully baked opinion on implementing Writer. One thing that feels weird to me is that the Writer trait has a flush() method. This method is unnecessary for Digests, but if they all implement Writer, then they'd all have to carry no-op versions of it.

Writer also defines an error handling protocol. That protocol isn't really applicable to Digests, since they generally can't fail in a way that a reasonable application could recover from. Md5 can't fail at all; Sha1 and Sha2 can only fail if the input size is too large. I don't see how an application can recover from that error, though: if the application needs a Sha1 hash but the input is too large, I don't think there's much that can be done. In general, an application that needs a Sha1 hash needs a Sha1 hash specifically and won't be able to detect failure and then fall back to MD5 or Sha-512; if those hashes were acceptable, it would have used them in the first place.

What about a set of adapter objects that implement Writer and delegate to a Digest? Maybe DigestWriter and DigestStreamWriter? The former would just send the input to the Digest, while the latter would send it to the Digest and then also to another Writer for processing.
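To make the adapter idea concrete, here is a rough sketch. It is not code from this PR: it uses today's std::io::Write in place of the old Writer trait, and a pared-down stand-in for the Digest trait, both of which are assumptions made for illustration.

```rust
use std::io::{self, Write};

// Minimal stand-in for the Digest trait under discussion (an assumption, not
// the real definition): all the adapter needs is a way to feed bytes in.
pub trait Digest {
    fn input(&mut self, data: &[u8]);
}

// The `DigestWriter` idea: wrap any Digest and implement the standard write
// trait by forwarding every buffer to it. flush() is a no-op because a digest
// buffers nothing that needs flushing, and write() never fails.
pub struct DigestWriter<D: Digest> {
    digest: D,
}

impl<D: Digest> DigestWriter<D> {
    pub fn new(digest: D) -> Self {
        DigestWriter { digest }
    }

    // Hand back the wrapped digest so the caller can extract the final hash.
    pub fn into_inner(self) -> D {
        self.digest
    }
}

impl<D: Digest> Write for DigestWriter<D> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.digest.input(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

A DigestStreamWriter would differ only in also forwarding each buffer to a second wrapped writer. With an adapter like this, anything that already knows how to drive a writer can feed a digest, without the digest types themselves having to carry the I/O error protocol.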
I rebased to get rid of uses of "foreach."
@DaGenix |
@cmr I wasn't saying that I think anything is necessarily wrong with the Writer trait. All I'm trying to say is that the Writer trait defines a bunch of behavior that is irrelevant for most or all Digests.
So, is a Digest really a Writer? The main similarity seems to be that both a Writer and a Digest can accept a bunch of bytes and do something with them, but they differ in how they accept those bytes. If no one agrees with me, I'm happy to write up a patch that implements Writer for Digests, though I have some reservations about it. I'd suggest that a different PR might be a better place to do that, since that change would affect all existing Digests, not just this one.
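For contrast, here is a sketch of the direct approach being debated: each digest type implementing the write trait itself. Again this uses today's std::io::Write and a hypothetical Md5 stub rather than anything from this PR; the point is only to show the obligatory no-op flush() and the error channel that can never carry a meaningful failure.

```rust
use std::io::{self, Write};

// Hypothetical stub standing in for a concrete digest type; the real state
// and compression function are elided.
struct Md5;

impl Md5 {
    fn input(&mut self, _data: &[u8]) { /* process the bytes */ }
}

// The alternative under discussion: the digest implements the write trait
// directly, so it must provide a no-op flush() and return an io::Result
// even though hashing the input cannot fail.
impl Write for Md5 {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.input(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```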
I opened this PR to foster a discussion / comparison of various implementations. Is there more discussion that needs to take place? Is this ready for a review or should this stay open a bit longer before being reviewed? |
…e and write_u32_le.
The shift_add_check_overflow and shift_add_check_overflow_tuple functions are rewritten to be more efficient and to make use of the CheckedAdd intrinsic instead of manually checking for integer overflow.
* The invocation of leading_zeros() is removed and replaced with a simple integer comparison. The leading_zeros() method results in a ctpop LLVM instruction and may not be efficient on all architectures; integer comparisons, however, are efficient on just about any architecture.
* The methods lose the ability for the caller to specify a particular shift value; that functionality wasn't being used, and removing it allows the code to be simplified.
* Finally, the methods are renamed to add_bytes_to_bits and add_bytes_to_bits_tuple to reflect their very specific purposes.
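As a rough illustration of what add_bytes_to_bits is for: the digests keep their running message length in bits, so each new byte count has to be converted to bits and added to the total, and both steps must refuse to overflow silently. The body below is a simplified sketch using checked arithmetic, not the actual cryptoutil.rs code (which also has the tuple variant for larger counters).

```rust
// Simplified sketch of the add_bytes_to_bits idea described in the commit
// message above; not the cryptoutil.rs implementation.
fn add_bytes_to_bits(bits: u64, bytes: u64) -> u64 {
    // bytes -> bits, i.e. a shift left by 3, but checked for overflow.
    let new_bits = bytes
        .checked_mul(8)
        .expect("message length overflowed the bit counter");
    // Add to the running total, again refusing to wrap around.
    bits.checked_add(new_bits)
        .expect("message length overflowed the bit counter")
}

fn main() {
    // One 64-byte MD5 block is 512 bits.
    assert_eq!(add_bytes_to_bits(0, 64), 512);
    assert_eq!(add_bytes_to_bits(512, 64), 1024);
}
```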
An MD5 implementation was originally included in #8097, but, since there are a couple of different implementations of that digest algorithm (@alco mentioned his implementation on the mailing list just before I opened that PR), it was suggested that I remove it from that PR and open a new PR to discuss the different implementations and the best way forward. If anyone wants to discuss a different implementation, feel free to present it here and compare it to this one. I'll just discuss my implementation and leave it to others to present details of theirs.

This implementation relies on the FixedBuffer struct from cryptoutil.rs for managing the input buffer, just like the Sha1 and Sha2 digest implementations do. I tried manually unrolling the loops in the compression function, but I got slightly worse performance when I did that.

Outside of the #[test]s, I also tested the implementation by generating 1,000 inputs of up to 10MB in size and checking the MD5 digest calculated by this code against the MD5 digest calculated by Java's implementation.

On my computer, I'm getting the following performance:
```
test md5::bench::md5_10 ... bench: 52 ns/iter (+/- 1) = 192 MB/s
test md5::bench::md5_1k ... bench: 2819 ns/iter (+/- 44) = 363 MB/s
test md5::bench::md5_64k ... bench: 178566 ns/iter (+/- 4927) = 367 MB/s
```
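For reference, this is roughly the call pattern a user of such a digest would follow. The module paths and the input_str/result_str string helpers are assumptions about the surrounding Digest trait, not lines taken from this PR.

```rust
// Sketch only: paths and the string helpers are assumed, not quoted from the PR.
use crypto::digest::Digest;
use crypto::md5::Md5;

fn main() {
    let mut md5 = Md5::new();
    md5.input_str("abc");
    // RFC 1321 test vector for "abc".
    assert_eq!(md5.result_str(), "900150983cd24fb0d6963f7d28e17f72");
}
```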