MD5 Implementation #8272
Conversation
We need to discuss the I/O traits (Reader and Writer) at some point. Maybe not now; I'm just making you aware. The SipHash in libstd does implement Writer, and I think it would make sense for all digest objects to do so too, eventually. Right now, since the move to the new I/O traits is not completed, it's not really a convenient time. I was thinking of this because the I/O module already defines reading/writing things like …
I'm having issues with building your branch, and I don't have much free time to deal with this anymore. My MD5 implementation is based on the MD4 implementation that is currently in Rust, so it should be pretty straightforward to compare it to this PR if anyone is interested.
@blake2-ppc I'm not sure I have a fully baked opinion on implementing Writer. One thing that feels weird to me is that the Writer trait has a flush() method. This method is unnecessary for Digests, but if they all implement Writer, then they'd all have to carry no-op versions of it.

Writer also defines an error handling protocol. That protocol isn't really applicable to Digests, since they generally can't fail in a way that a reasonable application could recover from. Md5 can't fail at all; Sha1 and Sha2 can only fail if the input size is too large. I don't see how an application can recover from that error, though: if the application needs a Sha1 hash but the input is too large, I don't think there's much that can be done. In general, an application that needs a Sha1 hash needs a Sha1 hash specifically and won't be able to detect failure and then fall back to MD5 or Sha-512; if those hashes were acceptable, it would have used them in the first place.

What about a set of adapter objects that implement Writer and delegate to a Digest? Maybe DigestWriter and DigestStreamWriter? The former would just send the input to the Digest, while the latter would send it to the Digest and then also to another Writer for processing.
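To make the adapter idea concrete, here is a rough sketch. It is not code from this PR: it uses today's std::io::Write in place of the old Writer trait, and a pared-down stand-in for the Digest trait, both of which are assumptions made for illustration.

```rust
use std::io::{self, Write};

// Minimal stand-in for the Digest trait under discussion (an assumption, not
// the real definition): all the adapter needs is a way to feed bytes in.
pub trait Digest {
    fn input(&mut self, data: &[u8]);
}

// The `DigestWriter` idea: wrap any Digest and implement the standard write
// trait by forwarding every buffer to it. flush() is a no-op because a digest
// buffers nothing that needs flushing, and write() never fails.
pub struct DigestWriter<D: Digest> {
    digest: D,
}

impl<D: Digest> DigestWriter<D> {
    pub fn new(digest: D) -> Self {
        DigestWriter { digest }
    }

    // Hand back the wrapped digest so the caller can extract the final hash.
    pub fn into_inner(self) -> D {
        self.digest
    }
}

impl<D: Digest> Write for DigestWriter<D> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.digest.input(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```

A DigestStreamWriter would differ only in also forwarding each buffer to a second wrapped writer. With an adapter like this, anything that already knows how to drive a writer can feed a digest, without the digest types themselves having to carry the I/O error protocol.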
I rebased to get rid of uses of "foreach."
@DaGenix |
@cmr I wasn't saying that I think anything is necessarily wrong with the Writer trait. All I'm trying to say is that the Writer trait defines a bunch of behavior that is irrelevant for most or all Digests.
So, is a Digest really a Writer? The main similarity seems to be that both a Writer and a Digest can accept a bunch of bytes and do something with them, but they differ in how they accept those bytes. If no one agrees with me, I'm happy to write up a patch that implements Writer for Digests, though I have some reservations about it. I'd suggest that a different PR might be a better place to do that, since that change would affect all existing Digests, not just this one.
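For contrast, here is a sketch of the direct approach being debated: each digest type implementing the write trait itself. Again this uses today's std::io::Write and a hypothetical Md5 stub rather than anything from this PR; the point is only to show the obligatory no-op flush() and the error channel that can never carry a meaningful failure.

```rust
use std::io::{self, Write};

// Hypothetical stub standing in for a concrete digest type; the real state
// and compression function are elided.
struct Md5;

impl Md5 {
    fn input(&mut self, _data: &[u8]) { /* process the bytes */ }
}

// The alternative under discussion: the digest implements the write trait
// directly, so it must provide a no-op flush() and return an io::Result
// even though hashing the input cannot fail.
impl Write for Md5 {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.input(buf);
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}
```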
I opened this PR to foster a discussion / comparison of various implementations. Is there more discussion that needs to take place? Is this ready for a review or should this stay open a bit longer before being reviewed? |
…e and write_u32_le.
The shift_add_check_overflow and shift_add_check_overflow_tuple functions are rewritten to be more efficient and to make use of the CheckedAdd intrinsic instead of manually checking for integer overflow.
* The invocation of leading_zeros() is removed and replaced with a simple integer comparison. The leading_zeros() method results in a ctpop LLVM instruction and may not be efficient on all architectures; integer comparisons, however, are efficient on just about any architecture.
* The methods lose the ability for the caller to specify a particular shift value; that functionality wasn't being used, and removing it allows the code to be simplified.
* Finally, the methods are renamed to add_bytes_to_bits and add_bytes_to_bits_tuple to reflect their very specific purposes.
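As a rough illustration of what add_bytes_to_bits is for: the digests keep their running message length in bits, so each new byte count has to be converted to bits and added to the total, and both steps must refuse to overflow silently. The body below is a simplified sketch using checked arithmetic, not the actual cryptoutil.rs code (which also has the tuple variant for larger counters).

```rust
// Simplified sketch of the add_bytes_to_bits idea described in the commit
// message above; not the cryptoutil.rs implementation.
fn add_bytes_to_bits(bits: u64, bytes: u64) -> u64 {
    // bytes -> bits, i.e. a shift left by 3, but checked for overflow.
    let new_bits = bytes
        .checked_mul(8)
        .expect("message length overflowed the bit counter");
    // Add to the running total, again refusing to wrap around.
    bits.checked_add(new_bits)
        .expect("message length overflowed the bit counter")
}

fn main() {
    // One 64-byte MD5 block is 512 bits.
    assert_eq!(add_bytes_to_bits(0, 64), 512);
    assert_eq!(add_bytes_to_bits(512, 64), 1024);
}
```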
An MD5 implementation was originally included in #8097, but, since there are a couple of different implementations of that digest algorithm (@alco mentioned his implementation on the mailing list just before I opened that PR), it was suggested that I remove it from that PR and open a new PR to discuss the different implementations and the best way forward. If anyone wants to discuss a different implementation, feel free to present it here and compare it to this one. I'll just discuss my implementation and leave it to others to present details of theirs.

This implementation relies on the FixedBuffer struct from cryptoutil.rs for managing the input buffer, just like the Sha1 and Sha2 digest implementations do. I tried manually unrolling the loops in the compression function, but I got slightly worse performance when I did that.

Outside of the #[test]s, I also tested the implementation by generating 1,000 inputs of up to 10MB in size and checking the MD5 digest calculated by this code against the MD5 digest calculated by Java's implementation.

On my computer, I'm getting the following performance:
```
test md5::bench::md5_10 ... bench: 52 ns/iter (+/- 1) = 192 MB/s
test md5::bench::md5_1k ... bench: 2819 ns/iter (+/- 44) = 363 MB/s
test md5::bench::md5_64k ... bench: 178566 ns/iter (+/- 4927) = 367 MB/s
```
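For reference, this is roughly the call pattern a user of such a digest would follow. The module paths and the input_str/result_str string helpers are assumptions about the surrounding Digest trait, not lines taken from this PR.

```rust
// Sketch only: paths and the string helpers are assumed, not quoted from the PR.
use crypto::digest::Digest;
use crypto::md5::Md5;

fn main() {
    let mut md5 = Md5::new();
    md5.input_str("abc");
    // RFC 1321 test vector for "abc".
    assert_eq!(md5.result_str(), "900150983cd24fb0d6963f7d28e17f72");
}
```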