-
Notifications
You must be signed in to change notification settings - Fork 12.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix isize
optimization in StableHasher
for big-endian architectures
#93615
Conversation
// | ||
// To ensure that this optimization hashes the exact same bytes on both little-endian and | ||
// big-endian architectures, we compare the value with 0xFF before we convert the number | ||
// into a unified representation (little-endian). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comment is correct but I think we could put more emphasis that the endianness conversion must be the last step because that creates platform-dependent values to get platform-independent bytes.
It would be clearer if siphasher::write
were generic over [u8; N]
instead of taking different primitives. Oh well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well to be fair it contains an optimized implementation for these primitives, so it's probably worth it.
Should I add something like
First, we have to compare the value (which has to be done in a platform-dependent manner) and only then can we convert the number to the little-endian format (to ensure platform-independent bytes being hashed).
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(which has to be done in a platform-dependent manner)
That's probably confusing. We're going from [platform-dependent byte-representation, platform-independent value] to [platform-independent byte-representation, platform-dependent value]. Which means all operations that depend on the value must happen before that and afterwards we could only do bit-twiddling operations.
It would be more obvious if we used to_le_bytes
.
I don't mean to explain endianness, it's just about which things must be be done before and after the conversion. That's what I didn't consider during the review. 😓
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find .to_le()
and .to_be()
to be really confusing and always use to_le_bytes()
and to_be_bytes()
instead, which makes it much less likely to get things accidentally wrong (by converting twice for example).
Now that we have const generics it would probably be easy to just change SipHasher128::short_write()
to SipHasher128::short_write<const LEN: usize>(&mut self, bytes: &[u8; LEN])
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good plan. Since this is a portability bug let's fix it first and then improve the design.
@bors r+ rollup |
📌 Commit c21b8e1 has been approved by |
…askrgr Rollup of 7 pull requests Successful merges: - rust-lang#90132 (Stabilize `-Z instrument-coverage` as `-C instrument-coverage`) - rust-lang#91589 (impl `Arc::unwrap_or_clone`) - rust-lang#93495 (kmc-solid: Fix off-by-one error in `SystemTime::now`) - rust-lang#93576 (Emit more valid HTML from rustdoc) - rust-lang#93608 (Clean up `find_library_crate`) - rust-lang#93612 (doc: use U+2212 for minus sign in integer MIN/MAX text) - rust-lang#93615 (Fix `isize` optimization in `StableHasher` for big-endian architectures) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
Use const generics in SipHasher128's short_write This was proposed by `@michaelwoerister` [here](rust-lang#93615 (comment)). A few comments: 1) I tried to pass `&[u8; LEN]` instead of `[u8; LEN]`. Locally, it resulted in small icount regressions (about 0.5 %). When passing by value, there were no regressions (and no improvements). 2) I wonder if we should use `to_ne_bytes()` in `SipHasher128` to keep it generic and only use `to_le_bytes()` in `StableHasher`. However, currently `SipHasher128` is only used in `StableHasher` and the `short_write` method was private, so I couldn't use it directly from `StableHasher`. Using `to_le()` in the `StableHasher` was breaking this abstraction boundary before slightly. ```rust debug_assert!(LEN <= 8); ``` This could be done at compile time, but actually I think that now we can remove this assert altogether. r? `@the8472`
This PR fixes a problem with the stable hash optimization introduced in #93432. As @michaelwoerister has found out, the original implementation wouldn't produce the same hash on little/big architectures.
r? @the8472