-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
hashmap: use siphash-1-3 as default hasher #33940
Conversation
(rust_highfive has picked a reviewer for you, use r? to override) |
#[stable(feature = "rust1", since = "1.0.0")] | ||
pub struct SipHasher { | ||
pub type SipHasher = SipHash24; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should probably make this an opaque tuple struct to allow us flexibility in the future to change what it corresponds to.
@sfackler good idea, done. |
Exposing SipHash (2-4) publicly was not needed, and we could have avoided it. I think we should not make SipHash13 public, to avoid repeating the mistake. Rust is after all very reluctant to ship anything crypto-related. SipHash13 should have tests that check it against the reference implementation. Here are the answers from the test vectors #29754 (comment) |
Thanks for the PR! Do you have any benchmarks as well showing changes either here or there? |
@alexcrichton In the issue, there is this image of benchmarks, though it is from November 2015. |
Excellent! I doubt the numbers have changed much in the meantime, so sounds good to me. |
Leaving this tagged as T-libs for discussion during triage. I may not be present for the next one, so some things to think about:
|
I don't think it's worth deprecating I do think we should just change the default hasher to siphash 1-3. |
The If we don't deprecate |
This is a welcome change. I don't think we should expose the faster variant. In fact I think we should deprecate the existing hasher as well and don't expose any specific implementation. |
Here are updated benchmarks (I got a new computer) Results http://i.imgur.com/QA6Ey9B.png Code https://github.com/arthurprs/hash-rs Sip1-3 is very competitive at the lower end and good enough in the high end. |
Since the functionality is desirable and provides an immediate performance improvement for HashMaps, would it be a good idea to merge the change of the default hasher, and leave naming/exposing variants/etc to the stabilization issue? |
Unfortunately the libs team hasn't had a chance to talk about this just yet, but I suspect we'll all be on board with the current strategy basically. |
👍 |
Would there be any interest in additionally using specialization to get rid of the trailing 0xFF sentinel on str slices? The sentinel is needed if you're hashing anything else after the string, but it's not needed for hashing a string by itself, and my tests last year indicated that removing the sentinel was a rather large performance gain. I did a PR last year which optimized that, but it changed the public API of the Hash trait and was (rightly) rejected; now that we have specialization it seems possibly worthy to revisit, because it might be doable without a public API change now. The biggest problem with short string SipHash performance isn't the number of rounds (although reducing that can't hurt), it's the u8 to u64 conversion step… |
The libs team got a chance to discuss this PR today and the conclusions were:
Unfortunately we realized that this is a breaking change as we didn't quite protect ourselves from changing the default hasher 100%. Today, this code compiles: let a: SipHasher = RandomState::new().build_hasher(); That is, the implementation of Finally, when stabilizing these APIs, we'll have a few questions that arise:
In any case though all the API additions here are unstable, so these are questions we can punt to stabilization for now. In the meantime we'd like to merge this to get the benefits of using SipHash 1-3 instead of 2-4. |
For backwards compatibility, there's 2 options:
|
Ideally we would have: impl BuildHasher for RandomState {
type Hasher = DefaultHasher;
// ...
}
pub struct DefaultHasher(SipHasher);
// ... We could then change the internals of Could you implement this and then we can at least see if there's any fallout in crater? |
I've added an unstable |
78e2dd5
to
9f6b4bd
Compare
The travis error looks legit |
💔 Test failed - auto-win-msvc-64-opt-rustbuild |
@bors: retry On Fri, Jul 1, 2016 at 12:21 PM, bors notifications@github.com wrote:
|
hashmap: use siphash-1-3 as default hasher Also exposes `SipHash13` and `SipHash24` in `core::hash::sip`, for those that want to differentiate. For motivation, see this quote from the original issue: > we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found, although many competent people tried to break it. However, fewer rounds may be sufficient and I would be very surprised if SipHash-1-3 introduced weaknesses for hash tables. This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change. Closes #29754
yay! |
\o/ Next question: would there be interest in another attempt to address #29754 (comment) , possibly via specialization? |
@sorear if it is possible to split hasher to "advance" and "finalization", then it could be possible to keep whole 256bit SipHash state, feed it either with 64bit integer at once, or with string. Then run finalization once. Strings should be feeded as usual siphash inner code without finalization (but including xoring with 0xff). Integers are mixed in as one Siphash iteration + xoring with 0xff as prevention of string extention. |
Probably, my suggestion is over complexified. Perhaps, using just incremental Siphash implementation is just enough. |
@funny-falcon We already are using an incremental implementation. The problem is substantially more complicated than that. See the linked comment, and my #29139. |
@sorear Yes, let's use specialization. I'm currently writing RFCs. The implementation is already here: contain-rs/hashmap2#5 |
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
…hash24 SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940
SipHash13 is secure enough to be used in hash-tables, and SipHash's author confirms that. Rust already considered switch to SipHash13: rust-lang/rust#29754 (comment) Jean-Philippe Aumasson confirmation: rust-lang/rust#29754 (comment) Merged pull request: rust-lang/rust#33940 From: Sokolov Yura aka funny_falcon <funny.falcon@gmail.com> Date: Thu, 8 Dec 2016 20:31:29 +0300 Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org> Fixes: [Feature #13017] git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Also exposes
SipHash13
andSipHash24
incore::hash::sip
, for those that want to differentiate.For motivation, see this quote from the original issue:
This keeps a type alias of
SipHasher
toSipHash24
, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.Closes #29754