Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hashmap: use siphash-1-3 as default hasher #33940

Merged
merged 1 commit into from
Jul 2, 2016

Conversation

seanmonstar
Copy link
Contributor

Also exposes SipHash13 and SipHash24 in core::hash::sip, for those that want to differentiate.

For motivation, see this quote from the original issue:

we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of SipHasher to SipHash24, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754

@rust-highfive
Copy link
Collaborator

r? @alexcrichton

(rust_highfive has picked a reviewer for you, use r? to override)

#[stable(feature = "rust1", since = "1.0.0")]
pub struct SipHasher {
pub type SipHasher = SipHash24;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably make this an opaque tuple struct to allow us flexibility in the future to change what it corresponds to.

@seanmonstar
Copy link
Contributor Author

@sfackler good idea, done.

@bluss
Copy link
Member

bluss commented May 30, 2016

Exposing SipHash (2-4) publicly was not needed, and we could have avoided it. I think we should not make SipHash13 public, to avoid repeating the mistake.

Rust is after all very reluctant to ship anything crypto-related.

SipHash13 should have tests that check it against the reference implementation. Here are the answers from the test vectors #29754 (comment)

@alexcrichton
Copy link
Member

Thanks for the PR! Do you have any benchmarks as well showing changes either here or there?

@alexcrichton alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label May 31, 2016
@seanmonstar
Copy link
Contributor Author

@alexcrichton In the issue, there is this image of benchmarks, though it is from November 2015.

@alexcrichton
Copy link
Member

Excellent! I doubt the numbers have changed much in the meantime, so sounds good to me.

@alexcrichton
Copy link
Member

Leaving this tagged as T-libs for discussion during triage. I may not be present for the next one, so some things to think about:

  • Do we want to expose this as a public API? Was exposing SipHasher a mistake? I personally feel like we've been too minimal and wouldn't mind expanding our hash algorithms a bit.
  • Do we want to merge this now? Perhaps we should publish a siphash13 crate on crates.io and see how it fares for a bit with hash maps? Crates can already switch the hash algorithm on stable Rust, so experimentation is possible.
  • What do we want to do with SipHasher? Leave it as-is, deprecate it, or deprecate/rename it to SipHash24? My preference would be to introduce SipHash13 and SipHash24, deprecating SipHasher with a rename, although this would of course be influenced by the first question.

@brson
Copy link
Contributor

brson commented May 31, 2016

I don't think it's worth deprecating SipHasher just because we named it wrong. Are these hashers only exposed in core or also in std?

I do think we should just change the default hasher to siphash 1-3.

@alexcrichton
Copy link
Member

The core::hash module I believe is wholesale reexported in std, so the contents should be the same. I also agree with @brson that we should switch the default to siphash 1-3, the only question in my mind is what to do about the APIs.

If we don't deprecate SipHasher, I'd vote for std::hash::SipHasher13 (negotiable on the "er"), I wouldn't want the extra sip module indirection. If we do deprecate it then I'd vote for std::hash::{SipHash13,SipHash24}. (still negotiable on "er")

@arthurprs
Copy link
Contributor

This is a welcome change.

I don't think we should expose the faster variant. In fact I think we should deprecate the existing hasher as well and don't expose any specific implementation.

@arthurprs
Copy link
Contributor

arthurprs commented Jun 2, 2016

Here are updated benchmarks (I got a new computer)

Results http://i.imgur.com/QA6Ey9B.png

Code https://github.com/arthurprs/hash-rs

Sip1-3 is very competitive at the lower end and good enough in the high end.

@seanmonstar
Copy link
Contributor Author

Since the functionality is desirable and provides an immediate performance improvement for HashMaps, would it be a good idea to merge the change of the default hasher, and leave naming/exposing variants/etc to the stabilization issue?

@alexcrichton
Copy link
Member

Unfortunately the libs team hasn't had a chance to talk about this just yet, but I suspect we'll all be on board with the current strategy basically.

@sfackler
Copy link
Member

👍

@sorear
Copy link
Contributor

sorear commented Jun 19, 2016

Would there be any interest in additionally using specialization to get rid of the trailing 0xFF sentinel on str slices? The sentinel is needed if you're hashing anything else after the string, but it's not needed for hashing a string by itself, and my tests last year indicated that removing the sentinel was a rather large performance gain.

I did a PR last year which optimized that, but it changed the public API of the Hash trait and was (rightly) rejected; now that we have specialization it seems possibly worthy to revisit, because it might be doable without a public API change now.

The biggest problem with short string SipHash performance isn't the number of rounds (although reducing that can't hurt), it's the u8 to u64 conversion step…

@alexcrichton alexcrichton added relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-crater Status: Waiting on a crater run to be completed. labels Jun 21, 2016
@alexcrichton
Copy link
Member

The libs team got a chance to discuss this PR today and the conclusions were:

  • Let's not have a public sip module in std::hash, instead reexport types under std::hash directly.
  • Let's not deprecate SipHasher yet (as this PR is doing), and consider later once we stabilize these new APIs
  • We want to consider renaming SipHash24 to SipHasher24 as that's more conventional currently (hasher implementations have "hasher", not "hash")

Unfortunately we realized that this is a breaking change as we didn't quite protect ourselves from changing the default hasher 100%. Today, this code compiles:

let a: SipHasher = RandomState::new().build_hasher();

That is, the implementation of BuildHasher has SipHasher as the stable output type. This should be something like std::collections::hash_map::DefaultHasher which we can swap the algorithm internally with. To me it seems that it's very unlikely for anything to break in practice when changing this, but we should do a crater run at least. There were other ideas about how to perhaps make this change and retain backwards compatibility, but they're somewhat ... outlandish.

Finally, when stabilizing these APIs, we'll have a few questions that arise:

  • Do we really want to stabilize more hash algorithms in std::hash? Maybe we can get away with an empty std::hash and a deprecated SipHasher in the long run. We're pretty likely to change the hash algorithm of hash maps over time so we don't necessarily want to have std::hash collect a bunch of historical old "broken" algorithms, but on the other hand they have very low maintenance costs, so it's perhaps not so bad.
  • What to do with SipHasher when other types are stabilized.

In any case though all the API additions here are unstable, so these are questions we can punt to stabilization for now. In the meantime we'd like to merge this to get the benefits of using SipHash 1-3 instead of 2-4.

@seanmonstar
Copy link
Contributor Author

For backwards compatibility, there's 2 options:

  • impl BuilderHasher for RandomState { type Hasher = SipHash13; }
    This breaks the example you listed, let a: SipHash = RandomState::new().build_hasher(), but keeps SipHasher internally equivalent.
  • pub struct SipHasher(SipHasher13)
    This allows the example to still compile, but means that any tests using SipHasher with specific keys may start breaking, as the rounds are less and the result hash differs.

@alexcrichton
Copy link
Member

Ideally we would have:

impl BuildHasher for RandomState {
    type Hasher = DefaultHasher;
    // ...
}

pub struct DefaultHasher(SipHasher);

// ...

We could then change the internals of DefaultHasher at any time.

Could you implement this and then we can at least see if there's any fallout in crater?

@seanmonstar
Copy link
Contributor Author

I've added an unstable DefaultHasher to the map module, as you suggested. (make check is still slowly building on my machine).

@seanmonstar seanmonstar force-pushed the siphash-1-3 branch 2 times, most recently from 78e2dd5 to 9f6b4bd Compare June 21, 2016 21:59
@alexcrichton
Copy link
Member

The travis error looks legit

@bors
Copy link
Contributor

bors commented Jul 1, 2016

💔 Test failed - auto-win-msvc-64-opt-rustbuild

@alexcrichton
Copy link
Member

@bors: retry

On Fri, Jul 1, 2016 at 12:21 PM, bors notifications@github.com wrote:

💔 Test failed - auto-win-msvc-64-opt-rustbuild
https://buildbot.rust-lang.org/builders/auto-win-msvc-64-opt-rustbuild/builds/1642


You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
#33940 (comment), or mute
the thread
https://github.com/notifications/unsubscribe/AAD95HutNH8189Moy-Fd0tqP3mMW_SQ0ks5qRWjXgaJpZM4IpOE5
.

@bors
Copy link
Contributor

bors commented Jul 1, 2016

⌛ Testing commit db1b191 with merge 0141193...

bors added a commit that referenced this pull request Jul 1, 2016
hashmap: use siphash-1-3 as default hasher

Also exposes `SipHash13` and `SipHash24` in `core::hash::sip`, for those that want to differentiate.

For motivation, see this quote from the original issue:

> we proposed SipHash-2-4 as a (strong) PRF/MAC, and so far no attack whatsoever has been found,
although many competent people tried to break it. However, fewer rounds may be sufficient and I would
be very surprised if SipHash-1-3 introduced weaknesses for hash tables.

This keeps a type alias of `SipHasher` to `SipHash24`, and since the internal default hasher of HashMap is specified as "not specified", changing it should not be a breaking change.

Closes #29754
@bors bors merged commit db1b191 into rust-lang:master Jul 2, 2016
@seanmonstar
Copy link
Contributor Author

yay!

@sorear
Copy link
Contributor

sorear commented Jul 2, 2016

\o/

Next question: would there be interest in another attempt to address #29754 (comment) , possibly via specialization?

@funny-falcon
Copy link

@sorear if it is possible to split hasher to "advance" and "finalization", then it could be possible to keep whole 256bit SipHash state, feed it either with 64bit integer at once, or with string. Then run finalization once.

Strings should be feeded as usual siphash inner code without finalization (but including xoring with 0xff).

Integers are mixed in as one Siphash iteration + xoring with 0xff as prevention of string extention.

@funny-falcon
Copy link

Probably, my suggestion is over complexified.

Perhaps, using just incremental Siphash implementation is just enough.

@sorear
Copy link
Contributor

sorear commented Jul 2, 2016

@funny-falcon We already are using an incremental implementation. The problem is substantially more complicated than that. See the linked comment, and my #29139.

@pczarn
Copy link
Contributor

pczarn commented Jul 3, 2016

@sorear Yes, let's use specialization. I'm currently writing RFCs. The implementation is already here: contain-rs/hashmap2#5

funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 9, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 10, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 12, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Jul 12, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 5, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 26, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 26, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Sep 27, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 2, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 4, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Nov 5, 2016
…hash24

SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
funny-falcon added a commit to funny-falcon/ruby that referenced this pull request Dec 8, 2016
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940
hsbt pushed a commit to ruby/ruby that referenced this pull request Jan 20, 2017
SipHash13 is secure enough to be used in hash-tables,
and SipHash's author confirms that.
Rust already considered switch to SipHash13:
  rust-lang/rust#29754 (comment)
Jean-Philippe Aumasson confirmation:
  rust-lang/rust#29754 (comment)
Merged pull request:
  rust-lang/rust#33940

From: Sokolov Yura aka funny_falcon <funny.falcon@gmail.com>
Date: Thu, 8 Dec 2016 20:31:29 +0300
Signed-off-by: Urabe, Shyouhei <shyouhei@ruby-lang.org>
Fixes: [Feature #13017]


git-svn-id: svn+ssh://ci.ruby-lang.org/ruby/trunk@57382 b2dd03c8-39d4-4d8f-98ff-823fe69b080e
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
relnotes Marks issues that should be documented in the release notes of the next release. S-waiting-on-crater Status: Waiting on a crater run to be completed. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.