Make short string hashing 30% faster by splitting Hash::hash_end from Hash::hash #29139
By distinguishing the end hash operations from middle hash operations, we can avoid hashing unnecessary sentinels. For instance, (String, String) only needs a 0xFF in the middle, not at the end.
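As a rough illustration of the sentinel mentioned above, the sketch below records the bytes that the standard `Hash` impls feed to a hasher. `RecordingHasher` and `bytes_of` are hypothetical helpers written for this example, not part of the patch, and the exact byte stream is an implementation detail of the standard library rather than a guarantee.

```rust
use std::hash::{Hash, Hasher};

/// Hypothetical hasher that records every byte it is fed instead of mixing it.
#[derive(Default)]
struct RecordingHasher {
    bytes: Vec<u8>,
}

impl Hasher for RecordingHasher {
    fn write(&mut self, data: &[u8]) {
        self.bytes.extend_from_slice(data);
    }

    fn finish(&self) -> u64 {
        0 // never used; only the recorded bytes matter here
    }
}

/// Hypothetical helper: returns the byte stream that `value.hash(..)` produces.
fn bytes_of<T: Hash + ?Sized>(value: &T) -> Vec<u8> {
    let mut hasher = RecordingHasher::default();
    value.hash(&mut hasher);
    hasher.bytes
}

fn main() {
    // The sentinel after the first string keeps these two keys apart,
    // even though their concatenated characters are identical.
    let left = bytes_of(&("ab", "c"));
    let right = bytes_of(&("a", "bc"));
    assert_ne!(left, right);
    println!("{:x?}\n{:x?}", left, right);

    // A lone string still carries a trailing sentinel, which is the
    // byte this pull request wants to avoid hashing.
    println!("{:x?}", bytes_of("ab"));
}
```

The sentinel between the two strings is what makes the tuple encoding unambiguous; the one after the last string adds nothing for collision resistance.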
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @brson (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. The way Github handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
I'm a bit worried this will run into the same problem @gankro found in the first attempt: This is a breaking change for any external impls of
I'm absolutely interested in any approach to solve or improve short input hashing and accommodating other hash algorithms. I offer one argument in favour of the approach in my PR: whether to care about prefix-freeness or not, and how to solve it, should be a property of the Hasher, not of the value to be hashed (the Hash trait). I also think it has much lower backward compat risk.
I think that with this and #28044 it may be the point that we should hold off for an RFC to work through the design space here. I'm personally a little unsure about what the constraints are and e.g. where it falls down today.
@alexcrichton What would the path forward for that be? Shall I reformat my version of the proposal as an RFC and take it there?
☔ The latest upstream changes (presumably #29254) made this pull request unmergeable. Please resolve the merge conflicts.
@sorear Always happy to see yet another DCSS developer here. :P
Seems like an RFC was desired. Closing.
Why?
Since hash functions are already designed to prevent collisions between a string and its prefixes, it's somewhat inelegant that we append a sentinel byte to strings for hashing. I was looking at #25237 a few days ago and realized that if we distinguish hashing contexts which are at the end of the key from those that aren't, we can suppress the sentinel byte (and also vector lengths) in the cases where they aren't needed; in addition to saving a byte of hashing, it saves a call to update and associated buffer-management overhead.
This attacks the same problem as #28044.
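The idea of treating the end of a key differently can be sketched outside the standard library. Everything below is a hypothetical mirror written for illustration (`Sink`, `SketchHash`, and `encode_key` are invented names), not the code in this patch: `hash` emits a prefix-free encoding, while `hash_end` may drop the trailing marker and defaults to `hash`.

```rust
/// Hypothetical byte sink standing in for `std::hash::Hasher`.
trait Sink {
    fn write(&mut self, bytes: &[u8]);
}

impl Sink for Vec<u8> {
    fn write(&mut self, bytes: &[u8]) {
        self.extend_from_slice(bytes);
    }
}

/// Hypothetical mirror of the proposed trait shape.
trait SketchHash {
    /// Prefix-free encoding, safe in the middle of a key.
    fn hash<S: Sink>(&self, state: &mut S);

    /// Encoding for the last component of a key; defaults to `hash`.
    fn hash_end<S: Sink>(&self, state: &mut S) {
        self.hash(state);
    }
}

impl SketchHash for str {
    fn hash<S: Sink>(&self, state: &mut S) {
        state.write(self.as_bytes());
        state.write(&[0xff]); // sentinel: 0xFF never occurs in UTF-8
    }

    fn hash_end<S: Sink>(&self, state: &mut S) {
        state.write(self.as_bytes()); // nothing follows, so no sentinel
    }
}

impl<T: SketchHash + ?Sized> SketchHash for &T {
    fn hash<S: Sink>(&self, state: &mut S) {
        (**self).hash(state);
    }

    fn hash_end<S: Sink>(&self, state: &mut S) {
        (**self).hash_end(state);
    }
}

impl<A: SketchHash, B: SketchHash> SketchHash for (A, B) {
    fn hash<S: Sink>(&self, state: &mut S) {
        self.0.hash(state);
        self.1.hash(state);
    }

    fn hash_end<S: Sink>(&self, state: &mut S) {
        self.0.hash(state); // middle component keeps its marker
        self.1.hash_end(state); // final component may drop it
    }
}

/// Hypothetical entry point: a whole key is always hashed in "end" position.
fn encode_key<T: SketchHash + ?Sized>(key: &T) -> Vec<u8> {
    let mut out = Vec::new();
    key.hash_end(&mut out);
    out
}

fn main() {
    assert_eq!(encode_key("ab"), b"ab".to_vec());
    assert_ne!(encode_key(&("ab", "c")), encode_key(&("a", "bc")));
}
```

In this sketch a lone string is encoded without any sentinel, while a pair of strings keeps exactly one sentinel between its components, which matches the `(String, String)` example in the summary.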
How?
This adds a new method `hash_end` to the `Hash` trait, which behaves exactly as `hash` except that it need not produce a prefix-free encoding. It is always legal for `hash_end` to be the same as `hash`, and as such this is the default implementation. There are specialized implementations for strings and slices which remove the end/length markers.
How much?
Here's a small benchmark script:
I ran it in each mode on the patched and baseline Rust compilers (with `-O`, on x86_64 OSX), taking the median of 27 runs each time, for the following timings:
Subtracting out the baseline (0) case, which just allocates and frees strings, it looks like a 34% improvement on short string hashing, 13% on hashset queries, and 2% on hashset insertions. Uncertainty for the medians seems to be around 10ms.
What's the catch?
Types that need to hash identically to another type (for example, `Borrow` implementers) can no longer generally do so by forwarding the `hash` method; `hash_end` must be forwarded as well. This situation exists exactly once in the compiler.
`#[derive(Hash)]` has been modified to forward `hash_end`, so newtype-ish wrappers will just work (outside of the compiler; the compiler needs to implement `hash_end` itself when it's needed for `Borrow`, because we can't rely on the stage0 to do it).
Wait!
`hash_end` should probably be feature gated. It's not in this version of the patch, because when I tried feature gating it, deriving broke; I'm not sure how to tell rustc to ignore feature gates in deriving-generated code.
I'm not sure whether this belongs as an RFC.
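To make the forwarding concern under "What's the catch?" concrete, here is a minimal sketch of the pattern as it works on unpatched Rust. `Name` and `lookup_works` are hypothetical and exist only for this example. The wrapper forwards `hash` to `str` so that lookups by `&str` find it; under the proposal, a hand-written impl like this one would presumably also need to forward `hash_end`, since the default would no longer match the specialized `str` version.

```rust
use std::borrow::Borrow;
use std::collections::HashSet;
use std::hash::{Hash, Hasher};

/// Hypothetical newtype wrapper, used only for illustration.
#[derive(PartialEq, Eq)]
struct Name(String);

impl Hash for Name {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Forward to `str` so that `Name` and `str` hash identically,
        // as the `Borrow` contract requires.
        self.0.as_str().hash(state);
    }
}

impl Borrow<str> for Name {
    fn borrow(&self) -> &str {
        &self.0
    }
}

/// Hypothetical check: a set of `Name` can be queried with a plain `&str`.
fn lookup_works() -> bool {
    let mut set = HashSet::new();
    set.insert(Name("alice".to_string()));
    set.contains("alice")
}

fn main() {
    assert!(lookup_works());
}
```

The lookup only succeeds because both types produce the same hash for equal values; any method that changes how one of them is hashed has to be forwarded in the same way.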