Add test for stable hash uniqueness of adjacent field values #93193

Kobzol · 2022-01-22T10:00:08Z

This PR adds a simple test that checks that the stable hash changes when two adjacent values are swapped, even when both orderings have the same combined bit pattern.
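To illustrate what "same combined bit pattern" means, here is a minimal sketch. It is not the test added by this PR and does not use `StableHasher`. The helpers `push_trimmed`, `push_fixed` and `stream_of` are hypothetical and only model the bytes that would be fed to a hasher. A variable-length encoding that drops leading zero bytes makes two swapped values indistinguishable, while a fixed-width encoding does not.

```rust
/// Hypothetical encoder: appends `value` without its leading zero bytes
/// (a zero value is still written as one byte).
fn push_trimmed(stream: &mut Vec<u8>, value: u64) {
    let bytes = value.to_be_bytes();
    let first = bytes.iter().position(|&b| b != 0).unwrap_or(bytes.len() - 1);
    stream.extend_from_slice(&bytes[first..]);
}

/// Fixed-width encoder: always appends all 8 bytes of `value`.
fn push_fixed(stream: &mut Vec<u8>, value: u64) {
    stream.extend_from_slice(&value.to_be_bytes());
}

/// Collects the bytes a hasher would see for two adjacent fields.
fn stream_of(pair: (u64, u64), push: fn(&mut Vec<u8>, u64)) -> Vec<u8> {
    let mut stream = Vec::new();
    push(&mut stream, pair.0);
    push(&mut stream, pair.1);
    stream
}

fn main() {
    let a = (0xFF, 0xFFFF);
    let b = (0xFFFF, 0xFF);

    // Trimmed encoding: both orders yield the bytes FF FF FF, so any hasher
    // fed this stream cannot tell them apart.
    assert_eq!(stream_of(a, push_trimmed), stream_of(b, push_trimmed));

    // Fixed-width encoding: the field boundary is preserved, the streams differ.
    assert_ne!(stream_of(a, push_fixed), stream_of(b, push_fixed));
}
```

A test of this shape catches a byte-skipping optimization that forgets to keep field boundaries distinguishable.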

r? @the8472

the8472 · 2022-01-27T18:34:50Z

@bors r+ rollup

bors · 2022-01-27T18:34:52Z

📌 Commit 1ffd043 has been approved by the8472

…askrgr

Rollup of 8 pull requests

Successful merges:

- rust-lang#91641 (Define c_char using cfg_if rather than repeating 40-line cfg)
- rust-lang#92899 (Mention std::iter::zip in Iterator::zip docs)
- rust-lang#93193 (Add test for stable hash uniqueness of adjacent field values)
- rust-lang#93325 (Introduce a limit to Levenshtein distance computation)
- rust-lang#93339 (rustdoc: add test case for multiple traits and erased names)
- rust-lang#93357 (Clarify the `usage-of-qualified-ty` error message.)
- rust-lang#93363 (`#[rustc_pass_by_value]` cleanup)
- rust-lang#93365 (More arena cleanups)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup

…sion, r=the8472

Compress amount of hashed bytes for `isize` values in StableHasher

This is another attempt to land rust-lang#92103, this time hopefully with a correct implementation w.r.t. stable hashing guarantees. The previous PR was [reverted](rust-lang#93014) because it could produce the [same hash](rust-lang#92103 (comment)) for different values even in quite simple situations. I have since added a basic [test](rust-lang#93193) that should guard against that situation, and I also added a new test in this PR, specialised for this optimization.

## Why this optimization helps

Since the original PR, I have tried to analyze why this optimization even helps (and why it especially helps for `clap`). I found that the vast majority of stable-hashed `i64` values actually come from hashing `isize` (which is converted to `i64` in the stable hasher). I found only a single place where this data type is used directly in the compiler, and that place also showed up in the traces that I used to find out when `isize` is being hashed. This place is `rustc_span::FileName::DocTest`. However, I suppose that `isize`s also come from other places that might not be so easy to find (there were some other entries in the trace). `clap` hashes about 8.5 million `isize`s, and all of them fit into a single byte, which is why this optimization has helped it [quite a lot](rust-lang#92103 (comment)).

Now, I'm not sure if special-casing `isize` is the correct solution here. Maybe something could be done with that `isize` inside `DocTest` or in other places, but that's for another discussion, I suppose. In this PR, instead of hardcoding a special case inside `SipHasher128`, I put it into `StableHasher` and only used it for `isize` (I tested that it doesn't help for `i64`, or at least not for `clap` and the few other benchmarks that I was testing).

## New approach

Since the most common case is a single byte, I added a fast path for hashing `isize` values whose value is non-negative and fits within a single byte, and a cold path for the rest of the values.

To avoid the previous correctness problem, we need to make sure that each unique `isize` value will produce a unique hash stream to the hasher. By hash stream I mean a sequence of bytes that will be hashed (a different sequence should produce a different hash, but that is of course not guaranteed).

We have to distinguish different values that produce the same bit pattern when we combine them. For example, if we simply skipped the leading zero bytes for values that fit within a single byte, `(0xFF, 0xFFFFFFFFFFFFFFFF)` and `(0xFFFFFFFFFFFFFFFF, 0xFF)` would send the same hash stream to the hasher, which must not happen.

To avoid this situation, values in `[0, 0xFE]` are hashed as a single byte. When we hash a larger value (treating `isize` as `u64`), we first hash an additional byte `0xFF`. Since `0xFF` cannot occur when we apply the single-byte optimization, we guarantee that the hash streams will be unique when hashing two values `(a, b)` and `(b, a)` if `a != b`:

1) When both `a` and `b` are within `[0, 0xFE]`, their hash streams will be different.
2) When neither `a` nor `b` is within `[0, 0xFE]`, their hash streams will be different.
3) When `a` is within `[0, 0xFE]` and `b` isn't, when we hash `(a, b)`, the hash stream will definitely not begin with `0xFF`. When we hash `(b, a)`, the hash stream will definitely begin with `0xFF`. Therefore the hash streams will be different.

r? `@the8472`
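The encoding described above can be sketched as follows. This is an illustration under stated assumptions, not the code from the PR: `push_isize` and `stream_of` are hypothetical names, a `Vec<u8>` stands in for the hasher, and the little-endian byte order of the cold path is an assumption.

```rust
/// Appends the hash stream for one `isize` to `stream` (illustrative only).
fn push_isize(stream: &mut Vec<u8>, value: isize) {
    // Widen to 64 bits and reinterpret as unsigned, so negative values
    // become large and take the cold path.
    let v = value as i64 as u64;
    if v < 0xFF {
        // Fast path: values in [0, 0xFE] are written as a single byte.
        stream.push(v as u8);
    } else {
        // Cold path: a 0xFF marker byte, then the full 8-byte value
        // (byte order is an assumption of this sketch).
        stream.push(0xFF);
        stream.extend_from_slice(&v.to_le_bytes());
    }
}

/// Collects the hash stream for a sequence of adjacent `isize` fields.
fn stream_of(values: &[isize]) -> Vec<u8> {
    let mut stream = Vec::new();
    for &value in values {
        push_isize(&mut stream, value);
    }
    stream
}

fn main() {
    // Fast path: one byte. Cold path: marker plus 8 bytes.
    assert_eq!(stream_of(&[0xFE]), vec![0xFE]);
    assert_eq!(stream_of(&[0xFF]).len(), 9);
    assert_eq!(stream_of(&[-1]).len(), 9);

    // The pair from the description, 0xFFFFFFFFFFFFFFFF being -1 as `isize`.
    assert_ne!(stream_of(&[0xFF, -1]), stream_of(&[-1, 0xFF]));

    // Case 3: only the swapped order starts with the 0xFF marker.
    assert_ne!(stream_of(&[1, 0x100]), stream_of(&[0x100, 1]));
}
```

Because a fast-path byte can never equal `0xFF`, the first byte of each encoded value tells a reader whether one byte or nine bytes belong to it, which keeps field boundaries unambiguous.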

rust-highfive assigned the8472 Jan 22, 2022

rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Jan 22, 2022

rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 22, 2022

Kobzol changed the title ~~Add test stable hash uniqueness of adjacent field values~~ Add test for stable hash uniqueness of adjacent field values Jan 22, 2022

Add test stable hash uniqueness of adjacent field values

1ffd043

Kobzol force-pushed the stable-hash-permutation-test branch from 90449b9 to 1ffd043 Compare January 24, 2022 14:47

Kobzol marked this pull request as ready for review January 24, 2022 14:51

Kobzol mentioned this pull request Jan 26, 2022

Do not hash leading zero bytes of i64 numbers in Sip128 hasher #92103

Merged

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 27, 2022

matthiaskrgr mentioned this pull request Jan 27, 2022

Rollup of 8 pull requests #93390

Merged

bors merged commit bc26f97 into rust-lang:master Jan 28, 2022

rustbot added this to the 1.60.0 milestone Jan 28, 2022

Kobzol deleted the stable-hash-permutation-test branch January 28, 2022 15:05

Kobzol mentioned this pull request Jan 28, 2022

Compress amount of hashed bytes for isize values in StableHasher #93432

Merged
