Replace FNV with a faster hash function. #37229

Merged: 2 commits, Nov 9, 2016

nnethercote
Contributor

Hash table lookups are very hot in rustc profiles and the time taken within FnvHash itself is a big part of that. Although FNV is a simple hash, it processes its input one byte at a time. In contrast, Firefox has a homespun hash function that is also simple but works on multiple bytes at a time. So I tried it out and the results are compelling:

futures-rs-test  4.326s vs  4.212s --> 1.027x faster (variance: 1.001x, 1.007x)
helloworld       0.233s vs  0.232s --> 1.004x faster (variance: 1.037x, 1.016x)
html5ever-2016-  5.397s vs  5.210s --> 1.036x faster (variance: 1.009x, 1.006x)
hyper.0.5.0      5.018s vs  4.905s --> 1.023x faster (variance: 1.007x, 1.006x)
inflate-0.1.0    4.889s vs  4.872s --> 1.004x faster (variance: 1.012x, 1.007x)
issue-32062-equ  0.347s vs  0.335s --> 1.035x faster (variance: 1.033x, 1.019x)
issue-32278-big  1.717s vs  1.622s --> 1.059x faster (variance: 1.027x, 1.028x)
jld-day15-parse  1.537s vs  1.459s --> 1.054x faster (variance: 1.005x, 1.003x)
piston-image-0. 11.863s vs 11.482s --> 1.033x faster (variance: 1.060x, 1.002x)
regex.0.1.30     2.517s vs  2.453s --> 1.026x faster (variance: 1.011x, 1.013x)
rust-encoding-0  2.080s vs  2.047s --> 1.016x faster (variance: 1.005x, 1.005x)
syntex-0.42.2   32.268s vs 31.275s --> 1.032x faster (variance: 1.014x, 1.022x)
syntex-0.42.2-i 17.629s vs 16.559s --> 1.065x faster (variance: 1.013x, 1.021x)

(That's a stage1 compiler doing debug builds. Results for a stage2 compiler are similar.)

The attached commit is not in a state suitable for landing because I changed the implementation of FnvHasher without changing its name (because that would have required touching many lines in the compiler). Nonetheless, it is a good place to start discussions.

Profiles show very clearly that this new hash function is a lot faster to compute than FNV. The quality of the new hash function is less clear -- it seems to do better in some cases and worse in others (judging by the number of instructions executed in Hash{Map,Set}::get).

CC @brson, @arthurprs

@rust-highfive
Collaborator

r? @Aatch

(rust_highfive has picked a reviewer for you, use r? to override)

@arthurprs
Contributor

Do we have any backing data for this algorithm? Maybe from the Firefox development process/source? Smhasher run?

@nnethercote
Contributor Author

I forgot to mention that there is something of an explanation about this hash function in the Firefox source: https://dxr.mozilla.org/mozilla-central/source/mfbt/HashFunctions.h#74-117.

I modified it from 32 bits to 64 bits by changing the multiplication factor from 0x9E3779B9 (the golden ratio in fixed point) to 0x517cc1b727220a95 (pi in fixed point). I changed it from the golden ratio to pi because the golden ratio in 64-bit fixed point is even -- see http://stackoverflow.com/questions/5889238/why-is-xor-the-default-way-to-combine-hashes#comment54810251_27952689

This hash function was introduced into Firefox in https://bugzilla.mozilla.org/show_bug.cgi?id=729940. There's very little discussion in that bug report about how it was derived.

I'm happy to try Smhasher on it. But the ultimate workload for the hash function used within rustc is rustc itself, and it's clearly working well there.
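For readers following along, here is a minimal sketch of the kind of mixing step being described: rotate the running hash, xor in a whole word, multiply by the constant. The multiplier is the one quoted in this comment. The rotation amount of 5 and the names `add_to_hash` and `hash_words` are assumptions made for illustration and are not taken from the thread.

```rust
/// Multiplier quoted in the comment above. It is odd, so multiplying by it
/// (mod 2^64) is a bijection on 64-bit values.
const SEED: u64 = 0x517cc1b727220a95;

/// Rotation amount. Assumption for this sketch: 5 bits.
const ROTATE: u32 = 5;

/// One mixing step: rotate the running hash, xor in a word, multiply.
fn add_to_hash(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(ROTATE) ^ word).wrapping_mul(SEED)
}

/// Hash a sequence of 64-bit words, consuming one whole word per step
/// (rather than one byte per step, as FNV does).
fn hash_words(words: &[u64]) -> u64 {
    words.iter().fold(0, |hash, &word| add_to_hash(hash, word))
}
```

Calling `hash_words(&[1, 2, 3])` performs three rotate/xor/multiply steps, where a byte-at-a-time hash would perform 24 steps for the same 24 bytes of input.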

@arthurprs
Contributor

arthurprs commented Oct 17, 2016

I think it's worth discussing a couple more things while we're at it, so we nail this for good.

  • do calculations considering a usize-sized hash; this will help a lot on 32-bit systems. It's fine to just expand it to u64 on finish. Std HashMap hashes will eventually become usize anyway (see "Use usize instead of u64 for hashes in HashMap", #36567).
  • this works at the same byte at a time for &str/&[int] slices so the improvement is coming exclusively from integral hashing. For those, I'm curious if we can process usize bytes at a time before falling back to the byte at a time.

@nnethercote
Contributor Author

I think you miswrote your second dot point... but I did some ad hoc profiling and found that the vast majority of occurrences are write_u32 and write_u64. write was less than 1% of occurrences.

@arthurprs
Contributor

arthurprs commented Oct 17, 2016

Did I? I can't find it. I guess I need more coffee.

Interesting, but I'm almost sure it'll show up when we eventually move everything away from siphasher (string interner for example). Siphasher is still even higher in the profiles.

@bluss
Member

bluss commented Oct 17, 2016

Fnv and SipHasher both have the property that the stream of bytes to hash is "untyped": a u16 fed as u16 or its byte representation is hashed the same way.

But I don't think that the Hash trait expects or requires that contract in any way, and that this hash function's "typed" approach is fine.

But what I do think is that a well behaved hasher must hash a slice of bytes the same way, regardless of how you split it into subslices (as long as the order is the same). That means that any whole-word optimization for Hasher::write then needs to keep a state (which is exactly a thing that makes SipHasher a bit slow).
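The "untyped" property can be illustrated with a small sketch. It assumes the FNV-1a variant with the standard published 64-bit constants, and the names `Fnv1a`, `hash_typed`, and `hash_bytes` are made up for this example. Because only `write` is implemented, the `Hasher` trait's default `write_u16` forwards the integer's native-endian bytes to `write`, so both entry points see the same byte stream.

```rust
use std::hash::Hasher;

/// 64-bit FNV-1a, shown for illustration only.
struct Fnv1a(u64);

impl Default for Fnv1a {
    fn default() -> Self {
        Fnv1a(0xcbf29ce484222325) // FNV offset basis
    }
}

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        // One xor and one multiply per input byte.
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3); // FNV prime
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// Hash a u16 through the typed entry point.
fn hash_typed(value: u16) -> u64 {
    let mut hasher = Fnv1a::default();
    // `write_u16` is not overridden, so the trait default calls `write`.
    hasher.write_u16(value);
    hasher.finish()
}

/// Hash the same u16 as its raw native-endian bytes.
fn hash_bytes(value: u16) -> u64 {
    let mut hasher = Fnv1a::default();
    hasher.write(&value.to_ne_bytes());
    hasher.finish()
}
```

Here `hash_typed(v)` and `hash_bytes(v)` agree for every `v`. A hasher that overrides `write_u16` to mix the whole integer in one step would generally not have this property, which is the "typed" behaviour discussed above.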

@arthurprs
Contributor

> But I don't think that the Hash trait expects or requires that contract in any way, and that this hash function's "typed" approach is fine.

Yeah, luckily the Hash trait doesn't impose any special streaming requirement.

@arthurprs
Contributor

arthurprs commented Oct 17, 2016

I got curious, so I ran smhasher on the 64-bit (PR) and original 32-bit hashes. I had to include 2 variants of each to be able to see how both modes of the hasher behave (integral and byte-by-byte).

see gist for results: https://gist.github.com/arthurprs/5e57cd59586acd8c52dbb02b55711096

A few comments considering the code in the PR.

Hashing integral types (write_...)

The quality is really bad but it's so cheap to calculate for integral types (what rustc seems to be using fnv for) that it's still a win for the combination of the workload + hashmap implementation. I'm fairly sure that the compiler sees the 0 seed and the hash boils down to a single IMUL instruction.
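The "single IMUL" remark can be checked with a short sketch, under the assumption that the mixing step is rotate, xor, multiply with a rotation of 5 (the rotation amount and the function names are illustrative, not taken from the thread). Starting from a zero state, rotating 0 gives 0 and `0 ^ key` is `key`, so only the multiplication remains.

```rust
/// Multiplier quoted earlier in the thread.
const SEED: u64 = 0x517cc1b727220a95;

/// Fx-style mixing step (rotation by 5 is an assumption of this sketch).
fn add_to_hash(hash: u64, word: u64) -> u64 {
    (hash.rotate_left(5) ^ word).wrapping_mul(SEED)
}

/// Hash of a single integer key, starting from the zero state.
fn hash_one(key: u64) -> u64 {
    // rotate_left(0) == 0 and 0 ^ key == key, so this is key * SEED.
    add_to_hash(0, key)
}
```

So `hash_one(key)` equals `key.wrapping_mul(SEED)` for every key, which is consistent with the hash being very cheap for integer keys.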

Hashing slices (write_usize() + write())

The write_usize(slice.len()) will be faster and the write() slower compared to fnv. So it could potentially regress those cases.

I think the right way forward is to have two hashers in the rustc codebase, one general purpose-ish and another for integral types. This PR has potential for the latter.

@nnethercote
Contributor Author

@arthurprs: Thank you for running these! I was about to do it myself but you've saved me the trouble.

Looking at the results... whelp, there are a lot of numbers there that I don't know how to interpret, though the "FAIL" results sound bad.

> The write_usize(slice.len()) will be faster and the write() slower compared to fnv. So it could potentially regress those cases.

Why will write() be slower? Because FNV does xor + mul, while the new hash does rol + xor + mul? I guess it'll be slightly slower, but the extra rol should be cheap compared to the mul?

@arthurprs
Contributor

arthurprs commented Oct 18, 2016

> Why will write() be slower? Because FNV does xor + mul, while the new hash does rol + xor + mul? I guess it'll be slightly slower, but the extra rol should be cheap compared to the mul?

It's a 15% difference on my Intel Skylake processor, 690MB/s vs 800MB/s. You can see some rough numbers in the gist.

@nnethercote
Contributor Author

> But what I do think is that a well behaved hasher must hash a slice of bytes the same way, regardless of how you split it into subslices (as long as the order is the same). That means that any whole-word optimization for Hasher::write then needs to keep a state (which is exactly a thing that makes SipHasher a bit slow).

Are you sure? Where does that requirement come from? I was thinking about changing write so that it processes 4 or 8 bytes at a time and then does single-byte clean-up for any excess bytes at the end...

nnethercote force-pushed the FxHasher branch 2 times, most recently from e8ac705 to 7be7488 (October 25, 2016 07:07)
@nnethercote
Contributor Author

New version. I've made the following changes.

  • FxHasher is now a separate type. FnvHasher still exists.
  • I've converted all uses of FnvHash{Map,Set} to FxHash{Map,Set}. I did some profiling and found that write calls (i.e. variable-length hash cases) account for less than 0.1% of occurrences. Even when I weight them by their length, they account for less than 2% of all FxHasher operations. So I don't think treating variable-length cases differently is worthwhile.
  • FxHasher now works with usize so that it will be faster on 32-bit machines.
  • I remeasured and the speed-ups are basically unchanged from those in the first comment above.
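A rough sketch of what a usize-based hasher with `FxHashMap`-style aliases could look like follows. The type names come from the list above; the body is an illustrative assumption rather than the code in the PR. In particular, the rotation of 5, the choice of the 32-bit multiplier (the Firefox constant quoted earlier), and the handling of `write_u64` on 32-bit targets are guesses made for the example.

```rust
use std::collections::{HashMap, HashSet};
use std::hash::{BuildHasherDefault, Hasher};

// Multiplier per pointer width (assumed values, see the note above).
#[cfg(target_pointer_width = "32")]
const SEED: usize = 0x9e3779b9;
#[cfg(target_pointer_width = "64")]
const SEED: usize = 0x517cc1b727220a95;

/// Hasher whose state is one machine word.
#[derive(Default)]
struct FxHasher {
    hash: usize,
}

impl FxHasher {
    /// Rotate, xor in one word, multiply.
    fn add_to_hash(&mut self, word: usize) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for FxHasher {
    fn write(&mut self, bytes: &[u8]) {
        // Variable-length input: one step per byte.
        for &byte in bytes {
            self.add_to_hash(byte as usize);
        }
    }

    fn write_u32(&mut self, i: u32) {
        self.add_to_hash(i as usize);
    }

    fn write_u64(&mut self, i: u64) {
        self.add_to_hash(i as usize);
        // On 32-bit targets the cast above drops the high half, so mix it too.
        if cfg!(target_pointer_width = "32") {
            self.add_to_hash((i >> 32) as usize);
        }
    }

    fn write_usize(&mut self, i: usize) {
        self.add_to_hash(i);
    }

    fn finish(&self) -> u64 {
        // Widen to u64 only at the end.
        self.hash as u64
    }
}

/// Drop-in aliases, so call sites only change the type name.
type FxHashMap<K, V> = HashMap<K, V, BuildHasherDefault<FxHasher>>;
type FxHashSet<T> = HashSet<T, BuildHasherDefault<FxHasher>>;
```

With aliases like these, a map is created with `FxHashMap::default()` and otherwise used exactly like a standard `HashMap`, which is why converting call sites is mostly a rename.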

r? @arthurprs: what do you think?

@bluss
Member

bluss commented Oct 25, 2016

@nnethercote I'm not sure; it's something that needs to be discussed and put into the documentation.

I think it's the logical rule by the construction of Hash. Imagine a chunked rope datastructure. It should have the same hash value, regardless of how it is chunked, as long as its whole string is the same. How it is chunked will determine how its data is fed to Hasher::write.

@bluss
Member

bluss commented Oct 25, 2016

To make a concrete example, imagine struct Rope(Vec<String>). The actual string value is the concatenation of the strings in the representation. Rope(["a", "b"]) and Rope(["ab"]) should have the same hash.
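The rope example can be sketched as follows. The `Rope` type is the one from this comment; the `Hash` implementation, the byte-at-a-time FNV-1a hasher, and the helper `hash_rope` are assumptions made for illustration. Because the hasher consumes one byte per step, its state after N bytes depends only on those bytes and not on how they were grouped into `write` calls.

```rust
use std::hash::{Hash, Hasher};

/// Byte-at-a-time FNV-1a hasher (standard 64-bit constants).
struct Fnv1a(u64);

impl Hasher for Fnv1a {
    fn write(&mut self, bytes: &[u8]) {
        for &byte in bytes {
            self.0 ^= u64::from(byte);
            self.0 = self.0.wrapping_mul(0x100000001b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// A rope whose logical value is the concatenation of its chunks.
struct Rope(Vec<String>);

impl Hash for Rope {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Feed each chunk's bytes directly, with no per-chunk length prefix,
        // so the hasher sees the concatenated string split at chunk boundaries.
        for chunk in &self.0 {
            state.write(chunk.as_bytes());
        }
    }
}

/// Hash a rope with the byte-at-a-time hasher.
fn hash_rope(rope: &Rope) -> u64 {
    let mut hasher = Fnv1a(0xcbf29ce484222325);
    rope.hash(&mut hasher);
    hasher.finish()
}
```

With this hasher, `Rope(["a", "b"])` and `Rope(["ab"])` hash identically. A `write` that consumes whole words and then cleans up leftover bytes without carrying state between calls would generally not preserve this for splits that fall inside a word.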

@nnethercote
Contributor Author

(New version removes the println! statements that I accidentally left in...)

@arthurprs
Contributor

arthurprs commented Oct 25, 2016

Looks good to me. Somebody from the core team should weigh in on how to move this forward.

I wouldn't be worried about the Hasher having the "strict streaming" characteristic as the Hash trait is "strongly typed" and will make the same writes to hasher every time.

@bors
Contributor

bors commented Oct 25, 2016

☔ The latest upstream changes (presumably #37292) made this pull request unmergeable. Please resolve the merge conflicts.

@aturon
Member

aturon commented Oct 25, 2016

cc @rust-lang/compiler

@bors
Contributor

bors commented Oct 26, 2016

☔ The latest upstream changes (presumably #37270) made this pull request unmergeable. Please resolve the merge conflicts.

@nnethercote
Contributor Author

With the notable exception of @arthurprs, this is being ignored. It's a big compile speed win, the biggest one I know of, but I fear that concerns about theoretical worst cases will overwhelm the benefit that's been demonstrated widely in practice.

How can we move this forward?

pnkfelix added the I-nominated and T-compiler labels on Oct 31, 2016
@pnkfelix
Member

(Nominated for discussion amongst compiler team; hopefully that will help it move forward...)

@nikomatsakis
Contributor

I think the problem is that @Aatch hasn't been too active of late, so the PR went unnoticed. I have no strong opinion about what hash function we use --- basically, if it's faster, I'm for it. I'm curious if anyone has any objections.

@nnethercote
Contributor Author

nnethercote commented Nov 3, 2016

> Could you have the Fnv -> Fx global rename in its own commit?

You mean this?

  • First commit adds the new fx.rs file.
  • Second commit changes all the FnvHashMap/Set occurrences to FxHashMap/Set.

Sure. I'll wait until I get full approval from the compiler team, because I have some other conflicts that I need to fix and I might as well do them later to reduce the likelihood of more conflicts afterwards.

@malbarbo
Contributor

malbarbo commented Nov 4, 2016

How about defining type alias DefaultHashMap and DefaultHashSet? So in the future the concrete types can be easily changed.

@arthurprs
Contributor

Although there's no one size fits all for hashers I think it's easier to opt-out of it if necessary than the other way around. So +1 for the DefaultMap/Set.

@nikomatsakis
Contributor

@nnethercote everybody is in favor!

@nnethercote
Contributor Author

I rebased and split the PR into two commits: one adding FxHasher, and one converting all FnvHash instances to FxHash instances.

I also remeasured and the results are similar to before.

futures-rs-test  4.020s vs  3.918s --> 1.026x faster (variance: 1.008x, 1.007x)
helloworld       0.225s vs  0.225s --> 0.999x faster (variance: 1.009x, 1.009x)
html5ever-2016-  3.800s vs  3.637s --> 1.045x faster (variance: 1.006x, 1.006x)
hyper.0.5.0      4.642s vs  4.521s --> 1.027x faster (variance: 1.006x, 1.007x)
inflate-0.1.0    3.714s vs  3.671s --> 1.012x faster (variance: 1.007x, 1.007x)
issue-32062-equ  0.300s vs  0.292s --> 1.029x faster (variance: 1.011x, 1.026x)
issue-32278-big  1.535s vs  1.484s --> 1.034x faster (variance: 1.024x, 1.006x)
jld-day15-parse  1.343s vs  1.272s --> 1.056x faster (variance: 1.001x, 1.012x)
ostn15_phf      19.419s vs 18.372s --> 1.057x faster (variance: 1.003x, 1.027x)
piston-image-0. 10.855s vs 10.464s --> 1.037x faster (variance: 1.004x, 1.010x)
reddit-stress    2.217s vs  2.133s --> 1.039x faster (variance: 1.009x, 1.006x)
regex.0.1.30     2.244s vs  2.185s --> 1.027x faster (variance: 1.019x, 1.004x)
rust-encoding-0  1.862s vs  1.814s --> 1.027x faster (variance: 1.002x, 1.007x)
syntex-0.42.2   29.155s vs 28.059s --> 1.039x faster (variance: 1.019x, 1.003x)
syntex-0.42.2-i 13.689s vs 12.897s --> 1.061x faster (variance: 1.010x, 1.007x)

(reddit-stress and ostn15_phf are a couple of programs that aren't in rustc-benchmarks that I've been measuring.)

r? @nikomatsakis

This speeds up compilation by 3--6% across most of rustc-benchmarks.
@nnethercote
Contributor Author

Ugh, this PR is so conflict-prone.

@nikomatsakis
Contributor

@bors r+

@bors
Contributor

bors commented Nov 8, 2016

📌 Commit 00e48af has been approved by nikomatsakis

eddyb added a commit to eddyb/rust that referenced this pull request Nov 9, 2016
Replace FNV with a faster hash function.

bors added a commit that referenced this pull request Nov 9, 2016
Rollup of 15 pull requests

- Successful merges: #36868, #37134, #37229, #37250, #37370, #37428, #37432, #37472, #37524, #37614, #37622, #37627, #37636, #37644, #37654
- Failed merges: #37463, #37542, #37645
@bors bors merged commit 00e48af into rust-lang:master Nov 9, 2016
@nnethercote nnethercote deleted the FxHasher branch November 10, 2016 00:58
brson added the relnotes label on Nov 15, 2016
@cbreeden
Contributor

@nnethercote this hash is super fast on my dataset. Here are my tests for this hash on a personal round robin hashset implementation for about 4500 u32 (unicode):

test fnv        ... 
1 => 2586
2 => 1244
3 => 468
4 => 139
5 => 30
6 => 10
bench:      63,889 ns/iter (+/- 7,812)

test fxhasher   ... 
1 => 3305
2 => 1116
3 => 56
bench:      22,290 ns/iter (+/- 3,283)

test phf        ... bench:      72,287 ns/iter (+/- 6,156)
test static_fnv ... bench:      64,639 ns/iter (+/- 6,879)

The # => # shows how many probes were required in the round robin to find the correct element.
This should probably be a crate. Do you mind if I make one out of it? Or should you?

@nnethercote
Contributor Author

@cbreeden I'm happy if you want to make a crate out of it. Make sure you observe the rustc license (of course) and you should probably make it clear in the docs that it's not a "well-designed" hash and so may not be suitable in all situations. Thanks.

@cbreeden
Contributor

sounds good. Yeah, I got pretty lucky there, I'd say.

@cbreeden
Contributor

I went ahead and decided to modify the write(..) method to hash in 4-byte chunks:

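    // Assumes `use byteorder::{NativeEndian, ReadBytesExt};` is in scope.
    fn write(&mut self, bytes: &[u8]) {
        let mut buf = bytes;
        // Consume the input four bytes at a time; `read_u32` advances `buf`.
        while buf.len() >= 4 {
            let n = buf.read_u32::<NativeEndian>().unwrap();
            self.write_u32(n);
        }

        // Hash the remaining 0 to 3 bytes one at a time.
        for byte in buf {
            let i = *byte;
            self.add_to_hash(i as usize);
        }
    }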

Testing this with a few ASCII byte slices yields these results:

 name           old ns/iter  chunks ns/iter  diff ns/iter   diff %  speedup
 bench_3chars   2            3                          1   50.00%   x 0.67
 bench_4chars   3            2                         -1  -33.33%   x 1.50
 bench_11chars  8            5                         -3  -37.50%   x 1.60
 bench_12chars  9            3                         -6  -66.67%   x 3.00
 bench_23chars  21           8                        -13  -61.90%   x 2.62
 bench_24chars  24           6                        -18  -75.00%   x 4.00

It appears that there is a clear win for hashing any byte slice with length > 3, which I believe is the common case. For some reason there is a regression when hashing in chunks of u64. (x64 Intel i7-6600U @ 2.6 GHz, Windows 10).
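For comparison, the same chunking idea can be sketched with only the standard library, using `chunks_exact` and `u32::from_ne_bytes` in place of the byteorder crate. The hasher type `ChunkedFx`, its u64 state, and the rotation of 5 are assumptions made for this example and are not the code benchmarked above.

```rust
use std::hash::Hasher;

/// Multiplier quoted earlier in the thread.
const SEED: u64 = 0x517cc1b727220a95;

/// Illustrative Fx-style hasher with a chunked `write`.
#[derive(Default)]
struct ChunkedFx {
    hash: u64,
}

impl ChunkedFx {
    /// Rotate, xor in one word, multiply.
    fn add_to_hash(&mut self, word: u64) {
        self.hash = (self.hash.rotate_left(5) ^ word).wrapping_mul(SEED);
    }
}

impl Hasher for ChunkedFx {
    fn write(&mut self, bytes: &[u8]) {
        let mut chunks = bytes.chunks_exact(4);
        // Whole 4-byte groups: one mixing step per group.
        for chunk in chunks.by_ref() {
            let word = u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            self.write_u32(word);
        }
        // Trailing 0 to 3 bytes: one mixing step per byte.
        for &byte in chunks.remainder() {
            self.add_to_hash(u64::from(byte));
        }
    }

    fn write_u32(&mut self, i: u32) {
        self.add_to_hash(u64::from(i));
    }

    fn finish(&self) -> u64 {
        self.hash
    }
}
```

An 8-byte slice takes two mixing steps here instead of eight. The trade-off raised earlier in the thread still applies: since no partial word is carried between calls, the result can depend on how the input is split across `write` calls.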

@nnethercote I know that you said .write() was called less than 1% of the time in your testing, but do you mind me asking what commands you used for the rustc profile benchmarks? I would be curious if a patch like this would make any difference.

@cbreeden
Contributor

@nnethercote never mind, sorry for the spam. I think you were using https://github.com/rust-lang-nursery/rustc-benchmarks. I'll try it out when I get back home on a computer that can compile rustc in a reasonable amount of time.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this pull request Nov 13, 2020
Doc change: Remove mention of `fnv` in HashMap

Disclaimer: I am the author of [aHash](https://github.com/tkaitchuck/aHash).

This changes the Rustdoc in `HashMap` from mentioning the `fnv` crate to mentioning the `aHash` crate, as an alternative `Hasher` implementation.

### Why

Fnv [has poor hash quality](https://github.com/rurban/smhasher), is [slow for larger keys](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), and does not provide dos resistance, because it is unkeyed (this can also cause [other problems](https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion)).

Fnv has acceptable performance for integers and has very poor performance with keys >32 bytes. This is the reason it was removed from the standard library in rust-lang#37229 .

Because regardless of which dimension you value, there are better alternatives, it does not make sense for anyone to consider using `fnv`.

The text mentioning `fnv` in the standard library continues to create confusion: rust-lang/hashbrown#153  rust-lang/hashbrown#9 . There are also a number of [crates using it](https://crates.io/crates/fnv/reverse_dependencies) a great many of which are hashing strings (Which is when Fnv is the [worst](https://github.com/cbreeden/fxhash#benchmarks), [possible](https://github.com/tkaitchuck/aHash#speed), [choice](http://cglab.ca/~abeinges/blah/hash-rs/).)

I think aHash makes the most sense to mention as an alternative because it is the most credible option (in my obviously biased opinion). It offers [good performance on numbers and strings](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), is [of high quality](https://github.com/tkaitchuck/aHash#hash-quality), and [provides dos resistance](https://github.com/tkaitchuck/aHash/wiki/How-aHash-is-resists-DOS-attacks). It is popular (see [stats](https://crates.io/crates/ahash)) and is the default hasher for [hashbrown](https://crates.io/crates/hashbrown) and [dashmap](https://crates.io/crates/dashmap) which are the most popular alternative hashmaps. Finally it does not have any of the [`gotcha` cases](https://github.com/tkaitchuck/aHash#fxhash) that `FxHash` suffers from. (Which is the other popular hashing option when DOS attacks are not a concern)

Signed-off-by: Tom Kaitchuck <tom.kaitchuck@emc.com>