Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Readme comparison std and FNV #153

Open
camsteffen opened this issue Apr 23, 2020 · 5 comments
Open

Improve Readme comparison std and FNV #153

camsteffen opened this issue Apr 23, 2020 · 5 comments

Comments

@camsteffen
Copy link

From the readme:

Since Rust 1.36, this is now the HashMap implementation for the Rust standard library. However you may still want to use this crate instead since it works in environments without std, such as embedded systems and kernels.

This paragraph implies that std is the same as hashbrown, but this omits that fact that std still uses SipHash and hashbrown uses AHash. Is this not a significant difference? (I'm asking since I don't fully understand these terms.)

Also, I can't find any guidance on choosing between hashbrown and FNV. Can we add this as well? Is FNV still generally the best choice when working with small keys? FNV is mentioned in the std docs so it seems warranted that this guidance should be available.

@Amanieu
Copy link
Member

Amanieu commented Apr 23, 2020

This paragraph implies that std is the same as hashbrown, but this omits that fact that std still uses SipHash and hashbrown uses AHash. Is this not a significant difference? (I'm asking since I don't fully understand these terms.)

Yes but the hash function can easily be changed using the S parameter on HashMap<K, V, S>.

Also, I can't find any guidance on choosing between hashbrown and FNV. Can we add this as well? Is FNV still generally the best choice when working with small keys? FNV is mentioned in the std docs so it seems warranted that this guidance should be available.

AHash or FxHash will be much faster for small keys than FNV or SipHash.

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue Nov 13, 2020
Doc change: Remove mention of `fnv` in HashMap

Disclaimer: I am the author of [aHash](https://github.com/tkaitchuck/aHash).

This changes the Rustdoc in `HashMap` from mentioning the `fnv` crate to mentioning the `aHash` crate, as an alternative `Hasher` implementation.

### Why

Fnv [has poor hash quality](https://github.com/rurban/smhasher), is [slow for larger keys](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), and does not provide dos resistance, because it is unkeyed (this can also cause [other problems](https://accidentallyquadratic.tumblr.com/post/153545455987/rust-hash-iteration-reinsertion)).

Fnv has acceptable performance for integers and has very poor performance with keys >32 bytes. This is the reason it was removed from the standard library in rust-lang#37229 .

Because regardless of which dimension you value, there are better alternatives, it does not make sense for anyone to consider using `fnv`.

The text mentioning `fnv` in the standard library continues to create confusion: rust-lang/hashbrown#153  rust-lang/hashbrown#9 . There are also a number of [crates using it](https://crates.io/crates/fnv/reverse_dependencies) a great many of which are hashing strings (Which is when Fnv is the [worst](https://github.com/cbreeden/fxhash#benchmarks), [possible](https://github.com/tkaitchuck/aHash#speed), [choice](http://cglab.ca/~abeinges/blah/hash-rs/).)

I think aHash makes the most sense to mention as an alternative because it is the most credible option (in my obviously biased opinion). It offers [good performance on numbers and strings](https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed), is [of high quality](https://github.com/tkaitchuck/aHash#hash-quality), and [provides dos resistance](https://github.com/tkaitchuck/aHash/wiki/How-aHash-is-resists-DOS-attacks). It is popular (see [stats](https://crates.io/crates/ahash)) and is the default hasher for [hashbrown](https://crates.io/crates/hashbrown) and [dashmap](https://crates.io/crates/dashmap) which are the most popular alternative hashmaps. Finally it does not have any of the [`gotcha` cases](https://github.com/tkaitchuck/aHash#fxhash) that `FxHash` suffers from. (Which is the other popular hashing option when DOS attacks are not a concern)

Signed-off-by: Tom Kaitchuck <tom.kaitchuck@emc.com>
@tkaitchuck
Copy link
Contributor

tkaitchuck commented Nov 25, 2020

@camsteffen You can see my benchmarks of just the hasher here: https://github.com/tkaitchuck/aHash/blob/master/compare/readme.md#speed
There is also some more qualitative comparison on that same page if you scroll up. FNV is only an alternative to aHash. The rest of HashBrown's code should perform the same regardless of which hasher is used.

@camsteffen
Copy link
Author

I think I found your speed comparison after I posted this. It's very helpful, thanks. Also thanks for updating the rust std docs. 👍

@tgross35
Copy link

Stumbled across this after some searching - I didn't realize that std::Collections::HashMap is literally a wrapper around HashBrown with a different default hasher. Knowing that helps clear some confusion up

https://github.com/rust-lang/rust/blob/2c3f284003d4bbedb9f0212c8f37c726124fa0ab/library/std/src/collections/hash/map.rs#L6
https://github.com/rust-lang/rust/blob/2c3f284003d4bbedb9f0212c8f37c726124fa0ab/library/std/Cargo.toml#L22

@Amanieu
Copy link
Member

Amanieu commented Dec 29, 2022

This should probably be clarified in the standard library docs for HashMap.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants