Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hybrid state gxhash #34

Closed
3 tasks done
ogxd opened this issue Nov 29, 2023 · 12 comments · Fixed by #39
Closed
3 tasks done

Hybrid state gxhash #34

ogxd opened this issue Nov 29, 2023 · 12 comments · Fixed by #39

Comments

@ogxd
Copy link
Owner

ogxd commented Nov 29, 2023

Context

Currently, gxhash with a 128-bit state is much faster for small input sizes, while gxhash with a 256-bit state is faster for large input sizes. The idea is to study the possibilities of making an hybrid state version of gxhash, leveraring the advantages of 128-bit state with the advantages of the 256-bit state processing, for maximum throughput for all input sizes.

See the break-even point:
image

Challenges

There are two ways to achieve this:

  • Either modify the 256-bit state version to use 128-bit state logic for small input sizes
  • Or try to make the 256-bit state processing stable along the 128-bit state. This is more complex, but if achieved, it could greatly simplify the usage of gxhash as a whole (stable for intrinsics set width)

Some challenges are:

  • It must be all evaluated at compile time because we don't want to increase the bytecode of the algorithm. There could not be a dynamic switch between two state width paths in the algorithm.
  • Making 256-bit and 128-bit stable requires rethinking of the high ILP construction

Todo

  • Study possibilities of using generics to avoid dynamic state width evaluation
  • Figure out where the break-even point is
  • Study the possibility of having a stable high ILP construction for both 128 and 256 bit states
@ogxd
Copy link
Owner Author

ogxd commented Nov 29, 2023

Results for first try at an hybrid approach
x86_64

This is very encouraging. This is all done without using generics, which ended up making the code too complex for no good reason IMHO.
Note that this all remains stable with the full 128 bit version, which is AWESOME.

There are a few remaing things to tackle:

  • _mm256 AES require stdsimd unstable feature. How to deal with this? Shall we add an 'hybrid' feature to preserve compatibily with stable rust by default?
  • Can we relax tmp construction compression to use more compress_fast ?
  • This need to be re-ran in SMHasher to make sure quality properties are still as good, because the construction had to be slightly modified

@ogxd
Copy link
Owner Author

ogxd commented Nov 30, 2023

Some bench results on my Ryzen 5 PC:

no avx2 with avx2
main branch image image
hybrid-2 branch image image

In 4th picture we use both 128-bit SIMD for small inputs and 256-bit SIMD for larger inputs. It has the benefit of being stable with the 128-bit only version |

@xxaier
Copy link

xxaier commented Nov 30, 2023

s128 s256 hybird and the other hash maybe can draw it in the same picture so it looks clearer

@ogxd
Copy link
Owner Author

ogxd commented Nov 30, 2023

I can't because they're all generated from different compilation targets but I have used a max y of 130000 for all so scalewise it is comparable.
As you can see with the hybrid+stable approach there is a small loss in throughput starting with 32-bytes long inputs compared to the full AVX2 algorithm. Maybe a small compromise is acceptable if stability is guaranteed but still I'd like to continue fiddling with the implementation to see if a little more throughput can be squeezed out of it

@3tieto
Copy link

3tieto commented Dec 1, 2023

Share an article for reference only

https://deepmind.google/discover/blog/alphadev-discovers-faster-sorting-algorithms/

We applied AlphaDev to one of the most commonly used algorithms for hashing in data structures to try and discover a faster algorithm. And when we applied it to the 9-16 bytes range of the hashing function, the algorithm that AlphaDev discovered was 30% faster.

This year, AlphaDev’s new hashing algorithm was released into the open-source Abseil library, available to millions of developers around the world, and we estimate that it’s now being used trillions of times a day.

@ogxd
Copy link
Owner Author

ogxd commented Dec 1, 2023

Interesting article but I doubt AI (at least AI as we know it currently) would help making gxhash more performant

@ogxd
Copy link
Owner Author

ogxd commented Dec 4, 2023

Here is what it looks like after a few more optimizations (still passing SMHasher)

image

gxhash-2 is the current version on main, gxhash-3 is what is proposed in this PR

Takeaways:

  • When AVX2 is not available, gxhash-3 is faster than gxhash-2 for all inputs sizes.
  • When AVX2 is available
    • gxhash-3 is faster than gxhash-2 for all puts sizes, with even greater throughput on nightly builds, without compromise on stability of requiring extra flags like 'avx2' in gxhash-2
    • If the flag 'avx2' is enabled for gxhash-2, gxhash-3 is faster for small inputs sizes, but is outperformd for the largest inputs and a small range for mid size inputs. Here is the tradeoff.

@ogxd ogxd linked a pull request Dec 4, 2023 that will close this issue
@3tieto
Copy link

3tieto commented Dec 5, 2023

Are all these hashes stable ?

@ogxd
Copy link
Owner Author

ogxd commented Dec 5, 2023

Yes all hashes generated in v3 will be the same, independently from actual intrinsics used.
I did one more batch of optimizations yesterday but I am reaching the limits (or at least my limit 😅).
I am going to proceed to a cleanup and review this as a whole

@ogxd
Copy link
Owner Author

ogxd commented Dec 6, 2023

I did (again 😅) one more batch of optimizations

Before

image

After

image

Merging this by the end of the week.

@3tieto
Copy link

3tieto commented Dec 6, 2023

Currently, the rust standard library uses hashbrown, and hashbrown uses ahash. I think you can try submitting a pull request to hashbrown first and replace ahash with gxhash.

https://github.com/rust-lang/hashbrown

Since Rust 1.36, this is now the HashMap implementation for the Rust standard library. However you may still want to use this crate instead since it works in environments without std, such as embedded systems and kernels.

Uses AHash as the default hasher, which is much faster than SipHash. However, AHash does not provide the same level of HashDoS resistance as SipHash, so if that is important to you, you might want to consider using a different hasher.

image

@ogxd
Copy link
Owner Author

ogxd commented Dec 6, 2023

Yes I know I've already done that, and it was earlier today to be precise ;)

This has led to this first issue #40

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment