
Commit

Change construction to allow stable hashes even with AVX2 and use it in a hybrid way for maximum performance

* Change construction to allow stable hashes even with AVX2 and use it in a hybrid way for maximum performance

* Add unit test on permutations

* Fix permutations

* Recursive chunked stepped permute

* Reduce bytecode size from duplicated AES keys

* Fix is_stable test

* Update Benchmark Results

---------

Co-authored-by: Olivier Giniaux <oginiaux@smartadserver.com>
Co-authored-by: Benchmark Bot <benchmark-bot@noreply.com>
3 people authored Dec 24, 2023
1 parent a987217 commit 93ebf7f
Showing 20 changed files with 591 additions and 594 deletions.
26 changes: 19 additions & 7 deletions .github/workflows/bench.yml
@@ -15,18 +15,30 @@ jobs:
- uses: actions/checkout@v4

- name: Benchmark
run: cargo bench --bench throughput --features 'bench-plot'
run: cargo bench --bench throughput --features bench-plot

- uses: actions/upload-artifact@v3
with:
name: benches
path: benches/throughput/x86_64.svg

benchmark-x86-avx2:
name: Benchmark X86 AVX2
runs-on: buildjet-2vcpu-ubuntu-2204

steps:
- uses: actions/checkout@v4

- name: Switch to nightly rust
run: rustup default nightly

- name: Benchmark AVX2 (nightly)
run: cargo bench --bench throughput --features 'bench-plot avx2'
- name: Benchmark
run: cargo bench --bench throughput --features bench-plot

- uses: actions/upload-artifact@v3
with:
name: benches
path: benches/throughput/*.svg
path: benches/throughput/x86_64-hybrid.svg

benchmark-arm:
name: Benchmark ARM
@@ -36,17 +48,17 @@ jobs:
- uses: actions/checkout@v4

- name: Benchmark
run: cargo bench --bench throughput --features 'bench-plot'
run: cargo bench --bench throughput --features bench-plot

- uses: actions/upload-artifact@v3
with:
name: benches
path: benches/throughput/*.svg
path: benches/throughput/aarch64.svg

commit:
name: Commit & Push
runs-on: buildjet-2vcpu-ubuntu-2204
needs: [benchmark-x86, benchmark-arm]
needs: [benchmark-x86, benchmark-x86-avx2, benchmark-arm]

permissions:
contents: write
34 changes: 32 additions & 2 deletions .github/workflows/build_test.yml
@@ -10,14 +10,44 @@ env:
CARGO_TERM_COLOR: always

jobs:
build_test:
build_test_x86:
name: Build & Test X86
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3

name: Build & Test
- name: Build
run: cargo build --release

- name: Test
run: cargo test --release

build_test_x86_avx2:
name: Build & Test X86 AVX2
runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v3

- name: Switch to nightly rust
run: rustup default nightly

- name: Build
run: cargo build --release

- name: Test
run: cargo test --release

build_test_arm:
name: Build & Test ARM
runs-on: buildjet-2vcpu-ubuntu-2204-arm

steps:
- uses: actions/checkout@v3

- name: Build
run: cargo build --release

- name: Test
run: cargo test --release
9 changes: 4 additions & 5 deletions Cargo.toml
@@ -1,7 +1,7 @@
[package]
name = "gxhash"
authors = ["Olivier Giniaux"]
version = "2.3.1"
version = "3.0.0"
edition = "2021"
description = "GxHash non-cryptographic algorithm"
license = "MIT"
@@ -13,10 +13,6 @@ categories = ["algorithms", "data-structures", "no-std"]
exclude = ["article/*"]

[features]
# The 256-bit state GxHash is faster for large inputs than the default 128-bit state implementation, but faster on smaller hashes.
# Please not however that the 256-bit GxHash and the 128-bit GxHash don't generate the same hashes for a same input.
# Requires AVX2 and VAES (X86).
avx2 = []
# Only relevant for throughput benchmarks
bench-csv = []
bench-md = []
@@ -39,6 +35,9 @@ seahash = "4.1.0"
metrohash = "1.0.6"
fnv = "1.0.3"

[build-dependencies]
rustc_version = "0.4.0"

[dev-dependencies.plotters]
version = "0.3.5"
default-features = false
6 changes: 3 additions & 3 deletions README.md
@@ -44,7 +44,7 @@ GxHash is compatible with:
> Other platforms are currently not supported (there is no fallback). The behavior on these platforms is undefined.
### Hashes Stability
All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms. An exception to this is the AVX2 version of GxHash (nightly).
All generated hashes for a given version of GxHash are stable, meaning that for a given input the output hash will be the same across all supported platforms.

## Benchmarks

@@ -74,7 +74,7 @@ GxHash is continuously benchmarked on X86 and ARM Github runners.
GxHash is a seeded hashing algorithm, meaning that depending on the seed used, it will generate completely different hashes. The default `HasherBuilder` (`GxHasherBuilder::default()`) uses seed randomization, making any `HashMap`/`HashSet` more DOS resistant, as it will make it much more difficult for attackers to be able to predict which hashes may collide without knowing the seed used. This does not mean however that it is completely DOS resistant. This has to be analyzed further.
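
The effect of seed randomization described above can be sketched with the standard library alone. This is an illustration under assumptions, not GxHash's own code: `std::collections::hash_map::RandomState` stands in for `GxHasherBuilder::default()`, and `hash_with` and `seeded_demo` are hypothetical helper names introduced only for this sketch.

```rust
use std::collections::hash_map::RandomState;
use std::hash::{BuildHasher, Hasher};

/// Hashes `bytes` with a fresh hasher produced by `builder`.
/// (Hypothetical helper, not part of gxhash.)
fn hash_with<B: BuildHasher>(builder: &B, bytes: &[u8]) -> u64 {
    let mut hasher = builder.build_hasher();
    hasher.write(bytes);
    hasher.finish()
}

/// Returns the hash of `input` twice under one randomly seeded builder,
/// then once under a second, independently seeded builder.
fn seeded_demo(input: &[u8]) -> (u64, u64, u64) {
    // Each `RandomState::new()` carries its own random keys (its "seed").
    let a = RandomState::new();
    let b = RandomState::new();
    (hash_with(&a, input), hash_with(&a, input), hash_with(&b, input))
}
```

The first two values are always equal, because a given builder hashes deterministically. The third is expected to differ, which is what makes collisions hard to predict for someone who does not know the seed.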

### Multicollisions Resistance
GxHash uses a 128-bit internal state (and even 256-bit with the `avx2` feature). This makes GxHash [a widepipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of size 64-bit or smaller, which had amongst other properties to be inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.
GxHash uses a 128-bit internal state. This makes GxHash [a widepipe construction](https://en.wikipedia.org/wiki/Merkle%E2%80%93Damg%C3%A5rd_construction#Wide_pipe_construction) when generating hashes of size 64-bit or smaller, which had amongst other properties to be inherently more resistant to multicollision attacks. See [this paper](https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf) for more details.

### Cryptographic Properties
GxHash is a non-cryptographic hashing algorithm, thus it is not recommended to use it as a cryptographic algorithm (it is not a replacement for SHA). It has not been assessed if GxHash is preimage resistant and how difficult it is to be reversed.
@@ -103,4 +103,4 @@ Publication:
[PDF](https://github.com/ogxd/gxhash-rust/blob/main/article/article.pdf)

Cite this publication / algorithm:
[![DOI](https://zenodo.org/badge/690754256.svg)](https://zenodo.org/badge/latestdoi/690754256)
[![DOI](https://zenodo.org/badge/690754256.svg)](https://zenodo.org/badge/latestdoi/690754256)
73 changes: 61 additions & 12 deletions benches/quality/main.rs
@@ -1,4 +1,4 @@
use std::{hash::{Hash, Hasher, BuildHasher}, collections::HashSet};
use std::{hash::{Hash, Hasher, BuildHasher}, collections::HashSet, slice};
use rand::Rng;
use criterion::black_box;

@@ -59,25 +59,33 @@ fn bench_hasher_quality<B>(name: &str)
check!(collisions_flipped_bits::<B, 64>(3));
check!(collisions_flipped_bits::<B, 256>(2));

check!(collisions_permute::<B, u8>(4, &Vec::from_iter(0..16))); // 16 bytes
check!(collisions_permute::<B, u8>(42, &Vec::from_iter(0..64)));
check!(collisions_permute::<B, u16>(42, &Vec::from_iter(0..64)));
check!(collisions_permute::<B, u32>(42, &Vec::from_iter(0..64)));
check!(collisions_permute::<B, u64>(42, &Vec::from_iter(0..64)));
check!(collisions_permute::<B, u128>(4, &Vec::from_iter(0..16))); // 256 bytes
check!(collisions_permute::<B, u128>(42, &Vec::from_iter(0..64))); // 1024 bytes

check!(collisions_powerset_bytes::<B>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
check!(collisions_powerset_bytes::<B>(&[0, 1, 2, 4, 8, 16, 32, 64, 128]));

check!(collisions_permuted_hasher_values::<B, u8>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
check!(collisions_permuted_hasher_values::<B, u32>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
check!(collisions_permuted_hasher_values::<B, u32>(&[0, 1, 2, 4, 8, 16, 32, 64, 128, 256]));
check!(hasher_collisions_permute::<B, u8>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
check!(hasher_collisions_permute::<B, u32>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]));
check!(hasher_collisions_permute::<B, u32>(&[0, 1, 2, 4, 8, 16, 32, 64, 128, 256]));

check!(collisions_powerset_hasher_values::<B, u32>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]));
check!(collisions_powerset_hasher_values::<B, u32>(&[0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]));
check!(hasher_collisions_powerset::<B, u32>(&[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]));
check!(hasher_collisions_powerset::<B, u32>(&[0, 1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, 4096, 8192, 16384]));
}

fn collisions_permuted_hasher_values<B, D>(data: &[impl Hash]) -> f64
fn hasher_collisions_permute<B, D>(data: &[impl Hash]) -> f64
where B : BuildHasher + Default
{
use itertools::Itertools;

let build_hasher = B::default();

let mut set = ahash::AHashSet::new();
let mut set = HashSet::new();
let mut i = 0;

for perm in data.iter().permutations(data.len()) {
@@ -93,14 +101,55 @@ fn collisions_permuted_hasher_values<B, D>(data: &[impl Hash]) -> f64
(i - set.len()) as f64 / i as f64
}

fn collisions_powerset_hasher_values<B, D>(data: &[impl Hash]) -> f64
fn collisions_permute<B, D>(step: usize, data: &[D]) -> f64
where B : BuildHasher + Default,
D : Clone
{
let build_hasher = B::default();

let mut set = HashSet::new();
let mut i = 0;

let mut x = data.to_vec();
permute(&mut x, 0, step, &mut |d| {
let len = data.len() * std::mem::size_of::<D>();
let perm_u8 = unsafe {
slice::from_raw_parts(d.as_ptr() as *const u8, len)
};
let mut hasher = build_hasher.build_hasher();
hasher.write(&perm_u8);
set.insert(hasher.finish());
i += 1;
});

//println!("Permutations. Combinations: {}, Collisions: {}", i, i - set.len());

// Collision rate
(i - set.len()) as f64 / i as f64
}

fn permute<T, F>(arr: &mut [T], start: usize, step: usize, f: &mut F)
where F: FnMut(&[T])
{
if start >= arr.len() - 1 {
f(arr);
} else {
for i in (start..arr.len()).step_by(step) {
arr.swap(start, i);
permute(arr, start + 1, step, f);
arr.swap(start, i);
}
}
}

fn hasher_collisions_powerset<B, D>(data: &[impl Hash]) -> f64
where B : BuildHasher + Default
{
use itertools::Itertools;

let build_hasher = B::default();

let mut set = ahash::AHashSet::new();
let mut set = HashSet::new();
let mut i = 0;

for perm in data.iter().powerset() {
@@ -123,7 +172,7 @@ fn collisions_powerset_bytes<B>(data: &[u8]) -> f64

let build_hasher = B::default();

let mut set = ahash::AHashSet::new();
let mut set = HashSet::new();
let mut i = 0;

for perm in data.iter().powerset() {
@@ -146,7 +195,7 @@ fn collisions_padded_zeroes<B>(max_size: usize) -> f64
let build_hasher = B::default();
let bytes = vec![0u8; max_size];

let mut set = ahash::AHashSet::new();
let mut set = HashSet::new();

let mut i = 0;
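
The stepped `permute` helper added in `benches/quality/main.rs` above can be tried outside the benchmark harness. The sketch below keeps the same recursive swap structure and pairs it with a collision-rate measurement like the one in `collisions_permute`. Several things in it are assumptions made for illustration: `count_permutations`, `collision_rate` and `std_collision_rate` are hypothetical helpers, the standard library's `RandomState` stands in for the hasher under test, and the base case is written as `start + 1 >= arr.len()` so that an empty slice does not underflow.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashSet;
use std::hash::{BuildHasher, Hasher};

/// Visits orderings of `arr` by recursive swapping. With `step == 1` every
/// permutation is visited; a larger `step` only tries every `step`-th swap
/// candidate at each depth, which keeps long inputs tractable.
/// `step` must be non-zero (`step_by(0)` panics).
fn permute<T, F>(arr: &mut [T], start: usize, step: usize, f: &mut F)
where
    F: FnMut(&[T]),
{
    if start + 1 >= arr.len() {
        f(arr);
    } else {
        for i in (start..arr.len()).step_by(step) {
            arr.swap(start, i);
            permute(arr, start + 1, step, f);
            arr.swap(start, i); // undo the swap before trying the next candidate
        }
    }
}

/// Counts how many orderings `permute` visits for a slice of length `len`.
fn count_permutations(len: usize, step: usize) -> usize {
    let mut data: Vec<usize> = (0..len).collect();
    let mut count = 0;
    permute(&mut data, 0, step, &mut |_| count += 1);
    count
}

/// Fraction of visited orderings whose 64-bit hash had already been seen.
fn collision_rate<B: BuildHasher>(build_hasher: &B, step: usize, data: &[u8]) -> f64 {
    let mut seen = HashSet::new();
    let mut total = 0usize;
    let mut scratch = data.to_vec();
    permute(&mut scratch, 0, step, &mut |perm| {
        let mut hasher = build_hasher.build_hasher();
        hasher.write(perm);
        seen.insert(hasher.finish());
        total += 1;
    });
    (total - seen.len()) as f64 / total as f64
}

/// Same measurement using the standard library's default hasher builder.
fn std_collision_rate(step: usize, data: &[u8]) -> f64 {
    collision_rate(&RandomState::new(), step, data)
}
```

For a 4-element input, `count_permutations(4, 1)` visits all 24 orderings, while `count_permutations(4, 2)` visits only 4. This pruning is why the benchmark can pass a large step such as 42 for its 64-element inputs, where the full set of 64! orderings would be far too large to enumerate.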
