Releases: ashvardanian/StringZilla
v3.11.1: Matching N3322 for `memcpy` UB in C2y
v3.11.0: Checksums in AVX-512, AVX2, NEON
- 🆕 `sz_checksum(char const *, size_t)`: C 99 interface
- 🆕 `sz::str().checksum()`: C++ 11 interface
- 🆕 `sz.checksum(str)`: Python interface

Database and other systems engineers, you can now use StringZilla to dynamically dispatch different checksum kernels for AVX2-capable Haswell+ CPUs, AVX-512BW-capable Ice Lake+ CPUs, and Arm NEON CPUs on mobile. The AVX-512 kernel uses masked loads extensively, resulting in a 10% improvement even on typical English words averaging 5 bytes in length, and a 20x improvement over the serial code on longer strings.
On the technical side, the x86 kernels use the well-known `SAD(text, zeros)` idiom to accumulate absolute differences between individual bytes into 64-bit words. They also traverse the input bidirectionally to saturate a core capable of performing 2 loads per CPU cycle. Moreover, on large inputs, they switch to streaming loads, handling the head and the tail separately, similar to our `memcpy` alternative, which also outperforms LibC on AVX-512-capable machines 😎
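
To make the `SAD(text, zeros)` idiom concrete, here is a minimal sketch, not the library's actual kernel. It assumes the checksum is a plain sum of unsigned byte values, which is what a sum of absolute differences against zeros yields. For portability it swaps in the 128-bit SSE2 intrinsic `_mm_sad_epu8` instead of the AVX2 or AVX-512 variants, and it falls back to a serial loop on non-x86 targets. The function names `checksum_serial_sketch` and `checksum_sad_sketch` are hypothetical, and the bidirectional traversal, masked loads, and streaming loads mentioned above are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__)
#include <emmintrin.h> // SSE2: _mm_sad_epu8, _mm_loadu_si128, _mm_add_epi64
#endif

// Hypothetical serial reference: sums bytes one at a time.
static uint64_t checksum_serial_sketch(char const *text, size_t length) {
    uint64_t sum = 0;
    for (size_t i = 0; i != length; ++i) sum += (unsigned char)text[i];
    return sum;
}

// Hypothetical SAD-based variant; assumes the checksum is a plain byte sum.
static uint64_t checksum_sad_sketch(char const *text, size_t length) {
    if (length == 0) return 0; // avoid touching a possibly-null pointer
    assert(text != NULL);
#if defined(__SSE2__)
    __m128i const zeros = _mm_setzero_si128();
    __m128i sums = _mm_setzero_si128(); // two independent 64-bit accumulators
    size_t i = 0;
    for (; i + 16 <= length; i += 16) {
        __m128i chunk = _mm_loadu_si128((__m128i const *)(text + i));
        // |byte - 0| == byte, so SAD against zeros adds up each group
        // of 8 bytes into the low bits of a 64-bit lane.
        sums = _mm_add_epi64(sums, _mm_sad_epu8(chunk, zeros));
    }
    uint64_t lanes[2];
    _mm_storeu_si128((__m128i *)lanes, sums);
    // Fold both lanes, then handle the remaining tail (< 16 bytes) serially.
    return lanes[0] + lanes[1] + checksum_serial_sketch(text + i, length - i);
#else
    return checksum_serial_sketch(text, length); // non-x86 fallback
#endif
}
```

Calling `checksum_sad_sketch("abc", 3)` returns the same value as the serial loop, since 16-byte chunks and the serial tail together cover every byte exactly once. Because each SAD result is at most 8 × 255 per lane, the 64-bit accumulators will not overflow on any realistic input size.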
Release v3.10.11
Release v3.10.10
Release v3.10.9
Release v3.10.8
Release v3.10.7
Release v3.10.6
Release v3.10.5
Release v3.10.4